Strings in Python

A string is a sequence of characters enclosed in single (' ') or double (" ") quotation marks. Here is how you create strings in Python.
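A minimal sketch of creating strings with both quote styles. The variable names s1 and s2 and their values are illustrative choices, not taken from the original lesson.

```python
# Both quote styles create the same kind of object.
s1 = 'hello'   # single quotes
s2 = "hello"   # double quotes

print(s1)
print(s2)
print(s1 == s2)   # True, the two strings have identical contents
```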

Inside the Python Shell or IDLE, a string is normally displayed using single quotation marks (double quotation marks are used when the string itself contains a single quote). However, if you use the print() function, only the contents of the string are displayed.
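The difference can be reproduced in a script. This sketch assumes repr() as a stand-in for what the interactive shell echoes when you type an expression; the variable names are illustrative.

```python
s = "hello"

shell_view = repr(s)   # the text the interactive shell would echo for s
print(shell_view)      # 'hello'  (quotes included)
print(s)               # hello    (contents only)
```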

Some languages, such as C, C++, and Java, treat a single character as a special type called char, but in Python a single character is also a string.

Counting Number of Characters Using len() Function

The len() built-in function counts the number of characters in the string.
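For instance, with an illustrative string (the value is an assumption made for this sketch):

```python
s = "Python"

print(len(s))        # 6
print(len("a b"))    # 3, the space counts as a character
```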

Creating Empty Strings

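A pair of quotation marks with nothing between them creates an empty string. Although empty strings, such as the variables s3 and s4 here, do not contain any characters, they are still valid strings. You can verify this fact by using the type() function.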
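As a sketch, the empty strings s3 and s4 could be created and checked like this. Their exact definitions are assumed, since they are not shown here.

```python
s3 = ''    # empty string written with single quotes
s4 = ""    # empty string written with double quotes

print(len(s3), len(s4))   # 0 0
print(type(s3))           # <class 'str'>
print(type(s4))           # <class 'str'>
```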

So should I use single quotes or double quotes while creating strings?

Double quotes come in handy when you have single quotation marks inside a string. For example:

If we had used the single quotes, we would get the following error:

The problem here is that the Python interpreter thinks that the second quotation mark, right after the character I, marks the end of the string and doesn’t know what to do with the rest of the characters.

Similarly, if you want to print double quotes inside a string, just wrap the entire string inside single quotes instead of double quotes.
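The following sketch puts both cases together. The sample sentences are assumptions chosen to match the description above, and compile() is used only so that the syntax error can be caught and displayed instead of stopping the script.

```python
# Double quotes around a string that contains a single quote.
s1 = "I'm learning Python"
# Single quotes around a string that contains double quotes.
s2 = 'He said "hello" to me'
print(s1)
print(s2)

# Wrapping the first string in single quotes is a syntax error:
# the quote right after I ends the string too early.
bad_source = "print('I'm learning Python')"
try:
    compile(bad_source, "<example>", "exec")
    quote_error = None
except SyntaxError as exc:
    quote_error = exc
    print("SyntaxError:", exc.msg)
```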

There is yet another way to embed single or double quotation marks inside a string: using escape sequences, which are discussed next.

Escape Sequences

Escape sequences are combinations of characters used to print characters which can’t be typed directly inside a string using the keyboard. Each escape sequence starts with a backslash ( \ ) character.

The following table lists some common Escape Sequences.

Escape Sequence    Meaning
\n                 Newline – Prints a newline character
\t                 Tab – Prints a tab character
\\                 Backslash – Prints a backslash ( \ ) character
\'                 Single quote – Prints a single quote
\"                 Double quote – Prints a double quote

When escape sequences are used inside strings, Python treats them as special commands. The \t sequence, for instance, prints a tab character, which moves the output to the next tab stop (how wide a tab appears depends on the terminal or editor). For example:

Similarly, the \n sequence inside a string prints a newline character. The newline character isn’t displayed on the screen; instead, it causes the cursor to start printing subsequent characters from the beginning of the next line. For example:
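A short sketch of both sequences, using made-up sample strings:

```python
tabbed = "name\tage"
two_lines = "first\nsecond"

print(tabbed)       # a tab separates the two words
print(two_lines)    # "second" starts on a new line

# Each escape sequence is stored as a single character.
print(len("\t"), len("\n"))   # 1 1
```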

You can also use ( \' ) and ( \" ) escape sequences to print single or double quotation marks in a string. For example:

When we use escape sequences to print single or double quotes it doesn’t matter whether the string is wrapped inside the single quotes or double quotes.

Similarly, to print a single backslash character (\), use the (\\) escape sequence.
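The quote and backslash escapes can be sketched as follows. The strings and the path value are illustrative assumptions.

```python
s1 = 'It\'s a "quoted" word'      # \' inside single quotes
s2 = "It's a \"quoted\" word"     # \" inside double quotes
path = "C:\\temp"                 # \\ produces one backslash

print(s1)
print(s2)
print(s1 == s2)    # True, both spellings give the same string
print(path)        # C:\temp
print(len(path))   # 7
```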

String Concatenation

String concatenation means joining two or more strings together. To concatenate strings in Python we use the + operator.

Note that the + operator, when used with two numbers, performs mathematical addition. However, when used with strings, it concatenates them.
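A minimal illustration of the two behaviours of +, with assumed sample values:

```python
s1 = "Hello, " + "world"   # strings are joined
total = 10 + 5             # numbers are added

print(s1)      # Hello, world
print(total)   # 15
```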

What would happen if one of the operands is not a string? For example:

Here we are trying to concatenate the string "Python" and the number 101, but Python reports the following error:

As Python is a strongly typed language, it can’t just convert data of one type to a completely different type automatically.

So, what’s the solution?

The solution is to use the str() function to convert the integer to a string as follows:
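The failing concatenation and its fix can be sketched together. The error is caught here only so that the script keeps running.

```python
try:
    result = "Python" + 101          # str + int is not allowed
except TypeError as exc:
    print("TypeError:", exc)

result = "Python" + str(101)         # convert the int to a string first
print(result)                        # Python101
```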

String Repetition Operator (*)

Just as with numbers, we can also use the * operator with strings. When used with strings, the * operator repeats the string n times. Its general format is:

string * n

where n is a number of type int.

Note that 5 * "www " and "www " * 5 yield the same result.

The n must be int. Otherwise, you will get an error. For example:

Notice that the error message tells us clearly that a string can’t be multiplied by a non-int type.
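A small sketch of repetition, including the error raised for a non-int count. The float 1.5 is an arbitrary choice for the failing case.

```python
line = "www " * 5
print(line)                     # www www www www www (with a trailing space)
print(5 * "www " == line)       # True, the operand order does not matter

try:
    "www " * 1.5                # a float is not accepted as the count
except TypeError as exc:
    print("TypeError:", exc)
```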

Membership Operators – in and not in

The in and not in operators are used to check whether a string exists inside another string. For example:
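A brief sketch with an assumed sample string:

```python
s = "welcome to Python"

print("come" in s)        # True
print("python" in s)      # False, the check is case-sensitive
print("java" not in s)    # True
```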

Accessing Individual Characters in a String

In Python, characters in a string are stored in a sequence. We can access individual characters inside a string by using an index. An index refers to the position of a character inside a string. Strings in Python are 0-indexed, which means that the first character is at index 0, the second character is at index 1, and so on. The index position of the last character is one less than the length of the string.

[Figure: string with index positions]

To access individual characters inside a string, we type the name of the variable, followed by the index number of the character inside square brackets [].

The last valid index for string s1 is 4. If you try to access characters beyond the last valid index, you will get an IndexError as follows:

Instead of manually counting the index position of the last character in the string, we can use the len() function to calculate the length of the string and then subtract 1 from it to get the index position of the last character.
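The indexing rules above can be sketched as follows. The value "hello" is an assumption: any five-character string has 4 as its last valid index.

```python
s1 = "hello"   # assumed value; any 5-character string has last index 4

print(s1[0])             # h
print(s1[4])             # o
print(s1[len(s1) - 1])   # o, last character without counting by hand

try:
    s1[5]                # one past the last valid index
except IndexError as exc:
    print("IndexError:", exc)
```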

We can also use negative indexes. A negative index allows us to access characters from the end of the string. Negative indexes start from -1, so the index position of the last character is -1, for the second-to-last character it is -2, and so on.

[Figure: string with negative index positions]

If a negative index is smaller than the smallest valid index (-8 for this string), an IndexError will occur as follows:
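A sketch of negative indexing. The string "markdown" is an assumption, chosen because it has eight characters and therefore -8 as its smallest valid index.

```python
s = "markdown"   # assumed 8-character string; valid negative indexes are -1 to -8

print(s[-1])   # n
print(s[-2])   # w
print(s[-8])   # m, same as s[0]

try:
    s[-9]      # one step beyond the smallest valid index
except IndexError as exc:
    print("IndexError:", exc)
```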

Slicing Strings

String slicing allows us to get a slice of characters from a string. To get a slice of a string we use the slicing operator, whose syntax is str_name[start_index:end_index].

str_name[start_index:end_index] returns the slice of the string that starts at start_index and ends just before end_index. The character at end_index is not included in the slice. Consider the following example:

If end_index is greater than the length of the string then the slice operator returns a slice of string starting from start_index to the end of the string.
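A sketch of basic slicing, again using the assumed sample string "markdown":

```python
s = "markdown"

print(s[1:4])     # ark  (indexes 1, 2 and 3)
print(s[0:4])     # mark
print(s[4:100])   # down, an end_index past the end stops at the end of the string
```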

The start_index and end_index are optional. If start_index is not specified then slicing begins at the beginning of the string and if end_index is not specified then it goes on to the end of the string. For example:

In an expression such as s[:4], the slicing begins at the beginning of the string, so it is the same as s[0:4].

In s[5:], end_index is omitted. As a result, slicing goes on to the end of the string, so s[5:] is the same as s[5:len(s)].

In s[:], we have omitted start_index as well as end_index, so slicing starts from the beginning and goes on to the end of the string. In other words, s[:] is the same as s[0:len(s)].
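The three forms with omitted indexes can be checked side by side. The string value is an assumption made for this sketch.

```python
s = "markdown"

print(s[:4])    # mark
print(s[5:])    # own
print(s[:])     # markdown

print(s[:4] == s[0:4])        # True
print(s[5:] == s[5:len(s)])   # True
print(s[:] == s[0:len(s)])    # True
```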

We can also use negative indexes in string slicing.

[Figure: string with negative index positions]

So s[1:-1] will return a slice starting from index 1 to -1, not including the character at index -1.
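A sketch of slicing with negative indexes on an assumed sample string:

```python
s = "markdown"

print(s[1:-1])    # arkdow, drops the first and last characters
print(s[-4:])     # down, the last four characters
print(s[:-4])     # mark, everything except the last four characters
```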

Everything in Python is an Object

In Python, all data are objects. This means that a number, a string, and data of every other type are actually objects. To determine the type of an object, we use the type() function.

But what are objects?

Classes and Objects – The First Look

Before we learn about objects we must first learn about classes. A class is just a template that defines data and methods. Functions defined inside the class are called methods.

When we define a new class, we essentially create a new data type. To use our new class or data type, we have to create an object of that class. Note that defining a class does not create any objects of that class. Memory for an object’s data is allocated only when we create an object based upon that class (although in Python the class itself is also an object held in memory).

In the light of this newly gained knowledge, let’s see what actually happens when we assign an int value to a variable.

In a statement such as num = 100, we assign the value 100 to the variable num. In object-oriented terms, we have just created an object. To learn more about the class or type of the object, use the type() function as follows:

<class 'int'> indicates that the num variable is an object of class int. Similarly, every string is an object of class str and every float is an object of class float.
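A quick check of the three built-in types mentioned here. The float and string values are illustrative.

```python
num = 100

print(type(num))        # <class 'int'>
print(type(3.14))       # <class 'float'>
print(type("hello"))    # <class 'str'>
```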

Built-in classes or types like int, float, and str define many useful methods. To call these methods we use dot notation: the object, followed by a dot, the method name, and parentheses containing any arguments.

Here is an example:

The str class provides methods like upper() and lower(), which return a string after converting it to uppercase and lowercase respectively.

These methods do not alter the value of the original object (s1). This is why, after calling lower() and upper(), the variable s1 still points to the "A String" string object.
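A sketch of calling these two methods on the string "A String" mentioned above:

```python
s1 = "A String"

print(s1.upper())   # A STRING
print(s1.lower())   # a string
print(s1)           # A String, the original object is unchanged
```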

To find an object’s memory address we can use the id() function (in CPython, the value returned by id() is the object’s address) as follows:

Note that 15601373811 is the address of the 'A String' string object, not the address of the s1 variable. The object’s memory address will not change during the execution of the program. However, it may change every time you run the program. If two variables refer to the same object, they will have the same id (or address).
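A sketch of id() in use. The second name s2 is a hypothetical addition to show two variables sharing one object, and the printed number will differ from run to run.

```python
s1 = "A String"
s2 = s1                   # a second name for the same object

print(id(s1))             # some integer; the value varies from run to run
print(id(s1) == id(s2))   # True
print(s1 is s2)           # True, "is" compares identities
```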

ASCII Characters

In computers everything is stored as a series of 0s and 1s. Storing numbers is quite easy: just convert them to binary and you are done.

But how are characters stored in memory?

Computers can’t directly store characters such as 'a', 'b', '1', '$' and so on in memory. Instead, what they store is the numeric code that represents each character. One mapping of characters to their numeric codes is the ASCII (American Standard Code for Information Interchange) character set. The ASCII character set has 128 characters.

In addition to the characters found on a US keyboard, the ASCII set also defines some control characters. Control characters are non-printable characters used to issue commands.

An example of a control character is Ctrl+D, which is commonly used to terminate a shell session. This character is represented in the ASCII table as EOT (End-of-Transmission) and has an ASCII value of 4.

The following table shows all 128 characters in the ASCII Character set.

[Figure: ASCII Character Set]

Here are a few things to notice:

  • All the uppercase letters from 'A' to 'Z' have ASCII values from 65 to 90.
  • All the lowercase letters from 'a' to 'z' have ASCII values from 97 to 122.
  • When we use digits (0-9) inside a string, they are represented using ASCII values from 48 to 57.

ord() and chr() functions

The ord() function returns the numeric code of a character (its Unicode code point, which is the same as the ASCII value for ASCII characters), and the chr() function returns the character represented by a given code.

ord() function
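A few sample calls to ord(); the characters are chosen to match the ranges listed earlier:

```python
print(ord("A"))   # 65
print(ord("a"))   # 97
print(ord("0"))   # 48
print(ord("$"))   # 36
```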

chr() function
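And the reverse direction with chr(), using the same illustrative codes:

```python
print(chr(65))         # A
print(chr(97))         # a
print(chr(48))         # 0
print(chr(ord("k")))   # k, chr() reverses ord()
```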

Suppressing newline in print() function

By default, the print() function prints the argument it is given followed by a newline character (\n). For example:
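A sketch of the default behaviour. The helper name show_default_end and the first string are hypothetical; the calls are wrapped in a function only to keep the example self-contained.

```python
def show_default_end():
    # Each call ends its output with a newline character.
    print("first line")
    print("second line")

show_default_end()
# first line
# second line
```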



Notice that the string "second line" is printed at the beginning of the next line. This is because the newline character (\n) printed by the first print() call causes the output to start from the next line.

We can change this behavior of the print() function by passing a special argument named end. Let’s say we want to print a $ character at the end of the output instead of the newline character (\n). To do so, call the print() function as follows:
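A sketch of the end argument. The helper name show_dollar_end and the two strings are assumptions made for illustration.

```python
def show_dollar_end():
    # end="$" replaces the newline that print() normally adds.
    print("first string", end="$")
    print("second string", end="$")

show_dollar_end()
# first string$second string$
```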



Notice the '$' character at the end of both strings. As the first statement doesn’t print a newline character at the end of the output, the output of the second print() begins on the same line.

If you don’t want to print anything at the end of the output, pass end="" to the print() function as follows:
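A sketch with three calls, two of which suppress the newline. The helper name show_empty_end and the first two strings are assumed for this example.

```python
def show_empty_end():
    print("first", end="")    # nothing is printed after "first"
    print("second", end="")   # nothing is printed after "second"
    print("third")            # default end, so a newline follows

show_empty_end()
# firstsecondthird
```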



In this case, the first two statements print an empty string ("") at the end of the output, but the last statement prints "third" followed by a newline character ('\n').

Specifying Separator in print() Function

We have already discussed in the lesson Data Types and Variables In Python that when we pass multiple arguments to the print() function, they are printed to the console separated by spaces.

To override this behavior we use another special argument called sep (short for separator). Let’s say we want to separate each item by #. To do so, call the print() function as follows:
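A sketch comparing the default separator with sep="#". The helper name show_separator and the arguments are illustrative.

```python
def show_separator():
    print("first", "second", "third")            # default separator is a space
    print("first", "second", "third", sep="#")   # custom separator

show_separator()
# first second third
# first#second#third
```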

String Comparison

Just like numbers, we can also compare strings using the relational operators. However, unlike numbers, string comparison is slightly more involved. Strings in Python are compared using the numeric codes of their corresponding characters (for the characters used here, these are the ASCII values). The comparison starts by comparing the first character of each string. If they differ, their ASCII values determine the outcome of the comparison. If they are equal, the next two characters are compared. This process continues until a difference is found or either string is exhausted. If one string is exhausted first, that is, the shorter string appears at the start of the longer one, then the shorter string is the smaller one. In technical jargon, this type of comparison is known as lexicographical comparison.

Let’s take some examples:

Example 1:

Here are the steps involved in the evaluation of the expression "linker" > "linquish":

Step 1: "l" from "linker" is compared with "l" from "linquish". As they are equal, the next two characters are compared.

Step 2: "i" from "linker" is compared with "i" from "linquish". Again they are equal, so the next two characters are compared.

Step 3: "n" from "linker" is compared with "n" from "linquish". Again they are equal, so the next two characters are compared.

Step 4: "k" from "linker" is compared with "q" from "linquish". The comparison stops at this step because the corresponding characters are not the same. The ASCII value of "k" is 107 and that of "q" is 113, which means "k" is smaller than "q". Therefore the string "linker" is smaller than "linquish". Hence the expression "linker" > "linquish" is false.

Example 2:

"q" from "qwerty" is compared with "a" from "abc". At this point, the comparison stops because the corresponding characters are not the same. As the ASCII value of "q" is 113 and that of "a" is 97, "q" is greater than "a". Therefore the string "qwerty" is greater than "abc".

Example 3:

Here the shorter string "ab" appears at the start of the longer string "abc". Therefore "ab" is the smaller one.

Some more examples:
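The sketch below evaluates the comparisons discussed in the three examples, followed by a few extra ones. The exact operators used in Examples 2 and 3 and the strings in the last three lines are assumptions.

```python
print("linker" > "linquish")   # False, "k" (107) is smaller than "q" (113)
print("qwerty" > "abc")        # True,  "q" (113) is greater than "a" (97)
print("ab" < "abc")            # True,  "ab" appears at the start of "abc"

print("tree" == "tree")        # True
print("Zebra" < "apple")       # True, "Z" (90) is smaller than "a" (97)
print("pass" != "Pass")        # True, comparison is case-sensitive
```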

String comparison is a common operation in programming. One practical use of string comparison is sorting strings in ascending or descending order.
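As a sketch of that use, the built-in sorted() function orders strings with the same lexicographical comparison. The word list is made up for this example.

```python
words = ["pear", "Apple", "banana", "apple"]

ascending = sorted(words)
descending = sorted(words, reverse=True)

print(ascending)    # ['Apple', 'apple', 'banana', 'pear']
print(descending)   # ['pear', 'banana', 'apple', 'Apple']
```

Note that "Apple" sorts before "apple" because uppercase letters have smaller codes than lowercase letters.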

Strings are Immutable

String objects are immutable. It means that we can’t change the content of a string object after it is created. Consider the following example:

Here we have created a new string object, and then we use the id() function to find the address of our string object.

Let’s see what will happen if we try to modify an existing string object s by adding " world" to the end of it.

Notice that the variable s now points to a completely new address. This is because every time we modify a string object, we create a new string object in the process. This illustrates the point that string objects are immutable.
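The same experiment can be sketched without relying on specific address values. The initial value "hello" and the extra name original are assumptions; keeping a second reference lets us confirm that the first object is left untouched.

```python
s = "hello"
original = s            # keep a second reference to the first object

s += " world"           # builds a new string and rebinds the name s

print(s)                # hello world
print(original)         # hello, the first object was not modified
print(s is original)    # False, s now refers to a different object
```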

An expression of the form variable[index] is treated just like a variable. Consequently, such expressions can also appear on the left side of the assignment operator. For example:

Here we are trying to update the element at index 0 by assigning a new string to it. The operation fails because string objects are immutable. If strings were mutable, the above operation would have succeeded.
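A sketch of the failing assignment, together with one common workaround (building a new string). The values used are illustrative.

```python
s = "hello"

try:
    s[0] = "y"          # strings do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

s = "y" + s[1:]         # build a new string instead
print(s)                # yello
```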

Formatting String Using the format() Function

Just like numbers, we can also use the format() function to format strings. To format a string we use the type code s along with a specified width. For example:

[Figure: string formatting]

Unlike numbers, strings are left justified by default. This means that when the width is greater than the length of the value, the value is printed left justified with trailing spaces instead of leading spaces.

If the length of the string is greater than the specified width, the width is automatically increased to match the length of the string.
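A sketch of these rules. The square brackets are added only to make the padding visible, and the widths and the number 42 are illustrative choices.

```python
padded = format("Python", "10s")    # width 10, left justified by default
unpadded = format("Python", "3s")   # width smaller than the string length
number = format(42, "10d")          # numbers are right justified by default

print("[" + padded + "]")     # [Python    ]
print("[" + unpadded + "]")   # [Python]
print("[" + number + "]")     # [        42]
```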

To right justify a string, use the > symbol as follows:

[Figure: right justified string]

The statement print(format("Python", "<10s")) is the same as print(format("Python", "10s")) because strings are printed left justified by default.
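Both alignment symbols can be compared in a short sketch. The brackets are there only to show where the padding goes.

```python
right = format("Python", ">10s")
left = format("Python", "<10s")

print("[" + right + "]")                   # [    Python]
print("[" + left + "]")                    # [Python    ]
print(left == format("Python", "10s"))     # True
```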
