String Basics in C

The programs we have written so far deal only with numbers or characters, but a real-world program should be able to store and manipulate text when needed. Unfortunately, C offers no separate data type for strings. Languages like Java and C# provide a separate string type, but this is not the case with C. In C, strings are stored as an array of characters terminated by a null character. An array of characters is a string only if its last element is a null character ('\0'). The null character is an escape sequence, just like \n (newline) and \t (tab), and has an ASCII value of 0. For example:

So we can say that a string is just a one-dimensional array of characters with a null character ('\0') as its last element.

String literal

A string literal is just a sequence of characters enclosed in double quotes (""). It is also known as a **string constant**. Here are some examples of string literals:

The double quotes ("") are not part of a string literal; they are just used to delineate (i.e. mark the boundaries of) the string. Whenever you create a string literal in a program, the compiler automatically adds a null character ('\0') at the end.

How string literals are stored?

As discussed, a string is actually an array of characters terminated by a null character ('\0'). Whenever the compiler sees a string literal of length n, it allocates n + 1 consecutive bytes of memory for the string. This memory contains all the characters of the string, plus a null character ('\0') at the end. So the string literal "Hello World" will be stored in memory as:

[Figure: memory representation of a string]

As you can see, the string literal "Hello World" is stored as an array of 12 characters (including '\0').

A string literal can also be empty.

The empty string literal "" contains only '\0', so it is stored in memory as an array of 1 character.

[Figure: null character in memory]

String literal as a Pointer

String literals are stored just like arrays. The most important point to understand is that, in most expressions, a string literal behaves as a pointer to the first character of the array. In other words, "Hello World" is a pointer to the character 'H'. Since "Hello World" points to the address of character 'H', its type is pointer to char, or (char *). It means that if we have a pointer variable of type pointer to char, or (char *), we can assign the string literal to it as:

After this assignment str points to the address of the first element. Using pointer arithmetic, we can access any character inside the string literal.

Although you can read individual characters of a string literal, attempting to modify one is undefined behaviour and may cause the program to crash.

Since "Hello World" is a pointer we can apply pointer arithmetic directly to it. For example:

"Hello World" + 0 points to the address of character 'H'.
"Hello World" + 1 points to the address of character 'e'.
"Hello World" + 2 points to the address of character 'l'.

and so on.

To get the value at address "Hello World" + 1 just dereference the expression.

*("Hello World" + 1) gives 'e'
*("Hello World" + 2) gives 'l'

and so on

In the chapter on one-dimensional arrays we discussed that if arr is an array, then writing arr[i] is the same as writing *(arr + i).

Therefore *("Hello World" + 1) can also be written as "Hello World"[1].

printf() and scanf() revisited

If you look at the prototypes of scanf() and printf(), you will find that both functions expect a value of type (char *) as their first argument.

Note: For now ignore the keyword const. It is discussed in detail in the upcoming chapters.

So now you know that when you call printf("Hello World"), you are actually passing the address of "Hello World", i.e. a pointer to the first letter of the array, which is 'H'.

String literal v/s character literal

Beginners often confuse "a" with 'a'. The former is a string literal: "a" is a pointer to a memory location that contains the character 'a' followed by a null character ('\0'). The character literal 'a', on the other hand, represents the ASCII value of the character 'a', which is 97. Therefore you must never use a character literal where a string literal is required, or vice versa.

Multiline string literals

You are not limited to single-line strings. If your string is too long to fit on one line, you can continue it on the next line by adding a backslash at the end of the line. For example:

Using Escape Sequences

You can use escape sequences such as \n (newline) and \t (tab) inside a string literal.

String literal followed by a string literal

When two string literals are placed adjacent to each other, the compiler concatenates them and appends a null character ('\0') at the end of the concatenated string. For instance, writing "Hello " "World" is the same as writing "Hello World".

String Variables

Since a string is an array of characters we must declare an array of sufficient size to store all characters including the null character ('\0').

Here ch_arr can only hold 6 characters including the null character ('\0'). If you are initializing elements of an array at the time of declaration then you can omit the size.

C also provides a much cleaner and easier to type syntax for initializing strings. For example, the above statement can also be written as:

We have seen that a string literal normally behaves as a pointer to the first character of the array, but there is an exception to this rule: when a string literal is used to initialize an array of characters, as in the above statement, it doesn't represent any address. That means we can't use pointer arithmetic with "Hello World". All the characters of the array ch_arr will be stored in memory as:

[Figure: memory representation of a string]

What if the number of characters (including '\0') to be stored is less than the size of the array? In that case, the compiler fills the remaining elements with null characters ('\0'). For example:

The array name will be stored in the memory as:

[Figure: extra null characters in memory]

If the number of characters (including '\0') to be stored is greater than the size of the array, the compiler shows a warning message such as: excess elements in array initializer.

Generally, the best way to create strings is to omit the size of the array, in which case the compiler computes it based on the number of characters present in the initializer. For example:

It is important to note that omitting the size doesn't mean that the length of the array str can be increased or decreased later in the program. An array cannot be resized; if you need a buffer whose size can change at run time, allocate it dynamically with malloc() or calloc(). Once the program is compiled, the size of str is fixed at 21 bytes. Since counting characters in a long string is an error-prone process, this method is also preferred when the string is long.

Let’s conclude this chapter by creating two simple programs.

Example 1:

The following program prints characters of a string and address of each character.

Expected Output: one line per character, showing the character and its address.

Note: The addresses may differ every time you run the program.

The important thing to note in the program is the terminating condition in the for loop which says: keep looping until the null character is encountered.

Example 2:

The following program prints characters in the string and address of the characters using a pointer.

Expected Output: one line per character, showing the character and its address.

Note: The addresses may differ every time you run the program.

How it works:

Here we have assigned the array name str to the pointer variable p; in this context str is converted to a pointer to its first element, of type (char *). After this statement both p and str point to the same array. Now we can use pointer arithmetic to move back and forth through the elements of the array. Each iteration of the for loop increments the value of p by 1. The for loop stops when p points to the address of the null character ('\0').
