String Basics in C

The programs we have written so far deal only with numbers or characters, but a real-world program should be able to store and manipulate text when needed. Unfortunately, C offers no separate data type for strings. Languages like Java and C# provide a dedicated string type, but C does not. Instead, C stores a string as an array of characters terminated by a null character. An array of characters holds a string only if its characters are followed by a null character ('\0'). The null character is an escape sequence, just like \n (newline) and \t (tab), and its ASCII value is 0. For example:

char name[10] = {'s', 't', 'r', 'i', 'n', 'g' ,'\0'};

So we can say that a string is just a one-dimensional array of characters with a null character ('\0') marking its end.

String literal

A string literal is just a sequence of characters enclosed in double quotes (""). It is also known as a string constant. Here are some examples of string literals:

"I am learning C"
"My Lucky Number is 1"
"Hello World!"
""

The double quotes ("") are not part of a string literal; they are only used to delineate (i.e., mark the boundaries of) the string. Whenever you write a string literal in a program, the compiler automatically adds a null character ('\0') at the end.

How string literals are stored?

As discussed, a string is actually an array of characters terminated by a null character ('\0'). Whenever the compiler sees a string literal of length n, it allocates n + 1 consecutive bytes of memory for it. This memory holds all the characters of the string plus the null character ('\0') at the end. So the string literal "Hello World" will be stored in memory as:

[Figure: memory representation of "Hello World": 'H' 'e' 'l' 'l' 'o' ' ' 'W' 'o' 'r' 'l' 'd' '\0']

As you can see, the string literal "Hello World" is stored as an array of 12 characters (including '\0').

A string literal can also be empty.

The empty string literal "" contains only the null character ('\0'), so it is stored in memory as an array of 1 character:

[Figure: the empty string in memory: a single '\0']

String literal as a Pointer

String literals are stored just like arrays. The most important point to understand is that, in most expressions, a string literal is converted to a pointer to the first character of its array. In other words, "Hello World" evaluates to a pointer to the character 'H'. Since "Hello World" points to the character 'H', its type after this conversion is pointer to char (char *). This means that if we have a variable of type pointer to char (char *), we can assign the string literal to it:

char *str = "Hello World";

After this assignment, str points to the first character of the string, and using pointer arithmetic we can access any character inside the string literal.

printf("%c" ,*(str+0) ); // prints H
printf("%c" ,*(str+4) ); // prints o

Even though you can read individual characters of a string literal, attempting to modify a string literal is undefined behavior and may cause the program to crash.

*str = 'Y'; // wrong

Since "Hello World" is converted to a pointer, we can apply pointer arithmetic directly to it. For example:

"Hello World" + 0 points to the address of character 'H'.
"Hello World" + 1 points to the address of character 'e'.
"Hello World" + 2 points to the address of character 'l'.

and so on.

To get the value at address "Hello World" + 1, just dereference the expression:

*("Hello World" + 1) gives 'e'
*("Hello World" + 2) gives 'l'

and so on.

In the chapter on one-dimensional arrays, we discussed that if we have an array such as:

int arr[] = {16,31,39,59,11};

then writing arr[i] is the same as writing *(arr+i).

Therefore *("Hello World" + 1) can also be written as "Hello World"[1].

printf() and scanf() revisited

If you look at the prototypes of printf() and scanf(), you will find that both functions expect a value of type char * (pointer to char) as their first argument.

int printf (const char*, ...);
int scanf (const char*, ...);

Note: For now ignore the keyword const. It is discussed in detail in the upcoming chapters.

So now you know that when you call printf() as:

printf("Hello World");

You are actually passing the address of "Hello World", i.e., a pointer to the first character of the array, which is 'H'.

String literal v/s character literal

Beginners often confuse "a" and 'a'. The former is a string literal: "a" is a pointer to a memory location that contains the character 'a' followed by a null character ('\0'). The latter, 'a', is a character literal, which represents the character code of 'a' (97 in ASCII). Therefore, you must never use a character literal where a string literal is required, or vice versa.

Multiline string literals

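You are not limited to single-line strings. If your string is too long to fit on one line, you can continue it on the next line by adding a backslash (\) at the end of the line. Any spaces at the start of a continued line become part of the string, which is why the continued lines below start at the left margin. For example: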

printf("This is first line \
some characters in the second line \
even more characters in the third line \n");

Using Escape Sequences

You can use escape sequences like \n (newline), \t (tab) in a string literal. For example:

printf("Lorem ipsum \ndolor sit \namet, consectetur \nadipisicing elit \nsed do eiusmod");

Expected Output:

Lorem ipsum
dolor sit
amet, consectetur
adipisicing elit
sed do eiusmod

String literal followed by a string literal

When two string literals are placed next to each other, the compiler concatenates them into a single string literal and appends one null character ('\0') at the end of the concatenated string.

printf("Hello"" World"); // prints Hello World

is the same as writing:

printf("Hello World");

String Variables

Since a string is an array of characters we must declare an array of sufficient size to store all characters including the null character ('\0').

char ch_arr[6];

Here ch_arr can hold 6 characters in total, including the null character ('\0'), so the longest string it can store has 5 characters. If you initialize the elements of an array at the time of declaration, you can omit the size.

char ch_arr[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'};

C also provides a much cleaner and easier to type syntax for initializing strings. For example, the above statement can also be written as:

char ch_arr[] = "Hello World";

We have studied that a string literal is a pointer to the first character of its array, but there is an exception to this rule: when a string literal is used to initialize an array of characters, as in the statement above, it does not represent an address. The literal only supplies the initial values: its characters are copied into ch_arr, so there is no pointer here to apply pointer arithmetic to. All the characters of the array ch_arr will be stored in memory as:

[Figure: ch_arr in memory: 'H' 'e' 'l' 'l' 'o' ' ' 'W' 'o' 'r' 'l' 'd' '\0']

What if the number of characters (including '\0') to be stored is less than the size of the array? In that case, the remaining elements are filled with null characters ('\0'). For example:

char name[10] = "john";

The array name will be stored in memory as:

[Figure: name in memory: 'j' 'o' 'h' 'n' '\0' '\0' '\0' '\0' '\0' '\0']

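If the number of characters (excluding '\0') to be stored is greater than the size of the array, the initializer is too long: compilers typically warn about it and discard the extra characters. Watch out for one special case: if the characters fit exactly but there is no room left for '\0' (for example, char str[5] = "hello";), C accepts the declaration and silently leaves out the null character, so str holds characters but not a string.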

Generally, the best way to create strings is to omit the size of the array, in which case the compiler computes it based on the number of characters present in the initializer. For example:

char str[] = "this is the best way";

It is important to note that omitting the size does not mean the length of the array str can be increased or decreased later in the program. Once the program is compiled, the size of str is fixed at 21 bytes (20 characters plus '\0'). If you need storage whose size can change at run time, allocate it dynamically with malloc() or calloc() and resize it with realloc(). Because counting characters in a long string is error-prone, omitting the size is especially preferred for long strings.

Let's conclude this chapter by creating two simple programs.

Example 1:

The following program prints each character of a string and the address of each character.

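#include <stdio.h>

int main(void)
{
    int i;
    char str[6] = "hello";   /* 5 characters + '\0' */

    /* keep looping until the null character is encountered */
    for(i = 0; str[i] != '\0'; i++)
    {
        /* %p expects a void *, so the address is cast */
        printf("Character = %c\t Address = %p\n", str[i], (void *)&str[i]);
    }

    // signal to operating system program ran fine
    return 0;
}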

Expected Output:

Character = h Address = 0x28ff20
Character = e Address = 0x28ff21
Character = l Address = 0x28ff22
Character = l Address = 0x28ff23
Character = o Address = 0x28ff24

Note: The addresses, and the exact format %p uses to print them, may differ on your system and every time you run the program.

The important thing to note in this program is the terminating condition of the for loop, which says: keep looping until the null character is encountered.

Example 2:

The following program prints each character of the string and its address, this time using a pointer.

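#include <stdio.h>

int main(void)
{
    char str[6] = "hello";
    char *p;

    /* p starts at the first element and moves one character per iteration */
    for(p = str; *p != '\0'; p++)
    {
        /* %p expects a void *, so the pointer is cast */
        printf("Character = %c\t Address = %p\n", *p, (void *)p);
    }

    // signal to operating system program ran fine
    return 0;
}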

Expected Output:

Character = h Address = 0x28ff20
Character = e Address = 0x28ff21
Character = l Address = 0x28ff22
Character = l Address = 0x28ff23
Character = o Address = 0x28ff24

Note: The addresses, and the exact format %p uses to print them, may differ on your system and every time you run the program.

How it works?

Here we have assigned the array name str to the pointer variable p. In this assignment, str is converted to a pointer to its first element (char *), so afterwards both p and str refer to the same array. Now we can use pointer arithmetic to move back and forth through the elements of the array. Each iteration of the for loop increments p by 1, and the loop stops when p points to the null character ('\0').