File Handling in Python

So far in this course, we have been using variables to store data. The problem with this approach is that as soon as the program ends, our data is lost. One way to store data permanently is to put it in a file. This chapter discusses how to store data in a file and how to read data from a file.

In Python, file handling consists of the following three steps:

  1. Open the file.
  2. Process the file, i.e., perform read or write operations.
  3. Close the file.

Types of Files

There are two types of files:

  1. Text Files
  2. Binary Files

A file whose contents can be viewed using a text editor is called a text file. A text file is simply a sequence of ASCII or Unicode characters. Python programs and HTML source code are examples of text files.

A binary file stores data in the same form in which it is stored in memory. MP3 files, image files, and Word documents are examples of binary files. You can't meaningfully read a binary file using a text editor.

In this chapter, we will discuss how to work with both types of files.

Let's start.

Opening a File

Before you perform any operation on a file, you must open it. Python provides a built-in function called open() to open a file. Its syntax is:

fileobject = open(filename, mode)

filename is the name or path of the file.

mode is a string that specifies the type of operation you want to perform on the file (i.e., read, write, append, etc.). The following table lists the different modes available to you.

| Mode | Description |
|------|-------------|
| "r" | Opens the file for reading. This is the default mode. |
| "w" | Opens the file for writing. If the specified file doesn't exist, it is created; if the file exists, its data is destroyed. |
| "a" | Opens the file in append mode. If the file doesn't exist, this mode creates it. If the file already exists, new data is appended to the end of the file rather than destroying the existing data as "w" mode does. |

We can also specify the type of file (i.e., text file or binary file) we want to work with by appending 't' to the mode string for text files or 'b' for binary files. Since text mode is the default, 't' is generally omitted when opening files in text mode.
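To see what the 't' and 'b' suffixes change in practice, here is a minimal sketch that writes one line of text and then reads it back in both modes. The file name teams.txt and the scratch directory are assumptions made only for illustration, and the encoding="utf-8" argument is not covered in this chapter: it is used here because the default text encoding depends on the platform.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()                   # scratch directory, removed at the end
path = os.path.join(tmp, "teams.txt")      # hypothetical file name

f = open(path, "w", encoding="utf-8")
f.write("café\n")
f.close()

f = open(path, "rt", encoding="utf-8")     # text mode: read() returns a str
text = f.read()
f.close()

f = open(path, "rb")                       # binary mode: read() returns raw bytes
raw = f.read()
f.close()

shutil.rmtree(tmp)

print(type(text), repr(text))              # <class 'str'> 'café\n'
print(type(raw), raw)
```

In binary mode you get the encoded bytes exactly as stored: on Linux and macOS raw is b'caf\xc3\xa9\n', while on Windows the newline is stored as b'\r\n' because text-mode writes translate "\n".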

On success, open() returns a file object which is associated with the filename specified while calling the open() function.
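The difference between "w" and "a" in the table above is easy to check for yourself. The following is a minimal sketch; the file name notes.txt and the temporary directory are hypothetical and only keep the example from touching real files.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()                # scratch directory, removed at the end
path = os.path.join(tmp, "notes.txt")   # hypothetical file name

f = open(path, "w")    # file doesn't exist yet, so "w" creates it
f.write("old data\n")
f.close()

f = open(path, "w")    # file exists, so "w" destroys its old contents
f.write("new data\n")
f.close()

f = open(path, "a")    # "a" keeps the contents and writes at the end
f.write("appended data\n")
f.close()

f = open(path, "r")    # "r" only reads
contents = f.read()
f.close()

shutil.rmtree(tmp)     # clean up the scratch directory

print(contents, end="")
# new data
# appended data
```

"old data" is gone because the second open() in "w" mode truncated the file, while the "a" write was added after "new data".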

Here are some examples:

Example 1:

f = open("employees.txt", "rt")

This statement opens the text file `employees.txt` for reading. Since text mode is the default, the above statement can also be written as:

f = open("employees.txt", "r")  # same as f = open("employees.txt", "rt")

Example 2:

f = open("teams.txt", "w")

This statement opens the text file in write mode.

Example 3:

f = open("teams.dat", "wb")

This statement opens the binary file in write mode.

Example 4:

f = open("teams.dat", "ab")

This statement opens the binary file in append mode.

Instead of using relative file paths we can also use absolute file paths. For example:

f = open("/home/tom/documents/README.md", "w")

This statement opens the text file README.md in the /home/tom/documents/ directory in write mode.

On Windows, remember to escape backslashes in absolute path names. In an ordinary string literal, Python treats a backslash as the start of an escape sequence, so "\t" becomes a tab character and "\U" (as in "C:\Users") is a syntax error. For example:

f = open("C:\\Users\\tom\\documents\\README.md", "w")

Alternatively, we can use a raw string by putting the r character in front of the string, as follows:

f = open(r"C:\Users\tom\documents\README.md", "w")

The r prefix causes Python to treat every backslash in the string as a literal character rather than the start of an escape sequence.
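A quick way to convince yourself that the escaped and raw forms are the same string is to compare them. This sketch also shows what goes wrong without escaping, and, as an alternative the chapter doesn't cover, builds the same path with the standard library's pathlib.PureWindowsPath, which works on any operating system.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\tom\\documents\\README.md"
raw = r"C:\Users\tom\documents\README.md"
print(escaped == raw)    # True: both contain single backslashes

# Only "\\U" is escaped here; the "\t" in "\tom" silently becomes a tab.
broken = "C:\\Users\tom"
print("\t" in broken)    # True

# pathlib inserts the backslashes for you
p = PureWindowsPath("C:/", "Users", "tom", "documents", "README.md")
print(p)                 # C:\Users\tom\documents\README.md
print(str(p) == raw)     # True
```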

Closing the File

Once we are done working with the file, or if we want to open it in some other mode, we should close it using the close() method of the file object, as follows:

f.close()

Closing a file releases valuable system resources and flushes any data still buffered in memory to the disk. If you forget to close a file, CPython will usually close it automatically when the file object is no longer referenced or when the program ends, but you shouldn't rely on this. Every open file consumes system resources (such as a file descriptor), so a large program that reads or writes many files and keeps opening new ones carelessly could run out of them. So be a good programmer and close each file as soon as you are done with it.
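If an exception is raised between open() and close(), the close() call is skipped. One way to guarantee the file is closed anyway is a try/finally statement; the sketch below assumes a hypothetical log.txt in a scratch directory.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()              # scratch directory, removed at the end
path = os.path.join(tmp, "log.txt")   # hypothetical file name

f = open(path, "w")
try:
    f.write("some data\n")
    # ... more processing that might raise an exception ...
finally:
    f.close()                         # runs whether or not the try block raised

print(f.closed)                       # True
shutil.rmtree(tmp)
```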

TextIOWrapper class

When a file is opened in text mode, the file object returned by open() is of type _io.TextIOWrapper (binary modes return other types, such as _io.BufferedReader and _io.BufferedWriter). The _io.TextIOWrapper class provides methods and attributes that help us read data from and write data to the file. The following table lists some common methods of the _io.TextIOWrapper class.

| Method | Description |
|--------|-------------|
| read([num]) | Reads up to num characters from the file and returns them as a string. If num is omitted, it reads the rest of the file. |
| readline() | Reads a single line and returns it as a string. |
| readlines() | Reads the remaining lines of the file and returns them as a list of strings. |
| write(str) | Writes the string argument to the file and returns the number of characters written. |
| seek(offset, whence) | Moves the file pointer to offset, relative to whence: 0 (the default) means the beginning of the file, 1 the current position, and 2 the end of the file. In text mode, only offsets of 0 or values returned by tell() (relative to the beginning), plus seek(0, 2) to jump to the end, are supported. |
| tell() | Returns the current position of the file pointer. |
| close() | Closes the file. |
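seek() and tell() are not used elsewhere in this chapter, so here is a minimal sketch of how they fit together: tell() records a position, and seek() jumps back to it. The scratch copy of readme.md is an assumption for illustration.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "readme.md")  # scratch copy, not the chapter's real file

f = open(path, "w")
f.write("First Line\nSecond Line\n")
f.close()

f = open(path, "r")
first = f.readline()          # pointer now at the start of the second line
pos = f.tell()                # save that position for later
second = f.readline()         # pointer now at the end of the file

f.seek(0)                     # whence defaults to 0: back to the beginning
again = f.readline()          # reads "First Line\n" once more

f.seek(pos)                   # jump to the position saved with tell()
second_again = f.readline()   # reads "Second Line\n" once more

f.seek(0, 2)                  # whence=2: jump to the end of the file
at_end = f.read()             # nothing left to read, so this is ""
f.close()

shutil.rmtree(tmp)
print(repr(first), repr(second), repr(again), repr(second_again), repr(at_end))
```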

Writing Data to a Text File

The following program demonstrates how to write data to a file:

python101/Chapter-18/writing_to_file.py

f = open("readme.md", "w")

f.write("First Line\n")
f.write("Second Line\n")
f.write("Third Line\n")

f.close()

In line 1, we use the open() function to open the text file in write mode. If readme.md doesn't exist, open() creates it. If the file already exists, its data is overwritten. Run the program and then open readme.md. It should look like this:

python101/Chapter-18/readme.md

First Line
Second Line
Third Line

Let's take a closer look at how the write() method writes data to the file.

All read and write operations on a file begin at the file position pointer. What is the file position pointer? It is simply a marker that keeps track of the current position in the file, i.e., how far into the file we have read or written. The pointer moves automatically after every read or write operation.

When a file is opened in "r" or "w" mode, the file position pointer points at the beginning of the file. The write() method begins writing at the current position and then advances the file position pointer past the data it wrote. For example, the following figure shows the position of the file position pointer after each write operation.

[Figure: movement of the file position pointer after each write operation]
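You can watch the pointer advance by calling tell() after each write(). This sketch writes the same three lines as the program above into a scratch file; the newline="" and encoding="utf-8" arguments are assumptions added so that one ASCII character is exactly one byte on every platform (otherwise Windows would store each "\n" as two bytes).

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "readme.md")   # scratch copy of the chapter's file

# newline="" disables the "\n" -> "\r\n" translation done on Windows,
# so the position reported by tell() matches the character count.
f = open(path, "w", encoding="utf-8", newline="")

positions = []
for text in ["First Line\n", "Second Line\n", "Third Line\n"]:
    written = f.write(text)             # number of characters written
    positions.append(f.tell())          # where the next write will start
    print(repr(text), "->", written, "characters written, pointer now at", positions[-1])

f.close()
shutil.rmtree(tmp)
# positions is [11, 23, 34]
```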

Note that, unlike the print() function, the write() method does not add a newline character (\n) at the end of the string automatically. We can also use the print() function to write data to a file. Let's take a closer look at the signature of print() using the help() function.

>>>
>>> help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

>>>

Notice the file keyword argument in the function signature. By default, file is sys.stdout, the standard output, which means print() writes data to the screen. To output data to a file, just pass the file object as the file argument. The following program uses the print() function instead of write() to write data to the file.

python101/Chapter-18/writing_data_using_print_function.py

f = open("readme.md", "w")

print("First Line", file=f)
print("Second Line", file=f)
print("Third Line", file=f)

f.close()

This program produces the same output as before. The only difference is that, in this case, the newline character is added automatically by the print() function.
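The other print() options listed by help() also work when writing to a file. In this sketch, sep joins several values with commas and end="" suppresses the newline; the teams.csv file and its contents are hypothetical. For real CSV data, the standard csv module is the safer choice.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "teams.csv")             # hypothetical file name

f = open(path, "w")
print("name", "wins", "losses", sep=",", file=f)  # writes name,wins,losses\n
print("Tigers", 12, 4, sep=",", file=f)           # non-strings are converted with str()
print("no newline here", end="", file=f)          # end="" suppresses the newline
f.close()

f = open(path, "r")
contents = f.read()
f.close()
shutil.rmtree(tmp)

print(contents)
```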

Reading Data from a Text File

To read a file, you must open it in "r" mode. In addition, you should ensure that the file you want to read already exists, because in "r" mode the open() function raises a FileNotFoundError if it is unable to find the file.

To test whether a file exists, we can use the isfile() function from the os.path module. For example:

>>>
>>> import os
>>>
>>> os.path.isfile(r"D:\python101\readme.md")  # file exists
True
>>>
>>> os.path.isfile(r"D:\python101\index.html")  # file doesn't exist
False
>>>
>>>
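Another common approach is to simply try to open the file and handle the FileNotFoundError if it occurs, which also avoids the file disappearing between the check and the open() call. A minimal sketch, using a hypothetical index.html that is never created:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
missing = os.path.join(tmp, "index.html")  # hypothetical file, never created

print(os.path.isfile(missing))             # False

try:
    f = open(missing, "r")
except FileNotFoundError as err:
    print("Could not open file:", err.filename)
    contents = None
else:                                      # runs only if open() succeeded
    contents = f.read()
    f.close()

shutil.rmtree(tmp)
```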

The following programs demonstrate how to read a file using the read(), readline() and readlines() methods.

Example 1: Reading all data at once using the read() method

python101/Chapter-18/read_method_demo.py

f = open("readme.md", "r")

print(f.read())  # read all content at once

f.close()

Output:

First Line
Second Line
Third Line

Example 2: Reading data in chunks using read() method.

python101/Chapter-18/reading_in_chunks.py

f = open("readme.md", "r")

print("First chunk:", f.read(4), end="\n\n")  # read the first 4 characters
print("Second chunk:", f.read(10), end="\n\n")  # read the next 10 characters
print("Third chunk:", f.read(), end="\n\n")  # read the remaining characters in the file

f.close()

Output:

First chunk: Firs

Second chunk: t Line
Sec

Third chunk: ond Line
Third Line

When the file is opened in read mode, the file pointer points at the beginning of the file.

[Figure: file pointer at the beginning of the file]

After reading the first 4 characters, the file pointer is at t.

[Figure: file pointer at t after reading 4 characters]

After reading the next 10 characters, the file pointer is at the character o.

[Figure: file pointer at o after reading 10 more characters]

The third call to read() reads the remaining characters in the file and returns them as a string. At this point, the file position pointer points at the end of the file. Consequently, any subsequent calls to the read() method return an empty string.

[Figure: file pointer at the end of the file]

Example 3: Using readline() to read data from a file.

f = open("readme.md", "r")

# read first line
print("1st line:", f.readline())

# read the first two characters in the second line
print("The first two characters in the 2nd line:", f.read(2), end="\n\n")

# read the remaining characters in the second line
print("Remaining characters in the 2nd line:", f.readline())

# read the next line
print("3rd line:", f.readline())

# end of the file reached, so readline returns an empty string ""
print("After end of file :", f.readline())

f.close()

Output:

1st line: First Line

The first two characters in the 2nd line: Se

Remaining characters in the 2nd line: cond Line

3rd line: Third Line

After end of file :

As usual, when the file is opened, the file position pointer points at the beginning of the file.

[Figure: file pointer at the beginning of the file]

The first call to the readline() method moves the position pointer to the start of the next line.

[Figure: file pointer after the first call to readline()]

The read() method then reads two characters from the file, which moves the position pointer to the character c.

[Figure: file pointer after reading 2 characters using read()]

readline() is called again, but this time it reads from the character c to the end of the line (including the newline character).

[Figure: file pointer after the second call to readline()]

In line 13, readline() is called again to read the last line. At this point, the file position pointer is at the end of the file. That's why the call to readline() in line 16 returns an empty string.

[Figure: file pointer position after the third call to readline()]
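Because readline() returns "" only at the end of the file, while an empty line in the middle of a file comes back as "\n", you can use it to drive a loop. A minimal sketch, assuming a scratch file whose second line is deliberately empty:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "readme.md")   # scratch file for the example

f = open(path, "w")
f.write("First Line\n\nThird Line\n")   # note the empty second line
f.close()

f = open(path, "r")
lines = []
while True:
    line = f.readline()
    if line == "":          # "" means the end of the file was reached...
        break
    lines.append(line)      # ...while an empty line comes back as "\n"
f.close()
shutil.rmtree(tmp)

print(lines)  # ['First Line\n', '\n', 'Third Line\n']
```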

Example 4: Using readlines() to read data from a file

readlines_method_demo.py

f = open("readme.md", "r")

# read all the lines and return them as a list of strings
print(f.readlines())

f.close()

Output:

['First Line\n', 'Second Line\n', 'Third Line\n']
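Notice that every string returned by readlines() keeps its trailing "\n". If you want the lines without it, two common options are sketched below, using a scratch copy of the same three lines.

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "readme.md")   # scratch copy of the chapter's file

f = open(path, "w")
f.write("First Line\nSecond Line\nThird Line\n")
f.close()

# Option 1: strip the trailing "\n" from each string returned by readlines()
f = open(path, "r")
stripped = [line.rstrip("\n") for line in f.readlines()]
f.close()

# Option 2: read everything and let str.splitlines() split on line endings
f = open(path, "r")
lines = f.read().splitlines()
f.close()

shutil.rmtree(tmp)
print(stripped)  # ['First Line', 'Second Line', 'Third Line']
print(lines)     # ['First Line', 'Second Line', 'Third Line']
```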

Reading Large Files

The read() and readlines() methods work great with small files. But what if your file has thousands or millions of lines in it? In such cases, read() and readlines() load the whole file into memory at once, which can consume a lot of memory. A better approach is to use a loop and read the file data in small chunks. For example:

python101/Chapter-18/reading_large_file_demo1.py

f = open("readme.md", "r")

chunk = 10  # specify chunk size
data = ""

# keep looping until the end of the file is reached
while True:
    data = f.read(chunk)
    print(data, end="")

    # if end of file is reached, break out of the while loop
    if data == "":
        break


f.close()

Output:

First Line
Second Line
Third Line

Here we are using an infinite loop to iterate over the contents of the file. As soon as the end of the file is reached, the read() method returns an empty string (""), so the if condition in line 12 evaluates to true and the break statement terminates the loop.
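The same chunked loop can be written more compactly with the built-in iter(callable, sentinel) form, which keeps calling a function until it returns the sentinel value. This is an alternative idiom, not the one used in the program above; it relies on functools.partial to turn f.read(chunk_size) into a zero-argument callable, and it uses a scratch copy of readme.md.

```python
import os
import shutil
import tempfile
from functools import partial

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "readme.md")   # scratch copy of the chapter's file

f = open(path, "w")
f.write("First Line\nSecond Line\nThird Line\n")
f.close()

chunk_size = 10
chunks = []

f = open(path, "r")
# iter() calls f.read(chunk_size) repeatedly and stops as soon as
# it returns the sentinel "" (end of file).
for chunk in iter(partial(f.read, chunk_size), ""):
    chunks.append(chunk)
f.close()
shutil.rmtree(tmp)

print("".join(chunks), end="")
```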

Python also allows us to loop through the file data with a for loop by using the file object directly. Each iteration reads one line, so the whole file is never loaded into memory at once:

python101/Chapter-18/reading_large_files_using_for_loop.py

f = open("readme.md", "r")

for line in f:
    print(line, end="")

f.close()

Output:

First Line
Second Line
Third Line

Appending Data to a Text File

We can use "a" mode to append data to the end of a file. The following program demonstrates how to append data to the end of the file.

python101/Chapter-18/append_data.py

f = open("readme.md", "a")

print("Appending data to the end of the file ...")
f.write("Fourth Line\n")
f.write("Fifth Line\n")

f.close()

## open the file again

print("\nOpening the file again to read the data ...\n")

f = open("readme.md", "r")

for line in f:
    print(line, end="")

f.close()

Output:

Appending data to the end of the file ...

Opening the file again to read the data ...

First Line
Second Line
Third Line
Fourth Line
Fifth Line

Working with Files Using the with Statement

Python also provides a convenient shortcut for file handling: the with statement. The following is the general form of the with statement when used with files.

with open(filename, mode) as file_object:
    # body of with statement
    # perform the file operations here

The best thing about this shortcut is that it automatically closes the file when the body of the with statement finishes, even if an exception is raised inside it, without requiring any work on your part. The statements inside the body of the with statement must be equally indented; otherwise, you will get an error. Note that the variable file_object still exists after the with block ends, but the file it refers to has been closed, so calling read() or write() on it outside the block raises a ValueError.
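Here is a minimal sketch that checks this behavior using the file object's closed attribute. The scratch directory comes from tempfile.TemporaryDirectory, which is itself used with a with statement and is deleted when its block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readme.md")   # scratch file for the example

    with open(path, "w") as f:
        f.write("First Line\n")
        inside = f.closed          # False: still inside the block

    after = f.closed               # True: the block ended, so the file was closed

    try:
        f.write("Second Line\n")   # f still exists, but its file is closed
        raised = False
    except ValueError:
        raised = True

print(inside, after, raised)  # False True True
```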

The following examples show how we can use the with statement to read and write data to and from a file.

Example 1: Reading data line by line using for loop

python101/Chapter-18/with_statement.py

with open("readme.md", "r") as f:
    for line in f:
        print(line, end="")

Output:

First Line
Second Line
Third Line
Fourth Line
Fifth Line

Example 2: Reading all data at once using read() method.

python101/Chapter-18/with_statement2.py

with open("readme.md", "r") as f:
    print(f.read())

Output:

First Line
Second Line
Third Line
Fourth Line
Fifth Line

Example 3: Reading a large file in small chunks

with open("readme.md", "r") as f:
    chunk = 10  # specify chunk size
    data = ""

    # keep looping until the end of the file is reached
    while True:
        data = f.read(chunk)
        print(data, end="")

        # if end of file is reached, break out of the while loop
        if data == "":
            break

Output:

First Line
Second Line
Third Line
Fourth Line
Fifth Line

Example 4: Writing data to a file using the write() method

with open("random.txt", "w") as f:
    f.write("ONE D\n")
    f.write("TWO D\n")
    f.write("THREE D\n")
    f.write("FOUR D\n")

Reading and Writing Binary Data

The following program copies binary data from a source file (source.jpg) to a target file (dest.jpg).

f_source = open("source.jpg", "rb")
f_dest = open("dest.jpg", "wb")

byte_count = 0

# in binary mode, each "line" is a bytes object ending at a b"\n" byte
for line in f_source:
    byte_count += len(line)
    f_dest.write(line)

print(byte_count, "bytes copied successfully")

f_source.close()
f_dest.close()

Output:

561276 bytes copied successfully

Run the program, and it should create dest.jpg in the current working directory, which is also where the program expects to find source.jpg.

Note: The number of bytes copied depends on the size of source.jpg.
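Iterating over a binary file "line by line" works, but the size of each piece depends on where b"\n" bytes happen to occur in the data. A sketch of a more predictable approach reads fixed-size chunks and opens both files in a single with statement. The copy_binary() function, the 64 KiB chunk size, and the random stand-in "image" are all assumptions for illustration; in real code, the standard library's shutil.copyfile() or shutil.copyfileobj() already does this job.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice

def copy_binary(src_path, dest_path):
    """Copy src_path to dest_path in fixed-size chunks; return bytes copied."""
    byte_count = 0
    with open(src_path, "rb") as f_source, open(dest_path, "wb") as f_dest:
        while True:
            chunk = f_source.read(CHUNK_SIZE)   # returns b"" at end of file
            if not chunk:
                break
            f_dest.write(chunk)
            byte_count += len(chunk)
    return byte_count

# Usage with a stand-in "image" of random bytes; a real JPEG works the same way.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "source.jpg")
dst = os.path.join(tmp, "dest.jpg")
with open(src, "wb") as f:
    f.write(os.urandom(200_000))

copied = copy_binary(src, dst)
print(copied, "bytes copied successfully")

with open(src, "rb") as a, open(dst, "rb") as b:
    same = a.read() == b.read()   # verify the copy is byte-for-byte identical
shutil.rmtree(tmp)
```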