Accessing Files with Python

See also

Reading assignment

Text: Chapter 7

What is a file, anyway?

You have been using files for as long as you have used computers. Files are those things you pass around when you share photos, or (gasp!) music. Files hold email messages, and even programs you ask your machine to run. Files contain simple text, or binary data only a program could love. Our goal in this lecture is to figure out how to get your program to work with files.

This is an important step in your programming experience. Without files you would need to enter all the data from the keyboard. Many (most) programs want to consume far more data than a human can be expected to type in. And, we often want to capture the output of our programs so we can return to the computer at a later time and work on those data again. Yet again, you do this all the time with tools like Microsoft Word, or even gVim!

Kinds of files

At the lowest level a file is just a bunch of bytes written on some form of storage media. That media is usually our hard disk, in which case, the individual bits are magnetized particles of an oxide coating on the disk surface. We do not need to know all about this, since it is the operating system’s job to keep track of all that hardware stuff.

As mentioned above, there are two basic kinds of files.

Text files

These are files you can look at with any editor, like gVim. Those files contain simple text. Actually, they contain a sequence of bytes with codes representing characters, traditionally from the ASCII character set. ASCII is an American standard code that assigns a number to each letter, digit, and punctuation mark. When you type, the system turns your key presses into these character codes, and the same codes tell the system what character to display if we send the code to the console screen. Text files are handy, but they are not the only kind of file around.

Binary files

A binary file is just a series of bits written out by some program. Any program that wants to use a binary file must know exactly what the bits mean. Since a single piece of data in our computer can take anywhere from one byte (8 bits) to eight bytes (64 bits) or more, the program needs to know exactly how many bits to pull from the file for each value. Binary files are created by programs and read by other programs. Humans seldom deal with such files directly.
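As a rough illustration of why the reader must know the layout, the sketch below uses Python's standard struct module to turn two numbers into raw bytes and back. The layout string "<id" (a 4-byte integer followed by an 8-byte floating point number) and the variable names are assumptions chosen for this example only.

```python
import struct

# Hypothetical record layout: a 4-byte integer, then an 8-byte float
layout = "<id"

# Turn two values into the raw bytes a binary file would hold
data = struct.pack(layout, 42, 3.5)
print(len(data))            # how many bytes the record takes

# A reader must use the very same layout to get the values back
count, price = struct.unpack(layout, data)
print(count, price)
```

If the reader guessed a different layout, it would pull the wrong number of bytes and get nonsense values.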

Ways to access files

There is one more interesting feature of files: they can be read either sequentially, as though they were written on a tape (ever hear of those things?), or they can be read using something called random access. If a file is designed to be read randomly, we can jump around and pull contents from wherever we like. This kind of access is hard to manage, so we will not do any of that in this course!

File names and locations

All modern operating systems try to make you think about where files are stored on your system using a filing analogy. We think of folders, sub-folders, and files as though they were physical things stored in filing cabinets. We have even come up with a naming convention to describe all this.

On Windows, the name on the outside of the filing cabinet is a letter, like “C”. We indicate the top level of folders using something like this notation: C:\foldername. If that folder contains other folders, we indicate the names of those additional folders like so: c:\foldername\subfoldername. When we finally get to the point where we are actually talking about a file, we add on that last part: c:\foldername\subfoldername\filename.ext. Everything up to the final file name is called the path to the file. That will be important later.
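If you want to see these pieces from inside Python, the standard pathlib module can split a name apart. This is only a sketch: the path is the made-up one from the paragraph above, and PureWindowsPath is used so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# The example name from the text (no such file needs to exist)
p = PureWindowsPath(r"c:\foldername\subfoldername\filename.ext")

folder = str(p.parent)   # everything up to the final file name: the path
name = p.name            # the file name itself
ext = p.suffix           # the extension, including the dot

print(folder)
print(name)
print(ext)
```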

That last part, the .ext (called the extension), is what Windows uses to decide which program to use when trying to process this file. We can use other programs to process the file if we like, as long as they understand how the file is structured.

If we leave off the path part and just refer to the file name, we must be working in some folder. The path may not be specified, but it can be determined by the operating system. More on that later.
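You can ask Python which folder it is working in. The short sketch below uses the standard os module; the file name mydata.txt is just a sample name and the file does not need to exist.

```python
import os

# The folder the program is currently working in
here = os.getcwd()

# The full name the operating system would use for a bare file name
full = os.path.abspath("mydata.txt")

print(here)
print(full)
```

The second line printed should be the first one with the file name added on the end.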

Text files have a special organization. They are made up of lines of text. A line of text is just a series of characters with a special end-of-line marker at the end. Lines can be very long, or empty, in which case all they hold is that end-of-line marker. Since we will be working mostly with text files, let’s see how we can work with them.
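To make the idea of lines and end-of-line markers concrete, here is a small sketch that needs no file at all. The string stands in for the contents of a three-line text file whose middle line is empty; the names are only for illustration.

```python
# What a three-line text file looks like as one long string
text = "first line\n\nthird line\n"

# Split into lines, keeping each end-of-line marker attached
lines = text.splitlines(keepends=True)
print(lines)
```

Notice that the empty line is not really empty: it still holds its end-of-line marker.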

Managing files

We can read files, write to them, create them, and even destroy them. Our Python code can do all of this. To get started, let’s build a new file from scratch. In this example, we assume we are working in some folder (where you create your code).

Opening a file

We can access a file by asking the operating system to connect our program to one. Here is an example of a chunk of code that does this:

fout = open("mydata.txt","w")

The two parameters specify the name of the file, and the mode we will place the file into when we open it. In the example, the “w” says open for writing. Other possibilities are “r” which means open for reading, and “a” which means open for append. Appending adds text to the end of anything currently in the file.

Warning

Opening a file for writing will delete any existing file with the specified name and create a new, empty one in its place. Also, opening a file for reading implies that the file is already on your system. If not, Python will complain!

In this example, we did not specify the full path to the file, so Python assumes we want the file to live in the folder we are working in (usually the folder where the program code lives, if that is where you run it from).

Notice, also, that we stored the result of that open function in a variable. As you might suspect by now, this is another of those magical object things, one called a file object.

Writing into a file

Once the file has been created and is ready for writing, we use the write method to add text to the file:

fout.write("This is a text line")
fout.write("This is more text")

There is a problem with this example. The two lines do not work like we might expect. Unlike writing to the standard output device using the print function, the write method adds exactly the text you indicate to the file. The two statements above would be run together on one line, probably not what you wanted. The solution is to place end-of-line markers (\n) in your strings where you want lines to end.
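Here is one way the corrected version might look. To keep the sketch from touching your own files, it builds the file in a scratch folder made with the standard tempfile module; that folder and the variable names are assumptions of this example. It also closes the file and reads it back, steps that are explained in the sections that follow.

```python
import os
import tempfile

# Work in a scratch folder so nothing of yours gets overwritten
folder = tempfile.mkdtemp()
name = os.path.join(folder, "mydata.txt")

fout = open(name, "w")
fout.write("This is a text line\n")   # \n ends the first line
fout.write("This is more text\n")     # \n ends the second line
fout.close()

# Read the whole file back to check what was stored
fin = open(name, "r")
contents = fin.read()
fin.close()
print(contents)
```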

If you want to put a bunch of text into the file, you can use a triple-quoted string as follows:

fout.write("""
This is a bunch of text
written on several lines
and no special line-breaks were needed!
""")

Closing the file

When you are done working with the file, you close it and let the operating system break the connection with the file. In this example, we were writing to the file, so the close method completes that work and makes sure all the output is stored in the file. We can then open up that file in another program, and see what is there!

fout.close()

Appending to a file

Appending works just like writing, except that if the file exists, it will be opened and positioned so the first thing you write gets added after the last character in the current file. (If the file does not exist, a new one gets created.) You use the “a” mode when you open for appending.
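A small sketch of both cases follows. The file name log.txt and the scratch folder are made up for the example; the point is only to show that the first open creates the file and the second adds to it.

```python
import os
import tempfile

# Scratch folder and sample file name (assumptions for this example)
folder = tempfile.mkdtemp()
name = os.path.join(folder, "log.txt")

fout = open(name, "a")      # file does not exist yet, so it is created
fout.write("first run\n")
fout.close()

fout = open(name, "a")      # file exists now, so new text goes at the end
fout.write("second run\n")
fout.close()

fin = open(name, "r")
contents = fin.read()
fin.close()
print(contents)
```

Had the second open used the "w" mode instead, the first line would have been wiped out.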

Opening a file for reading

To open a file for reading, we again use the open function, identifying the file by name, but this time we use the “r” mode:

fin = open("mydata2.txt","r")

Reading the lines of text from the file

If we know exactly how many lines of text we want to read, we can do this:

line1 = fin.readline()
line2 = fin.readline()
print(line1)
print(line2)

Both of these new variables will hold a string with the text we processed.

Notice that when we print the lines using the print function, the lines are double spaced. That is happening because the string actually has the end-of-line marker, and the print function adds a second end-of-line marker.

We can get rid of this extra marker by using the rstrip method available for string variables (OK, objects). The method does not change the original string; it returns a new one, so we store the result back into the variable:

line1 = line1.rstrip('\n')

The rstrip method strips from the right end of the string. By default, it removes all whitespace from the right side. This includes any spaces, tabs, or newline markers. If we only want to remove the end-of-line marker, we pass the \n character as a parameter. There is an lstrip method as well, which by default strips off any leading whitespace. Finally, there is a strip method that removes all whitespace around the string (leading and trailing spaces, tabs, and newlines).
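The differences are easiest to see side by side. In this sketch the sample string and variable names are invented for illustration, and repr is used when printing so the invisible characters show up.

```python
# Two leading spaces, then text, then a space, a tab, and a newline
line = "  some text \t\n"

only_newline = line.rstrip('\n')   # removes just the end-of-line marker
right = line.rstrip()              # removes all whitespace on the right
left = line.lstrip()               # removes all whitespace on the left
both = line.strip()                # removes whitespace on both ends

print(repr(only_newline))
print(repr(right))
print(repr(left))
print(repr(both))
print(repr(line))                  # the original string is unchanged
```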

Most of the time, we do not really know how many lines of text we might find in a file, so we need a way to detect the end of the file, the point where the operating system reports that no data is left. The readline method makes this easy: it just returns an empty string if it tries to read past the end. This empty string is written as two single quotes back to back (with no space in between!). A blank line in the file is not empty in this sense, since it still holds its end-of-line marker.

We can read the file using the logic we looked at last time. We start off reading a single line, then enter the loop where we check for the empty string, process the line, then read another line:

line = fin.readline()
while line != '':
    line = line.rstrip('\n')
    print(line)
    line = fin.readline()

Converting text to other data types

As we saw earlier, if what we are reading is actually integers or floating point numbers, we usually write them out one per line and read them in the same way. In this case, we can convert the string into the desired kind of data:

line = fin.readline()
while line != '':
    val = float(line)
    print(format(val, '.2f'))
    line = fin.readline()

The print line uses the format function to show the floating point number with exactly two places after the decimal point (like for money!).
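You can try the conversion and the formatting without a file. In this sketch the strings stand in for lines read from a file; the values are made up.

```python
# A line as readline would return it, end-of-line marker included
line = "12.5\n"

val = float(line)             # float() ignores the trailing newline
money = format(val, '.2f')    # always two places after the decimal point
print(money)

# Extra digits are rounded away
rounded = format(3.14159, '.2f')
print(rounded)
```

Note that float can handle the end-of-line marker on its own, so no rstrip call is needed before converting.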

Using a for loop to read lines

We can read all the lines in a file using a for loop as well. This one is interesting:

for line in fin:
    val = float(line)
    print(format(val, '.2f'))

Wait a minute! How does Python know when to stop? Magic!

Actually, this special form of loop hands us one line each time around and halts on its own when the end of the file is reached. Notice that we did not need the initial readline call to make this work. This is very handy!
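Putting the pieces together, a complete program might look something like the sketch below. It first writes a few numbers, one per line, into a scratch file (the file name prices.txt and the values are invented for the example), then uses the for loop to add them up.

```python
import os
import tempfile

# Build a small data file in a scratch folder
folder = tempfile.mkdtemp()
name = os.path.join(folder, "prices.txt")

fout = open(name, "w")
fout.write("1.5\n2.25\n10\n")    # three numbers, one per line
fout.close()

# Read the numbers back and total them
total = 0.0
fin = open(name, "r")
for line in fin:                 # the loop stops at the end of the file
    total = total + float(line)
fin.close()

print(format(total, '.2f'))
```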

File access exceptions

If you try to process files and some kind of error occurs, an exception will be raised. Exceptions are signals that Python uses to indicate a problem of some sort. If we do nothing about them, the program will halt. However, it is possible to catch the exception signal and continue processing. The exceptions raised when processing files should be obvious:

  • Attempting to open a file for reading that does not exist

  • Attempting to open a file where you have no permissions to do so

  • Attempting to create a file in a directory that does not exist

  • Other errors such as network problems

If you suspect that you may run into any of these, you can wrap up your code in a try-except statement and handle the problem as you see fit.

try:
    fin = open("mydata2.txt","r")
    for line in fin:
        print(line.rstrip())
    fin.close()
except IOError:
    print("File not found - aborting!")

print("Done!")

Change the name of the file and run this to see the result:

C:\_data\ACCdata\ACC.ITSE1359\code\Files>test4
File not found - aborting!
Done!
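The example above reports every problem as "File not found". If you want different messages for different problems, one possible approach is sketched below. It assumes Python 3, where FileNotFoundError and PermissionError are more specific kinds of the general OSError (IOError is another name for OSError there). The helper read_lines and the file name are hypothetical, made up for this sketch.

```python
import os
import tempfile

def read_lines(name):
    """Return the lines of a text file, or None if it cannot be opened."""
    try:
        fin = open(name, "r")
    except FileNotFoundError:
        print("File not found:", name)
        return None
    except PermissionError:
        print("No permission to open:", name)
        return None
    except OSError as err:
        # Anything else the operating system reports
        print("Could not open file:", err)
        return None

    lines = []
    for line in fin:
        lines.append(line.rstrip('\n'))
    fin.close()
    return lines

# Ask for a file that cannot exist in a brand new scratch folder
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
result = read_lines(missing)
print(result)
```

The more specific exceptions are listed first, since Python uses the first except line that matches.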