Accessing Files with Python¶
See also
Reading assignment
Text: Chapter 7
What is a file, anyway?¶
You have been using files for as long as you have used computers. Files are those things you pass around when you share photos, or (gasp!) music. Files hold email messages, and even programs you ask your machine to run. Files contain simple text, or binary data only a program could love. Our goal in this lecture is to figure out how to get your program to work with files.
This is an important step in your programming experience. Without files you would need to enter all the data from the keyboard. Many (most) programs want to consume far more data than a human can be expected to type in. And, we often want to capture the output of our programs so we can return to the computer at a later time and work on those data again. Yet again, you do this all the time with tools like Microsoft Word, or even gVim!
Kinds of files¶
At the lowest level a file is just a bunch of bytes written on some form of storage media. That media is usually our hard disk, in which case, the individual bits are magnetized particles of an oxide coating on the disk surface. We do not need to know all about this, since it is the operating system’s job to keep track of all that hardware stuff.
As mentioned above, there are two basic kinds of files.
Text files¶
These are files you can look at with any editor, like gVim. Those files contain simple text. Actually, they contain a sequence of bytes with codes representing characters, traditionally in the ASCII character set. ASCII (the American Standard Code for Information Interchange) assigns a number to each letter, digit, and punctuation mark. When you type, the system turns your key presses into these codes, and the same codes tell the system what character to display if we send the code to the console screen. Text files are handy, but they are not the only kind of file around.
Binary files¶
A binary file is just a series of bits written out by some program. Any program that wants to use a binary file must know exactly what the bits mean. Since data in our computer can take anywhere from one byte (8 bits) to around 64 bytes, the program needs to know exactly how many bits to pull from the file. Binary files are created by programs and read by other programs. Humans seldom deal with such files directly.
Ways to access files¶
There is one more interesting feature of files: they can be read either sequentially, as though they were written on a tape (ever hear of those things?), or they can be read using something called random access. If a file is designed to be read randomly, we can jump around and pull contents from wherever we like. This kind of access is harder to manage, so we will not do any of that in this course!
File names and locations¶
All modern operating systems try to make you think about where files are stored on your system using a filing analogy. We think of folders, sub-folders, and files as though they were physical things stored in filing cabinets. We have even come up with a naming convention to describe all this.
On Windows, the name on the outside of the filing cabinet is a letter, like “C”. We indicate the top level of folders using something like this notation: C:\foldername. If that folder contains other folders, we indicate the names of those additional folders like so: c:\foldername\subfoldername. When we finally get to the point where we are actually talking about a file, we add on that last part: c:\foldername\subfoldername\filename.ext. Everything up to the final file name is called the path to the file. That will be important later.
That last part, the .ext, is what Windows uses to indicate the program to use when trying to process this file. We can use other programs to process the file if we like, as long as they understand how the file is structured.
If we leave off the path part and just refer to the file name, the operating system assumes the file is in the folder we are currently working in. The path may not be specified, but it can be determined by the operating system. More on that later.
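As a small sketch of that idea, the standard os module can show which folder the program is working in and what full path a bare file name turns into. The file name mydata.txt is just an illustration; the file does not need to exist for this to run.

```python
import os

# The folder the program is currently working in.
current_folder = os.getcwd()

# A bare file name is resolved relative to that folder.
full_path = os.path.abspath("mydata.txt")

print(current_folder)
print(full_path)
```

The second line printed should be the first one with the file name tacked on the end.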
Text files have a special organization. They are made up of lines of text. A line of text is just a series of characters with a special end-of-line marker at the end. Lines can be very long, or empty, in which case all they hold is that end-of-line marker. Since we will be working mostly with text files, let’s see how we can work with them.
Managing files¶
We can read files, write to them, create them, and even destroy them. Our Python code can do all of this. To get started, let’s build a new file from scratch. In this example, we assume we are working in some folder (where you create your code).
Opening a file¶
We can access a file by asking the operating system to connect our program to one. Here is an example of a chunk of code that does this:
fout = open("mydata.txt","w")
The two parameters specify the name of the file, and the mode we will place the file into when we open it. In the example, the “w” says open for writing. Other possibilities are “r” which means open for reading, and “a” which means open for append. Appending adds text to the end of anything currently in the file.
Warning
Opening a file for writing will delete any existing file with the specified name and create a new, empty one in its place. Also, opening a file for reading implies that the file is already on your system. If not, Python will complain!
In this example, we did not specify the full path to the file, so Python assumes we want the file to live in the folder we are working in (usually the folder where the program code lives, if that is where you run it from).
Notice, also, that we stored the result of that open function in a variable. As you might suspect by now, this is another of those magical object things, one called a file object.
Writing into a file¶
Once the file has been created and is ready for writing, we use the write method to add text to the file:
fout.write("This is a text line")
fout.write("This is more text")
There is a problem with this example. The two lines do not work like we might expect. Unlike writing to the standard output device using the print function, the write method adds exactly the text you indicate to the file. The two statements above would be run together on one line, probably not what you wanted. The solution is to place end-of-line markers (\n) in your strings where you want lines to end.
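Here is a minimal sketch of the fix, with the end-of-line markers added. To keep the demo from touching your own files, it writes into a scratch folder made with the standard tempfile module; the name demo_path is only for illustration. It also closes the file and reads it back with the read method so we can check the result.

```python
import os
import tempfile

# Hypothetical scratch location so the demo does not overwrite anything of yours.
demo_path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

fout = open(demo_path, "w")
fout.write("This is a text line\n")   # \n ends the first line
fout.write("This is more text\n")     # \n ends the second line
fout.close()

# Read the whole file back to confirm it holds two separate lines.
fin = open(demo_path, "r")
contents = fin.read()
fin.close()
print(contents)
```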
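If you want to put a bunch of text into the file, you can use a triple-quoted string as follows (note that the line break right after the opening quotes is written to the file too, so the file starts with an empty line):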
fout.write("""
This is a bunch of text
written on several lines
and no special line-breaks were needed!
""")
Closing the file¶
When you are done working with the file, you close it and let the operating system break the connection with the file. In this example, we were writing to the file, so the close method completes that work and makes sure all the output is stored in the file. We can then open up that file in another program, and see what is there!
fout.close()
Appending to a file¶
Appending works just like writing, except that if the file exists, it will be opened and positioned so the first thing you write gets added after the last character in the current file. (If the file does not exist, a new one gets created.) You use the “a” mode when you open for appending.
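The difference between the “w” and “a” modes can be sketched like this. The file name log.txt and the scratch folder are assumptions made for the demo, and readlines is used only to check what ended up in the file.

```python
import os
import tempfile

# Hypothetical file in a scratch folder.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

# "w" creates the file (or empties an existing one).
fout = open(log_path, "w")
fout.write("first line\n")
fout.close()

# "a" keeps what is already there and adds to the end.
fout = open(log_path, "a")
fout.write("second line\n")
fout.close()

# Read everything back as a list of lines.
fin = open(log_path, "r")
lines = fin.readlines()
fin.close()
print(lines)
```

If the second open used “w” instead of “a”, only the second line would survive.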
Opening a file for reading¶
To open a file for reading, you again use the open function, identifying the file by name, but this time we use the “r” mode:
fin = open("mydata2.txt","r")
Reading the lines of text from the file¶
If we know exactly how many lines of text we want to read, we can do this:
line1 = fin.readline()
line2 = fin.readline()
print(line1)
print(line2)
Both of these new variables will hold a string with the text we processed. Notice that when we print the lines using the print function, the lines are double spaced. That is happening because the string actually has the end-of-line marker, and the print function adds a second end-of-line marker.
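We can get rid of this extra marker by using the special rstrip method available for string variables (OK, objects). The method returns a new string rather than changing the original, so we store the result:
line1 = line1.rstrip('\n')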
The rstrip method strips from the right end of the string. By default, it removes all whitespace from the string on the right side. This includes any spaces, tabs, or newline markers. If we only want to remove the end-of-line marker, we add the \n character as a parameter. There is an lstrip method as well; this one defaults to stripping off any leading whitespace. Finally, there is a strip method that removes all whitespace around the string (leading and trailing spaces, tabs, and newlines).
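A quick way to see the difference between these methods is to try them on one sample string. The string below is made up for the demo, and repr is used so the whitespace shows up when printed.

```python
text = "  hello world \t\n"

only_newline = text.rstrip('\n')   # removes just the trailing newline
right = text.rstrip()              # removes all trailing whitespace
left = text.lstrip()               # removes all leading whitespace
both = text.strip()                # removes whitespace from both ends

print(repr(only_newline))  # '  hello world \t'
print(repr(right))         # '  hello world'
print(repr(left))          # 'hello world \t\n'
print(repr(both))          # 'hello world'
print(repr(text))          # the original string is unchanged
```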
Most of the time, we do not really know how many lines of text we might find in a file. In some cases, Python will complain if we try to read past the end of file, which is a special marker used by the operating system to indicate the real end of the file. However, the readline method will just return an empty string if it tries to read past the end marker. This empty string is indicated by two single quotes back to back (with no space in between!).
We can read the file using the logic we looked at last time. We start off reading a single line, then enter the loop where we check for the empty string, process the line, then read another line:
line = fin.readline()
while line != '':
    line = line.rstrip('\n')   # keep the stripped copy
    print(line)
    line = fin.readline()      # read the next line before testing again
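To try this loop without hunting for a data file, here is a self-contained sketch that first writes a small three-line file into a scratch folder and then reads it back. The file contents and the count variable are assumptions added for the demo.

```python
import os
import tempfile

# Build a small sample file to read (hypothetical contents).
demo_path = os.path.join(tempfile.mkdtemp(), "mydata2.txt")
fout = open(demo_path, "w")
fout.write("first\nsecond\nthird\n")
fout.close()

count = 0
fin = open(demo_path, "r")
line = fin.readline()            # the first read happens before the loop
while line != '':                # an empty string means end of file
    line = line.rstrip('\n')
    print(line)
    count = count + 1
    line = fin.readline()        # read the next line, or '' at the end
fin.close()

print(count)
```

A blank line in the middle of a file does not stop the loop, since readline returns '\n' for it, not the empty string.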
Converting text to other data types¶
As we saw earlier, if what we are reading is actually integers or floating point numbers, we usually write them out one per line and read them in the same way. In this case, we can convert the string into the desired kind of data:
line = fin.readline()
while line != '':
    val = float(line)
    print(format(val, '.2f'))
    line = fin.readline()      # without this the loop would never end
The print line uses the format function to show the floating point number with exactly two places after the decimal point (like for money!).
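The conversion step can be tried on its own, without a file. In this sketch the string stands in for what readline would hand back; notice that float does not mind the end-of-line marker, so no stripping is needed first.

```python
line = "19.5\n"                    # what readline would return for this line
val = float(line)                  # float() ignores surrounding whitespace
money = format(val, '.2f')         # always two digits after the point
rounded = format(3.14159, '.2f')   # extra digits are rounded away

print(money)    # 19.50
print(rounded)  # 3.14
```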
Using a for loop to read lines¶
We can read all the lines in a file using a for loop as well. This one is interesting:
for line in fin:
    val = float(line)
    print(format(val, '.2f'))
Wait a minute! How does Python know when to stop? Magic!
Actually, this special form of loop stops by itself when it reaches the end of the file. Notice that we did not need the initial readline call to make this work. This is very handy!
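As a sketch of where this gets useful, the loop below adds up a file of numbers written one per line. The sample values and the scratch file are assumptions made for the demo.

```python
import os
import tempfile

# Write a few sample numbers, one per line (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
fout = open(demo_path, "w")
fout.write("1.5\n2.25\n3.0\n")
fout.close()

total = 0.0
fin = open(demo_path, "r")
for line in fin:                  # the loop ends by itself at end of file
    total = total + float(line)
fin.close()

print(format(total, '.2f'))       # 6.75
```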
File access exceptions¶
If you try to process files and some kind of error occurs, an exception will be raised. Exceptions are signals that Python uses to indicate a problem of some sort. If we do nothing about them, the program will halt. However, it is possible to catch the exception signal and continue processing. The situations that raise exceptions when processing files should be obvious:
Attempting to open a file for reading that does not exist
Attempting to open a file where you have no permissions to do so
Attempting to create a file in a directory that does not exist
Other errors such as network problems
If you suspect that you may run into any of these, you can wrap up your code in a try-except statement and handle the problem as you see fit.
try:
    fin = open("mydata2.txt","r")
    for line in fin:
        print(line.rstrip())
    fin.close()
except IOError:
    print("File not found - aborting!")
print("Done!")
Change the name of the file and run this to see the result:
C:\_data\ACCdata\ACC.ITSE1359\code\Files>test4
File not found - aborting!
Done!
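One way to package this pattern is a small function that reports the problem and lets the rest of the program carry on. This is only a sketch: the function name count_lines and the missing file name are made up for the demo, and returning None is just one possible way to signal failure.

```python
import os
import tempfile

def count_lines(filename):
    """Return the number of lines in filename, or None if it cannot be opened."""
    try:
        fin = open(filename, "r")
    except IOError:
        print("Could not open", filename)
        return None
    count = 0
    for line in fin:
        count = count + 1
    fin.close()
    return count

# A path inside a brand new scratch folder, so the file cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = count_lines(missing)
print(result)
```

Only the open call sits inside the try block here, so an unrelated error later in the function would not be mistaken for a missing file.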