Python Strings¶
Read time: 24 minutes (6036 words)
Remember those simple looking strings
we started using in our first
exposure to Python (in Hello, World
). Well, there is more to them than
meets the eye!
Strings are objects¶
Well, that is dumb! Everything in Python is an object, and we will learn how to build our own objects (simple ones) after our break. For now, we just want to explore the tools we get with Python that help us manipulate strings.
Strings are a sequence of characters
. OK, what is a character
(don’t
answer that!).
Formally, characters
we use normally come from the ASCII table of symbols.
This table was designed by Americans, for use by Americans, but the table went
global when the Internet became popular. Unfortunately, it is not up to the
task of displaying symbols used by other countries, so it is (slowly) being
replaced by another encoding called Unicode
. For our course, we will stick
with ASCII.
Here is the ASCII table:

Looking at this table, we see the code used for each character (written as a hexadecimal number that can be stored in a single byte). For instance, the code for the capital letter “Q” is 51 (in hexadecimal). We sometimes write this as “0x51” in programming.
Here is a string we can play with¶
my_string = "roie r black"
We already know we can print this out. What else can we do with it?
>>> print(my_string)
roie r black
Finding the length of the string:
>>> len(my_string)
12
This result shows that the string has 12 characters in it. Sometimes, the number will not look right, and that may be because the character is unprintable. If you look at the top few rows in the ASCII table you will see several special characters that have meanings in certain situations, but which have no “visual” representation (like “CR”, which stands for “carriage return” and “LF”, which stands for “line feed”).
Since a string has a length, you might suspect that we can reach into the string and extract individual characters, and you are right:
>>> my_string[5]
'r'
Remember that index numbers start at zero and reach up to len(my_string)-1.
We can use this fact to set up loops that pass over each character in the string. Rather than make you set up a special variable to count, Python lets you set up a simpler loop:
>>> for ch in my_string:
... if ch == 'r':
... print("found an 'r'")
...
found an 'r'
found an 'r'
Looks about right! Using this scheme, we do not worry about setting up indexes that might get out of range, Python takes care of that mess for us!
>>> my_string[15]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
Doing math with strings¶
We have already seen this:
>>> test = 'test'
>>> string = 'string'
>>> print(test + ' ' + string')
test string
Here is another example:
>>> print(test*2)
testtest
>>> print((test+' ')*2)
test test
WHat really got printed on that last example? Here is a way to find out:
>>> print('"' + (test + ' ')*2 + '"')
"test test "
Do you see where that extra space came from?
test += 'junk'
print(test)
testjunk
We need to be a bit careful, because stings are immutable. So how did we add something to the existing string. Simple! Python threw the old one away and created a new one. Try this:
>>> test[2] = 'x'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Well, that is annoying. How can I change the characters in the string? We will see that later
String methods¶
Since strings are those funny “object” gizmos we have talked about, they have a bunch of internal “methods” they know about. Here is a selection of those that are available:
isalnum() - is the string just made up of letters and numbers
isaplha() - is the string only made up of letters
isdigit() - is the string made up of only digits
islower() - is the string only lower case letters
isupper() - guess!
These can be useful when error checking input you got from those annoying human operators who like to see if they can break your program.
String conversion methods¶
There are a few methods we can use in error checking, to make our life a bit easier:
>>> test1 = 'RoIe'
>>> test2 = "Roie"
>>> test1 = test1.lower()
>>> test2 = test2.lower()
>>> print(test1 == test2)
True
Note
Notice what we are doing here. We are not trying to reach into an existing string and modify it directly. Instead, we use the old string and modify it (actualy, a copy of it), then store the new result back in the same variable container. The old string is gone, and replaced by the new one.
Here I am converting the strings to all lower case to get rid of the capitalization issues. My goal is to see if they user entered ‘roie’ in any format. Now I can see how that might be done!
We have see other useful string methods:
strip() - remove any “whitespace” characters from the sides of the string
lstrip() - remove whitespace from the left side only
rstrip() - remove whitespace from the right side only
There are other variations of these methods, but I seldom find a good use for them.
String searching¶
Often, we need to check strings to see if they contain substrings
. Python
has several tools for this:
endswith(pattern) - returns True if the string ends with the text specified
startswith(pattern) - returns True is the string starts with the pattern
find(pattern) - returns the index of the first occurrence of this pattern in the string (or -1 if not found)
replace(pattern1, pattern2) - replace each occurrence of pattern1 with pattern2
Remember that all of these return new strings if they modify the old string, so you need to save (or use) the result.
Breaking up strings¶
Here is a typical problem, we need to take a string form of a date and break it up into its parts:
>>> date = "December 7, 1941"
>>> parts = date.split()
>>> print parts
['December', '7,', '1941']
This is interesting. The split
method, with no parameter, breaks up a
string at space boundaries. Notice that we ended up with the comma in the day
number chunk. Can we get rid of that?
>>> parts[1] = parts[1][:-1]
>>> print(parts)
['December', '7', '1941']
Remember that a string is just a special kind of list
, this one a list
of characters. We can use the subscripting notation to access parts of the
string. Remember the -1 index refers to the end of the string. We need to study
this notation a bit further:
>>> test = "thisisalongstring"
>>> test[0:4]
'this'
>>> test[11:]
'string'
>>> test[11:-1]
'strin'
>>> test[11:-2]
'stri'
As usual, the notation is like the range
where the first index is the
number (starting with zero) of the character we want to start with. The last
number is one more than the ending index.
If we leave off the first index, it is assumed to start at zero, if we leave off the last index, it ends up being the length of the string, which is one more than the last index.
Formatting with strings¶
Python 3 adds a few new tricks we can do with strings to format our output.
We have already seem the old python way of using a string with special placeholders for data:
data = "This is an int: %d, and a float: %f" % (3,3.14)
This is a simple string substitution scheme that Python 3 want to eliminate. In its place, here is the new scheme:
>>> data = "This is an int: {0}, and a float: {1}".format(1,1.5)
>>> print(data)
This is an int: 1, and a float: 1.5
Python is smart enough to figure out how to display each thing, integer or float.
Extending this, we have a lot of power to specify exactly how we want the numbers to be displayed:
>>> mem = 624.253
>>> units = 'GB'
>>> data = "You have this much memory: {0:.1f} {1}".format(mem,units)
>>> print(data)
You have this much memory: 624.3 GB
There is much more to this formatting stuff. We will look at more of this in a later lecture.