Reading strings from a file
Well, the previous lecture implied that we might be able to read in an entire line in one shot with something like:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    const char * data_file = "test.data";        // string literals are const in C++
    const char * cooked_data = "processed.data";
    ifstream inFile;
    ofstream outFile;
    string item;
    int cookedData;
    cout << "Processing file data" << endl;
    inFile.open(data_file, ios::in);
    if(inFile.is_open()) {
        cout << "File was opened" << endl;
        outFile.open(cooked_data, ios::out);
        if (outFile.is_open()) {
            cout << "ready for cooked data" << endl;
            // >> stops at white space, so each pass reads one word;
            // the loop ends when a read fails (end of file)
            while(inFile >> item) {
                cout << "data = " << item << endl;
                cookedData = item.length();
                outFile << cookedData << endl;
            }
            outFile.close();
        } else
            cout << cooked_data << " could not be opened" << endl;
        inFile.close();
    } else
        cout << data_file << " could not be opened." << endl;
}
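The problem is that >> works with strings just like it did for numbers: it skips any leading white space, then reads one “chunk” (here, one word) and stops at the next white space character, not at the end of the line. Here is a sample data file: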
This is a test
of the emergency action system
And here is the output we get:
Processing file data
File was opened
ready for cooked data
data = This
data = is
data = a
data = test
data = of
data = the
data = emergency
data = action
data = system
Clearly, reading one word at a time will help in your lab project, but what if we really do want to read in the entire line?
Doing this requires a different function, getline:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    const char * data_file = "test.data";        // string literals are const in C++
    const char * cooked_data = "processed.data";
    ifstream inFile;
    ofstream outFile;
    string item;
    int cookedData;
    cout << "Processing file data" << endl;
    inFile.open(data_file, ios::in);
    if(inFile.is_open()) {
        cout << "File was opened" << endl;
        outFile.open(cooked_data, ios::out);
        if (outFile.is_open()) {
            cout << "ready for cooked data" << endl;
            // getline reads everything up to the newline (and discards it);
            // the loop ends when no more lines can be read
            while(getline(inFile, item)) {
                cout << "data = " << item << endl;
                cookedData = item.length();
                outFile << cookedData << endl;
            }
            outFile.close();
        } else
            cout << cooked_data << " could not be opened" << endl;
        inFile.close();
    } else
        cout << data_file << " could not be opened." << endl;
}
And running this gives us this output:
Processing file data
File was opened
ready for cooked data
data = This is a test
data = of the emergency action system
Now we have a full line of text, and processing it is more difficult. (Guess which approach I would recommend that you use for your lab project!)
Writing small test programs like this is a very good way to try out code snippets and make sure your logic is working correctly before adding something to your project.
Let’s take the first version of this code and do something we probably should have been doing all along. When we write a new function, we really ought to bolt that new function into a test fixture and feed it a bunch of data to make sure we are happy with the result. Since we are studying files, I will show how to build up a testing routine that can exercise a function we have constructed and want to add to our project.
To make this more interesting, we will put the function being tested in its own file, separate from the test fixture code, and have the linker hook the two together.
Testing a new function
Well, to do this, we need a function to test. I will keep this moderately simple!
Suppose we are processing a bunch of text, maybe something like my course note files. My files are a set of lines, each of which may have one or more acronyms in it somewhere. Each acronym is marked off by surrounding it with vertical bars. A test line might look like this:
I am currently working at |ACC| as a Professor of |CS|.
My function is going to help me expand the acronyms I have marked into the set of words associated with each one. In this example, what I really want to have in my notes is this:
I am currently working at Austin Community College as a Professor of Computer Science.
All I have done here is expand the acronym using the text I really want.
I need some test data for this exercise, so I Googled “computer acronyms” and came up with this set of terms and phrases that could be substituted whenever the marked-up term is found. This is a sample extract from that search:
ACC Austin Community College
ACL Access Control List
ADC Analog-to-Digital Converter
ADF Automatic Document Feeder
ADSL Asymmetric Digital Subscriber Line
AGP Accelerated Graphics Port
AIFF Audio Interchange File Format
AIX Advanced Interactive Executive
ALU Arithmetic Logic Unit
ANSI American National Standards Institute
API Application Program Interface
APU Accelerated Processing Unit
ARP Address Resolution Protocol
ASCII American Standard Code for Information Interchange
ASP Active Server Page
ATA Advanced Technology Attachment
ATM Asynchronous Transfer Mode
Bash Bourne-Again Shell
BASIC Beginner's All-purpose Symbolic Instruction Code
Bcc Blind Carbon Copy
BIOS Basic Input/Output System
Blob Binary Large Object
BMP Bitmap
BSOD Blue Screen of Death
CAD Computer-Aided Design
Cc Carbon Copy
CCD Charged Coupled Device
CD Compact Disc
CD-R Compact Disc Recordable
CD-ROM Compact Disc Read-Only Memory
CD-RW Compact Disc Re-Writable
CDFS Compact Disc File System
CDMA Code Division Multiple Access
CS Computer Science
Note
OK, so I added a few entries that were not really in this list!
The substitution function
Here is the prototype for my new function:
string term_expander(string term);
The function takes in an acronym (with or without the vertical bars, to make it more friendly) and returns a string with the expanded words we should replace the acronym with! To make this function more useful, we will allow the acronym to be given in any case the user wants, so the term could be |ACC|, |acc|, |Acc|, or any other mix of upper- and lowercase letters, and each of these should come back as:
Austin Community College
If the acronym is unknown, the function should return the acronym without the vertical bar markers in all capital letters.
Phew, sounds hard!
Testing the function
Here is an odd idea, but one that is a huge part of the programming landscape out there now!
Let’s write a test program that will exercise our new function and tell us if it is working correctly for a number of test cases. The test cases will include both correct and incorrect data, and we will define exactly what we want the function to report for each test we set up.
The test program is not supposed to solve the problem; it is supposed to exercise the function under test and make sure it is doing the right thing. It is also supposed to test what the function does with bad input data. For this example, if the term we hand our function is not in the list of expansions, the function should strip off the vertical bar characters and return the term in all capital letters.
Note
The model for this problem is actually the tool I use in writing my lecture notes. It has this feature, and it saves me a bunch of typing!
Step 1 - reading a test file
We start this exercise off by creating a program that reads a set of tests from a file. This pattern should look familiar:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char ** argv) {
    ifstream test_set;      // reads the test cases
    string test_line;
    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try: for now, just echo each line
        while(getline(test_set, test_line)) {
            cout << test_line << endl;
        }
        test_set.close();
    } else
        cout << "test_set.data could not be opened." << endl;
}
And here is a sample test set, in the file test_set.data.
Do you see the pattern? Each line contains a term marked up as needed, followed by the text that the function should return. For our first version of this program, the test code simply reads the file and prints it out, “baby step” style!
|ACC| Austin Community College
|ACC| Austin Community College
|acc| Austin Community College
|aCa| Austin COmmunity COllege
|OOP| OOP
|oop| OOP
|oOp| OOP
Now we need to add code to hook in the function we are actually testing. In real software development, we would write a header file for our function that looks like this:
#ifndef TERM_EXPANDER_H
#define TERM_EXPANDER_H
#include <string>          // the prototype uses string
using namespace std;
string term_expander(string);
#endif
This file is named term_expander.h, and the actual function will be stored in a file named term_expander.cpp.
That funny notation surrounding the actual prototype is called an include guard, and it is a pattern used by nearly all C++ programmers. It keeps the contents of the same header from being compiled more than once. Include files can include other include files, and sometimes the same file gets pulled in twice, or you even get into a loop. The conditional lines stop this from happening: the first time the file is read, TERM_EXPANDER_H is not yet defined, so the lines are processed and the name gets defined; any later time, everything up to the #endif is skipped. You will learn more about all this in another C++ course. For now, just use the pattern. Note that the name used in these lines is the header file name in all capital letters, with the dot replaced by an underscore. (This is what programmers do, so we will also!)
Here is our start on the function. It is clearly wrong, but it will allow our test program to compile, link, and run!
#include "term_expander.h"
#include <string>
using namespace std;

// First stub: hand back the term unchanged.
// Every test will fail for now, but the project compiles and links.
string term_expander(string term) {
    return term;
}
Of course, our function is far from complete!
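At this point, I need to make sure all the files (the test program, term_expander.h, and term_expander.cpp) have been added to the Dev-C++ project. The IDE will then compile each .cpp file and have the linker combine them into one test program.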
Step 2 - Making the test work
Our test program is reading our test set file, but not really processing it. We need to break each line into two parts: the term, and the set of words we expect the function to return. The term is the first thing on the line and ends when we see white space. The set of words is everything from the first non-white-space character after the term to the end of the line. Here is code that breaks each line into these two parts:
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include "term_expander.h"
using namespace std;

string term, substitution;      // filled in by split_line

void split_line(string line){
    int len = line.length();
    int i;
    char c;
    term = "";
    // scan the line to build term, stopping at the first character that is
    // not a letter or a vertical bar (normally the white space after the term)
    // (isalpha needs an unsigned char value, hence the cast)
    for(i = 0; i < len; i++) {
        c = line[i];
        if(isalpha(static_cast<unsigned char>(c)) || c == '|') term.append(1, c);
        else break;
    }
    // now skip the white space, stopping at the next letter
    for(; i < len; i++) {
        c = line[i];
        if(isalpha(static_cast<unsigned char>(c))) break;
    }
    // finally, copy the rest of the line into substitution
    substitution = "";
    for(; i < len; i++) substitution.append(1, line[i]);
}

int main(int argc, char ** argv) {
    ifstream test_set;      // reads the test cases
    string test_line;
    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try
        while(getline(test_set, test_line)) {
            split_line(test_line);
            cout << term << " -> " << substitution << endl;
        }
        test_set.close();
    }
}
WOW! That split_line function I added here looks very complicated.
Actually, it is not that bad. All we are doing is using another simple C++ library function called isalpha, which returns true (a nonzero value) if the character you give it is a letter (alphabetic character), and false (zero) otherwise. I want to copy all of the letters and vertical bar characters into the term variable until I find a character that is neither a letter nor a bar. At that point, I have found the term, and I need to skip the white space until I find another letter. From that point to the end of the line, I just need to copy the text into the substitution variable. What looks strange in this code is that I left off the initialization part of the second and third for loops in this routine. Why?
Well, I am using the variable i to count my way across the entire line. After the first loop, i is pointing to the first white space character in the line. I do not need to change that value, so I simply do nothing as the second loop starts up. I let it run until I find a letter, then use break to stop the loop. At that point, i is pointing to the first letter of the substitution, which runs to the end of the line. The last for loop just copies everything left in the line into the substitution variable! See, that was not so bad.
Note
Once again, as you see and study code fragments like this, you will store these patterns away in your mind for later use in other programs. The more code you examine, the better you will become. What you are really doing is training your brain to think things through at a fine level of detail, and to think hard to make sure it all works.
OMG! I am testing my test code!
The output now shows that I am breaking up my test set lines as needed:
|ACC| -> Austin Community College
|ACC| -> Austin Community College
|acc| -> Austin Community College
|aCa| -> Austin COmmunity COllege
|OOP| -> OOP
|oop| -> OOP
|oOp| -> OOP
Step 3 - Making the test report what it finds
Now I am going to make this test code report success or failure. The idea is to call the function under test and get its return value. We compare what we get with the substitution string our test_set.data file says we should get. We will report failures, and just count successes for a final report:
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include "term_expander.h"
using namespace std;

string term, substitution;      // filled in by split_line

void split_line(string line){
    int len = line.length();
    int i;
    char c;
    term = "";
    // scan the line to build term, stopping at the first character that is
    // not a letter or a vertical bar (normally the white space after the term)
    for(i = 0; i < len; i++) {
        c = line[i];
        if(isalpha(static_cast<unsigned char>(c)) || c == '|') term.append(1, c);
        else break;
    }
    // now skip the white space, stopping at the next letter
    for(; i < len; i++) {
        c = line[i];
        if(isalpha(static_cast<unsigned char>(c))) break;
    }
    // finally, copy the rest of the line into substitution
    substitution = "";
    for(; i < len; i++) substitution.append(1, line[i]);
}

int main(int argc, char ** argv) {
    ifstream test_set;      // reads the test cases
    string test_line;
    string result;
    int tests_passed = 0;
    int tests_failed = 0;
    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try
        while(getline(test_set, test_line)) {
            split_line(test_line);
            // test the function here:
            result = term_expander(term);
            if(result == substitution) tests_passed++;
            else {
                cout << "Fail! " << result << " does not match " << substitution << endl;
                tests_failed++;
            }
        }
        test_set.close();
    }
    cout << tests_passed << " tests passed" << endl;
    cout << tests_failed << " tests failed" << endl;
}
Look closely at how this works! The term we extracted from each test set line is handed to the term_expander function, and the result we get back is stored in result. We then compare that result with the string we found on the test set line, which is stored in substitution. If they match, we add one to the tests passed counter. If they do not match, we report the error and add one to the tests failed counter. At the end of the program we report the final results. Here is what I saw:
Fail! |ACC| does not match Austin Community College
Fail! |ACC| does not match Austin Community College
Fail! |acc| does not match Austin Community College
Fail! |aCa| does not match Austin COmmunity COllege
Fail! |OOP| does not match OOP
Fail! |oop| does not match OOP
Fail! |oOp| does not match OOP
0 tests passed
7 tests failed
Well, that is no good, all the tests failed.
Test Driven Development
What we just did was build a test fixture we can use to test our term_expander function as we work on it. Our goal is to make all the tests pass. Every time we add something to the function, we hope we are moving closer to a finished product, and our tests will tell us if that is so. We can always add more tests to the test_set.data file as we think them up!
This is how software is really being developed on huge projects today. Obviously, this is beyond what you could have done earlier in the course, but you can start using this approach now that you are learning what all this programming stuff is about. Have fun. This approach actually makes programming a lot more satisfying. Add it to my “Baby Step” approach, and you will find programming can be a lot of fun! (Even if it is hard work at the same time!)