Reading strings from a file

Well, the previous lecture implied that we might be able to read in an entire line in one shot with something like:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    const char * data_file = "test.data";
    const char * cooked_data = "processed.data";
    
    ifstream inFile;
    ofstream outFile;

    string item;
    int cookedData;
    
    cout << "Processing file data" << endl;
    inFile.open(data_file, ios::in);
    if(inFile.is_open()) {
        cout << "File was opened" << endl;
        outFile.open(cooked_data, ios::out);
        if (outFile.is_open()) {
            cout << "ready for cooked data" << endl;
            // the read itself is the loop test, so the loop stops
            // as soon as no more words can be read
            while(inFile >> item) {
                cout << "data = " << item << endl;
                cookedData = item.length();
                outFile << cookedData << endl;
            }
            outFile.close();
        } else
            cout << cooked_data << " could not be opened" << endl;
        inFile.close();
    } else
        cout << data_file << " could not be opened." << endl;
}

The problem is that >> reads strings the same way it read numbers: one whitespace-separated “chunk” at a time, not the entire line. Here is a sample data file:

This is a test
of the emergency action system

And here is the output we get:

Processing file data
File was opened
ready for cooked data
data = This
data = is
data = a
data = test
data = of
data = the
data = emergency
data = action
data = system

Clearly, this will help in your lab project, but what if we really do want to read the entire line in?

Doing this involves using a different function, getline:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    const char * data_file = "test.data";
    const char * cooked_data = "processed.data";
    
    ifstream inFile;
    ofstream outFile;
    string item;
    int cookedData;
    string stuff;
    
    cout << "Processing file data" << endl;
    inFile.open(data_file, ios::in);
    if(inFile.is_open()) {
        cout << "File was opened" << endl;
        outFile.open(cooked_data, ios::out);
        if (outFile.is_open()) {
            cout << "ready for cooked data" << endl;
            // getline is the loop test, so the last line is processed
            // even if the file does not end with a newline
            while(getline(inFile, item)) {
                cout << "data = " << item << endl;
                cookedData = item.length();
                outFile << cookedData << endl;
            }
            outFile.close();
        } else
            cout << cooked_data << " could not be opened" << endl;
        inFile.close();
    } else
        cout << data_file << " could not be opened." << endl;
}

And running this gives us this output:

Processing file data
File was opened
ready for cooked data
data = This is a test
data = of the emergency action system

Now each read gives us a full line of text, and a full line is more difficult to process than a single word. (Guess which approach I would recommend that you use for your lab project!)

Writing small test programs like this is a very good way to test code snippets and make sure your logic is working correctly, before adding something to your project.

Let’s take the first version of this code and do something we probably should have been doing all along. When we write a new function, we really ought to bolt that new function into a test fixture and feed it a bunch of data to make sure we are happy with the result. Since we are studying files, I will show how to build up a testing routine that can exercise a function we have constructed and want to add to our project.

To make this more interesting, we will build the test fixture code in a separate file and have the linker hook in our test function.

Testing a new function

Well, to do this, we need a function to test. I will keep this moderately simple!

Suppose we are processing a bunch of text, maybe something like my course note files. My files are a set of lines, each of which may have one or more acronyms in it somewhere. Each acronym will be marked off by surrounding it with vertical bars. A test line might look like this:

I am currently working at |ACC| as a Professor of |CS|.

My function is going to help me expand the acronyms I have marked into the set of words associated with each one. In this example, what I really want to have in my notes is this:

I am currently working at Austin Community College as a Professor of Computer Science.

All I have done here is expand the acronym using the text I really want.

I need some test data for this exercise, so I Googled “computer acronyms” and came up with this set of terms and phrases that could be substituted whenever the marked-up term is found. This is a sample extract from that search:

ACC     Austin Community College
ACL     Access Control List
ADC     Analog-to-Digital Converter
ADF     Automatic Document Feeder
ADSL    Asymmetric Digital Subscriber Line
AGP     Accelerated Graphics Port
AIFF    Audio Interchange File Format
AIX     Advanced Interactive Executive
ALU     Arithmetic Logic Unit
ANSI    American National Standards Institute
API     Application Program Interface
APU     Accelerated Processing Unit
ARP     Address Resolution Protocol
ASCII   American Standard Code for Information Interchange
ASP     Active Server Page
ATA     Advanced Technology Attachment
ATM     Asynchronous Transfer Mode
Bash    Bourne-Again Shell
BASIC   Beginner's All-purpose Symbolic Instruction Code
Bcc     Blind Carbon Copy
BIOS    Basic Input/Output System
Blob    Binary Large Object
BMP     Bitmap
BSOD    Blue Screen of Death
CAD     Computer-Aided Design
Cc      Carbon Copy
CCD     Charged Coupled Device
CD      Compact Disc
CD-R    Compact Disc Recordable
CD-ROM  Compact Disc Read-Only Memory
CD-RW   Compact Disc Re-Writable
CDFS    Compact Disc File System
CDMA    Code Division Multiple Access
CS      Computer Science

Note

OK, so I added a few entries that were not really in this list!

The substitution function

Here is the prototype for my new function:

string term_expander(string term);

The function takes in an acronym (with or without the vertical bars, to make it more friendly) and returns a string with the expanded words we should replace this acronym with! To make this function more useful, we will allow the acronym to be given in any case the user wants, so our term could have been any of these:

  • |ACC|
  • |acc|
  • ACC
  • |Acc|

If the acronym is unknown, the function should return the acronym without the vertical bar markers in all capital letters.

Phew, that sounds hard!

Testing the function

Here is an odd idea, but one that is a huge part of the programming landscape out there now!

Let’s write a test program that will exercise our new function and tell us if it is working correctly for a number of test cases. The test cases will include both correct and incorrect data, and we will define exactly what we want the function to report for each test we set up.

The test program is not supposed to solve the problem. It is supposed to exercise the function being tested and make sure it is doing the right thing. It is also supposed to test what the function does with bad input data. For this example, if the term we hand our function is not in the list of expansions, we will simply strip off the vertical bar characters and return the term in all capital letters.

Note

The model for this problem is actually the tool I use in writing my lecture notes. It has this feature, and it saves me a bunch of typing!

Step 1 - reading a test file

We start this exercise off by creating a program that reads a set of tests from a file. This pattern should look familiar:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char ** argv) {
    ifstream test_set;   // reads the acronym data set
    string acronym;
    string test_line;

    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try; getline is the loop test, so the
        // loop ends when there are no more lines to read
        while(getline(test_set, test_line)) {
            cout << test_line << endl;
        }
    }
    test_set.close();
}


And here is a sample test set, stored in the file test_set.data. Do you see the pattern? Each line contains a term marked up as needed, followed by the text that the function should return. For our first version of this program, the test code simply reads the file and prints it out, “baby step” style! Here is the file:

|ACC|   Austin Community College
|ACC|   Austin Community College
|acc|   Austin Community College
|aCa|   Austin COmmunity COllege
|OOP|   OOP
|oop|   OOP
|oOp|   OOP

Now, we need to add code to hook in the actual function we want to test. In real software development, we would write a header file for our function that looks like this:

#ifndef TERM_EXPANDER_H
#define TERM_EXPANDER_H

#include <string>
using namespace std;

string term_expander(string);

#endif

This file is named term_expander.h, and the actual function will be stored in a file named term_expander.cpp.

That funny notation that surrounds the actual prototype is called an include guard, and you will find it in nearly every C++ header file. It keeps the compiler from processing the same file more than once. Include files can include other include files, so the same header can easily get pulled in twice, and sometimes you even get into a loop. The conditional lines stop this from happening. You will learn more about all this in another C++ course. For now, just use the pattern. Note that the name you type on the #ifndef and #define lines is the header file name in all capital letters, with dots replaced by underscores. (This is what programmers do, so we will also!)

Here is our start on the function. It is clearly wrong, but it will allow our test program to work!

#include <string>
using namespace std;

#include "term_expander.h"

string term_expander(string term) {
    return term;
}

Of course, our function is far from complete!

At this point, I need to make sure Dev-C++ has all of these files added to the project. The IDE will then compile both .cpp files and link them together into one program.

Step 2 - Making the test work

Our test program is reading our test set file, but not really processing it. We need to break each line up into two parts: the term and the set of words we want to see when the function works. The term is the first thing on each line, and it ends when we see whitespace. The set of words is everything from the first non-whitespace character after the term to the end of the line. Here is code that breaks up each line into these two parts:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include "term_expander.h"

using namespace std;

// split_line leaves its two results in these global variables
string term, substitution;

void split_line(string line){
    int len = line.length();
    int i;
    char c;
    term = "";
    // scan the line to build term, stopping on white space
    for(i = 0; i< len; i++) {
        c = line[i];
        if(isalpha((unsigned char)c) || c == '|') term.append(1,c);
        else break;
    }
    // now skip any white space (everything up to the next letter)
    for(;i<len;i++) {
        c = line[i];
        if(isalpha((unsigned char)c)) break;
    }
    // finally, copy the rest of the line into substitution
    substitution = "";
    for(;i<len;i++) substitution.append(1,line[i]);
}

int main(int argc, char ** argv) {
    ifstream test_set;   // reads the acronym data set
    string acronym;
    string test_line;
    char c;

    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try; getline is the loop test, so the
        // loop ends when there are no more lines to read
        while(getline(test_set, test_line)) {
            split_line(test_line);
            cout << term << " -> " << substitution << endl;
        }
    }
    test_set.close();
}


WOW! That split_line function I added here looks very complicated. Actually, it is not that bad. All we are doing is using another simple C++ library function called isalpha, which returns true if the character you hand it is a letter (alphabetic character). It returns false otherwise. I want to copy all of the letters and the vertical bar characters into the term variable until I find a character that is not a letter or a bar. At that point, I have found the term, and I need to skip the rest of the white space until I find another letter. From that point to the end of the line, I just need to copy the text into the substitution variable.

What looks strange in this code is the fact that I left off the initialization code in the for loops for the second and third loops in this routine. Why?

Well, I am using the variable i to count my way across the entire line. After the first loop, i is pointing to the first white space character in the line. I do not need to change that value, so I simply do nothing as the second loop starts up. I will let it run until I find a non-white-space character, then use break to stop this loop. At that point, i is pointing to the first letter of the substitution, which runs to the end of the line. The last for loop just copies everything left in the line into the substitution variable! See, that was not so bad.

Note

Once again, as you see and study code fragments like this, you will store this pattern away in your mind for later use in other programs. The more code you examine, the better you will become. What you are really doing is training your brain to think things through at a fine level of detail, and thinking hard to make sure this all works.

OMG! I am testing my test code!

The output now shows that I am breaking up my test set lines as needed:

|ACC| -> Austin Community College
|ACC| -> Austin Community College
|acc| -> Austin Community College
|aCa| -> Austin COmmunity COllege
|OOP| -> OOP
|oop| -> OOP
|oOp| -> OOP

Step 3 - Making the test report what it finds

Now, I am going to make this test code report success or failure! The idea is to call the function being tested and get its return value. We want to compare what we get to the substitution string our test_set.data file says we should get. We will report failures, and just count successes for a final report:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include "term_expander.h"

using namespace std;

// split_line leaves its two results in these global variables
string term, substitution;

void split_line(string line){
    int len = line.length();
    int i;
    char c;
    term = "";
    // scan the line to build term, stopping on white space
    for(i = 0; i< len; i++) {
        c = line[i];
        if(isalpha((unsigned char)c) || c == '|') term.append(1,c);
        else break;
    }
    // now skip any white space (everything up to the next letter)
    for(;i<len;i++) {
        c = line[i];
        if(isalpha((unsigned char)c)) break;
    }
    // finally, copy the rest of the line into substitution
    substitution = "";
    for(;i<len;i++) substitution.append(1,line[i]);
}

int main(int argc, char ** argv) {
    ifstream test_set;   // reads the acronym data set
    string acronym;
    string test_line;
    string result;
    char c;
    int tests_passed = 0;
    int tests_failed = 0;

    // open up the test file
    test_set.open("test_set.data", ios::in);
    if(test_set.is_open()) {
        // we have tests to try; getline is the loop test, so the
        // loop ends when there are no more lines to read
        while(getline(test_set, test_line)) {
            split_line(test_line);
            // test the function here:
            result = term_expander(term);
            if(result == substitution) tests_passed++;
            else {
                cout << "Fail! " << result << " does not match " << substitution << endl;
                tests_failed++;
            }
        }
    }
    test_set.close();
    cout << tests_passed << " tests passed" << endl;
    cout << tests_failed << " tests failed" << endl;
}


Look closely at how this works! The term we extracted from each test set line is handed to the term_expander function, and the result we get back is stored in result. We then compare that result with the string we found on the test set line, which is stored in substitution. If they match, we add one to the counter indicating tests passed. If they do not match, we report the error, and add one to the tests failed counter. At the end of the program we report the final results. Here is what I saw:

Fail! |ACC| does not match Austin Community College
Fail! |ACC| does not match Austin Community College
Fail! |acc| does not match Austin Community College
Fail! |aCa| does not match Austin COmmunity COllege
Fail! |OOP| does not match OOP
Fail! |oop| does not match OOP
Fail! |oOp| does not match OOP
0 tests passed
7 tests failed

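Well, that is no good: all the tests failed. That is exactly what we should expect, though, since our term_expander still just returns whatever it was given.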

Test Driven Development

What we just did was to build a test fixture we can use to test our term_expander function as we work on it. Our goal is to make all the tests pass. Every time we add something to the function, we hope we are moving closer to a finished product. Our tests will tell us if that is so. We can always add more tests to the test_set.data file as we think them up!

This is how software is really being developed in huge projects today. Obviously, this is beyond what you could have done earlier in the course, but you can start using this approach now that you are learning what all this programming stuff is about. Have fun. This actually makes programming a lot more satisfying. Add it to my “Baby Step” approach, and you will find programming can be a lot of fun! (Even if it is hard work at the same time!)