Linked Lists

As we have seen in our work so far, building flexible places to store program data can get complicated. With the introduction of pointers, we have a way to ask the system for a chunk of memory while the program is running, and use that memory to store things. Obviously, that chunk of memory will not have a name, but it will have an address, and we can store that in a pointer variable! This new concept is called “dynamic storage”.

By introducing the concept of dynamic data storage, we at last have the ability to build containers as the program is running, freeing us from compile time constraints.

What is a linked list?

Suppose we build a container to hold a single piece of data, like a simple integer. Further, suppose we need to store a lot of these integers while our program runs, and we will not know how many until the program runs! What kind of data container would you build?

The first thought is probably just a big array. That may work, unless the number of integers gets bigger than you set the array up to hold!

Now suppose I want to store another piece of information along with that integer, maybe even a lot of pieces of information - would an array still work?

Sure, we just build an object that holds all the required data and create an array of these objects. No problem!

Now, suppose we need to rearrange this data many times as the program runs. The code we have studied so far will require copying data from one place to another in the array to get this job done. Could we avoid all of the copying?

Yes! We can build the data containers for each object dynamically, and let each container hold a pointer to the next object in the list of containers we create.

A Linked List is just a sequence of objects connected by pointers that are stored within each object. These pointers point to the next object in the list. The list has a sequential nature to it. We will need one pointer variable that holds the address of the first object in the list, and we can use that pointer to find that object. After we find the first object, we can find the next object by looking inside the first object for the pointer that is tied to the next object. When we get to the last object in the list, this internal pointer will be set to NULL meaning there is no next object!

Here is what it looks like:

../_images/LinkedList.png

With a bit of careful coding, we can make this data structure grow and shrink as needed while our program runs.

Advantages of linked lists

There are a number of reasons why programmers like linked lists:

  • No set limit on how big (or small) they can be
  • Easy (well not exactly hard) to add items to the list (even in the middle)
  • Easy (well, again not that hard) to delete items from the list

Disadvantages of linked lists

However, there are some drawbacks

  • Not as easy to get to a particular item in the list (no index)
  • Pointer management is difficult to do right!

In spite of these problems, you do need to know how to build data structures like this, so let’s give it a try:

Building a Linked List

Let’s start off on our exploration of linked lists by building a simple application that manages a set our names we get from the user. Here is a main program to get us started:

main.cpp:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#include <iostream>
using namespace std;

int main(int argc, char ** argv) {

    string name;

    while(true) {
        cout << "Enter name ('quit' to end): ";
        cin >> name;
        if (name == "quit") break;
    }
}

This simple code prompts the user for a name then loops until the entered name is “quit”. Here is this code in action:

Here is a sample run:

Enter name ('quit' to end):fred
inserting fred
Enter name ('quit' to end):barney
inserting barney
Enter name ('quit' to end):betty
inserting betty
Enter name ('quit' to end):wilma
inserting wilma
Enter name ('quit' to end):quit
Press any key to continue . . .

Building nodes

Now that we have a working program, we need to build a container to hold the names we enter. What kind of objects will we need?

Let’s look at some terminology first:

  • A Node is a basic container with data and a link to another node in it
  • A Link is just a pointer to a node.
  • A Head pointer points to the first node in the list

The node is the most important object in this list gadget. We need to create a node for each piece of data we want to place in our linked list. So, one class should set up nodes. We also need another object to manage the list. This is exactly like the problem we faced in the poolball problem. We used one class to act like a poolball, and another class to manage the collection of poolballs. So, we will have a second class to manage the entire list.

We will need to build these two classes to store a linked list of names:

Node Class

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
#ifndef NODE_H
#define NODE_H

#include <string>
using namespace std;

class Node {
    public:
        string name;
        Node * next;
        Node() { name = ""; next = NULL; }
        Node( string val, Node * nextNode) { name = val, next = nextNode; }
};

#endif

Node objects will store a single string for the name, and a single pointer to the next node in the list. The class also includes two constructors, one for a simple empty node, and one that will be used to initialize the instance variables.

Building the Linked List class

Now that we have a node defined, we can set up the management class:

Linked List class

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
#ifndef LINKEDLIST_H
#define LINKEDLIST_H

#include "Node.h"

#include <string>
using namespace std;

class LinkedList {
    public:
        Node * head;
        LinkedList() { head = NULL; }
        void insert( string val);
        void print(void);
};

#endif

We need a pointer to keep track of our series of nodes, and we use the head pointer to do just that. We have provided a constructor which creates a new (empty) linked list, and methods to add items to the list, and print the list out. We need to write these last two methods before we can run our program and store the names.

Inserting names in the list

The first of these routines will add a new name to the front of the current list:

Insert method:
1
2
3
void LinkedList::insert( string val ) {
    head = new Node(val, head);
}

What is going on here? The insert method is supposed to add the new name to the front of the list. Before we can do that, we need a place to store the new name.

Creating Dynamic Storage

The C++ operator new does the job of creating new memory space for us. The operator is followed by the name of an existing data type, or the name of a class you have created. The memory needed to store one object from the designated data type (or class) is set up in a memory area called the heap, and you get the address of that location back. All you need to do is store that address somewhere.

tmp = new Node();   // creates one bare (uninitialized) Node object

Deleting Dynamic Storage

If you need to get rid of a chunk of memory you previously created with new, you use the delete operator:

delete tmp;         // returns the storage allocated to the heap

Storing the Address

Line 10 in the insert method above actually stores the address, and does so in a safe way. We took the current pointer to the head of the list and passed it into the Node constructor so that when the new operator did its magic, the new node was properly initialized with the old list pointer stored inside. At that point taking the new address new gives us and storing that into out head variable reconnects head to the front of the new list.

You need to be careful in these kinds of operations, or things can go wrong quickly!

What would happen if we used the following code:

Node * hook;
hook = new Node();
head = hook;
cout << head -> name << end;

Remember the rope analogy? In the third line above, you cut off the old rope, and lost whatever list you used to have connected there. One solution to that is to make sure you have another pointer variable pointing to the old list when you replace the address in head with a new address.

Noder * hook;
Node * save = head;
hook = new Node();
save = head;
head = hook;
head -> next = save;

Accidentally losing the address of some data structure constructed in dynamic memory is a common mistake, and one you need to think about as you work with pointers! Draw a picture and make sure you see how this worked! What happens if the list is empty, or if it has other names in the list?

Printing out the list

The pattern you will see in this routine is very common in linked data structures.:

1
2
3
4
5
6
7
void LinkedList::print() {
    Node *ptr = head;
    while (ptr != NULL) {
        cout << ptr->name << endl;
        ptr = ptr->next;
    }
}

Remember that the funny “->” notation means “follow the link, then look inside the box for this inner container”.

With these two routines in place, we can set up our program:

main.cpp:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
#include "Node.h"
#include "LinkedList.h"

#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    LinkedList myList;
    string name;
    
    while (true) {
        cout << "Enter name ('quit' to end):";
        cin >> name;
        if (name == "quit") break;
        myList.insert(name);
        myList.print();
    }
    myList.print();
}

We have created a LinkedList variable to keep track of the new list. The constructor for this class makes sure this new variable is pointing to an empty list. Next, we read the names from the user as before, but now we insert them into the linked list. If all goes well, we print the list out. This is what you should see:

sample run

Enter name ('quit' to end):fred
inserting fred
Enter name ('quit' to end):wilma
inserting wilma
Enter name ('quit' to end):barney
inserting barney
Enter name ('quit' to end):betty
inserting betty
Enter name ('quit' to end):quit
betty
barney
wilma
fred
Press any key to continue . . .

Notice the order we see here. Since the insert method always inserts at the front of the list, we will see the items last to first.

Hiding Details

There is one significant problem with the code as we have shown it so far. All of the internal details about how things are working are visible to users, and they can (and will) break things if they misuse these variables. It would be better to hide the details behind the private barrier!

Here is our new main.cpp:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
#include "Node.h"
#include "LinkedList.h"

#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    LinkedList myList;
    string name;
    
    while (true) {
        cout << "Enter name ('quit' to end):";
        cin >> name;
        if (name == "quit") break;
        myList.insert(name);
        myList.print();
    }
    myList.print();
}

Nere are the new class header files needed to implement these changes

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#ifndef NODE_H
#define NODE_H

#include <string>
using namespace std;

class Node {
    private:
        string name;
        Node * next;
    public:
        // constructors
        Node() { name = ""; next = NULL; }
        Node( string val, Node * nextNode) { name = val, next = nextNode; }

        // accessors
        string getName(void);
        Node * nextPtr(void);

        // mutators
        void setName(string);
};

#endif

And the Management class header

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#ifndef LINKEDLIST_H
#define LINKEDLIST_H

#include "Node.h"

#include <string>
using namespace std;

class LinkedList {
    private:
        Node * head;
    public:
        // constructor
        LinkedList() { head = NULL; }

        // accessors
        void print(void);

        // mutators
        void insert( string val);
};

#endif

Creating the implementation files is an easy change, so that will not be shown here

Creating an ordered list

Could we make a smarter insert method, like one that keeps the names in alphabetical order? Let’s see:

Alphabetic insert

void LinkedList::alphaInsert(string val) {
}

Now we need to work out how to insert a node at a particular place. This is a bit complicated, so we will work it out in small steps!

What are the possibilities we might need to consider?

Empty list

First, the list might be empty. This one is easy, we insert the name into the empty list. We can tell if it is empty by checking the value of the head node:

add to an empty list

if (head == NULL) insert(val);
Item goes in front

Next, the item could go in front of the first item. Again, we can use the insert method to do this. We can combine these two cases to get:

add to front of a possibly empty list

if (head == NULL || val < head->getName()) insert(val)

Make sure you see how this expression works. Remember that C++ evaluates any logical expression as far as necessary to get the answer and no further. So, if the head pointer is NULL, the first part of the expression will return a true value, and we know that the entire expression is true without evaluating the rest of the expression. In this case, this is good, since evaluating the second part would generate a reference through a NULL pointer and crash!

This is called a short circuit expression, we stop evaluating (short circuit -er- cause the evaluator to stop) before we get into trouble. Again, this is a common pattern in working with pointers.

General case

Now, we need to work out the hard part. In general, the item will need to go after one item and in front of a second item. We will make sure the item does not go in front of the first node by dealing with that possibility before we get to this part of the code. Since we will need to hang onto multiple nodes here, we will need more than one pointer variable. Let’s set up two, one for the current node, and one for the next node. Here is a loop that can work through all the nodes in the list keeping these two pointers set right:

Walking through the list with two pointers

Node *currentNode, *nextNode;

currentNode = head;
while (currentNode != NULL) {
    nextNode = currentNode->getNext()
    ...
    currentNode = nextNode;
}

Do you see how it works? We make sure the current node is not NULL as the loop control - stopping after we try to go past the last node in the list. We set the nextNode pointer to the value in the next variable. This will be the next node in the list, or NULL when the currentNode is the last item in the list.

Now, inside the loop, we have our two pointers. However, we have two possibilities to consider. The general case happens when both pointers are not NULL, meaning we have two nodes in the list. We need to check to see if the new name goes between these two nodes in this case.

In the second case, the nextNode pointer is NULL meaning we are at the end of the list. We will hook the new name after this node in this case.

End of list case

Here, we create our new node and hook it to the end of the list. We make sure that the list still has a proper NULL at the end by setting this up in the call to the node constructor. Notice how we bail out of the routine when we find the place for the new node. We will use that scheme in the rest of the code as well.

We can shorten this up like so:

End of list case

if (nextNode == NULL) {
    currentNode->setNext(new Node(val,NULL));
    return;

This avoids needing another variable and still keeps the list in proper shape.

At last, we get to the general case. Here we will hook the new node in between the two other nodes all the while making sure we do not let go of any of the rope!

general case

This code checks to see if the name goes in front of the next node, and inserts it between the two nodes if so.

Here is the complete routine:

LinkedList.cpp:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// LinkedList.cpp - Linked List management node

#include "LinkedList.h"
#include "Node.h"

#include <iostream>
using namespace std;

void LinkedList::insert( string val ) {
    head = new Node(val, head);
}

void LinkedList::alphaInsert(string val) {
    Node * currentNode, * nextNode;

    // add to empty list, or as the first item
    if (head == NULL || val < head->getName()) {
        insert(val);
        return;
    }

    // add to existing list between current and next nodes
    currentNode = head;
    while( currentNode != NULL) {
        nextNode = currentNode->nextPtr();

        if(nextNode == NULL) {
            // we have reached the end, add to the end
            currentNode->setNext(new Node(val, NULL));
            return;
        }
        // general case, check if we insert before nextNode
        if( val < nextNode->getName()) {
            Node * newNode = new Node(val, nextNode);
            currentNode->setNext(newNode);
            return;
        }
        // on to the next node to check
        currentNode = nextNode;
    }
}

void LinkedList::print() {
    Node *ptr = head;
    while (ptr != NULL) {
        cout << ptr->getName() << endl;
        ptr = ptr->nextPtr();
    }
}

And this is how it works:

sample run

Enter name ('quit' to end):fred
inserting fred
Enter name ('quit' to end):barney
inserting barney
Enter name ('quit' to end):wilma
inserting wilma
Enter name ('quit' to end):betty
inserting betty
Enter name ('quit' to end):quit
barney
betty
fred
wilma
Press any key to continue . . .

While this one was much harder to implement, it demonstrates the kind of thinking you need to do to keep these linked data structures under control.

Removing a name from the list

Now, we can deal with the removal of an item from the list. The code for this routine needs to find the item (which may not be in the list) and then unhook it. Once again, we have a number of cases to consider.

Remove an item from the list

To make our LinkedList class more useful, we need to add a method that removes items from the list. Here is the new prototype:

        void remove(string val);

We have several situations to consider in writing this method’s implementation. Here is the outer structure:

1
2
3
void LinkedList::remove(string val) {
    Node * currentNode, * nextNode;
}

The list is empty

If the list is empty, we have nothing to do:

1
2
    // check if list is empty
    if( head == NULL) return;

Item is in front

If the item is in the front of the list, the procedure is pretty simple again:

1
2
3
4
5
6
7
8
    // is the item in front?
    if( head->getName() == val) {
        cout << "removing " << val << " from list front" << endl;
        nextNode = head->nextPtr();
        delete head;
        head = nextNode;
        return;
    }

General case

Once again, we need to set up a loop that walks down the list finding the item in question. We want to keep track of the previous node, so we search for a match using the nextNode pointer:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    // general case, find the string in the next node
    currentNode = head;
    while(currentNode != NULL) {
        nextNode=currentNode->nextPtr();
        if( nextNode->getName() == val) {
            // hang on to the list after nextNode
            cout << "removing " << val << " from the list" << endl;
            Node *ptr = nextNode->nextPtr();
            delete nextNode;
            // hook the old list to the last list
            currentNode->setNext(ptr);
            return;
        }
        currentNode = nextNode;
    }

We have a problem here. As with the alphabetic insert logic we discussed above, we might have reached the end of the list when we check for the name in the next node. In that case, nextNode will be NULL. If it is, that means we have reached the end of the line and the name we are after must not be in the list - so we have nothing to do. We can check for this before we compare strings

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    if( head->getName() == val) {
        cout << "removing " << val << " from list front" << endl;
        nextNode = head->nextPtr();
        delete head;
        head = nextNode;
        return;
    }
    // general case, find the string in the next node
    currentNode = head;
    while(currentNode != NULL) {
        nextNode=currentNode->nextPtr();
        if( nextNode == NULL) {
            cout << "name " << val <<" is not in the list" << endl;
            return; // item is not in the list
        }

Here is a modification to our main code to test this:

main.cpp:
...
while (true) {
    cout << "Enter name ('quit' to end):";
    cin >> name;
    if (name == "quit") break;
    myList.alphaInsert(name);
}
myList.print();
myList.remove("fred");
myList.print();
myList.remove("linda");
myList.print();
myList.remove("barney");
myList.print();
myList.remove("wilma");
myList.print();

And here is what we get running this code:

sample run

$ ./ex5
Enter name ('quit' to end):fred
Enter name ('quit' to end):wilma
Enter name ('quit' to end):barney
Enter name ('quit' to end):dino
Enter name ('quit' to end):quit
barney
dino
fred
wilma
removing fred from the list
barney
dino
wilma
name linda is not in the list
barney
dino
wilma
removing barney from list front
dino
wilma
removing wilma from the list
dino

Now, we are making progress. We have a set of routines that will let us build a list of names, add new names and delete old names. Pretty good. But, can we do better?

Doubly linked lists

In the previous example, our linked list is one way. That means we can only walk the list in one direction, from the first node to the last node. What if we want to be able to walk both ways? Can we do that?

Sure, just maintain two pointers in each node - one to hook to the node after the current node, and another to hook to the node before the current node. Our class could handle the list this way with the user of our linked list not even knowing we had new features in the class.

Here is the new Node class:

Node.h:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#ifndef NODE_H
#define NODE_H

#include <string>
using namespace std;

class Node {
    private:
        string name;
        Node * next;
        Node * prev;
    public:
        // constructors
        Node() { name = ""; next = NULL; }
        Node( string val, Node * nextNode, Node * prevNode) { 
            name = val, next = nextNode; prev = prevNode; }

        // accessors
        string getName(void) { return name; }
        Node * nextPtr(void) { return next; }
        Node * prevPtr(void) { return prev; }

        // mutators
        void setName(string val) { name = val; }
        void setNext(Node * ptr) { next = ptr; }
        void setPrev(Node * ptr) { prev = ptr; }
};

#endif

Note

Since the code for the methods in the Node class was so simple, it is common to just add that code into the header file and eliminate the implementation file. We are doing this here, so delete the Node.cpp file if it exists. Remember to update your Makefile if you do this.

And our new linked list class:

LinkedList.h:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#ifndef LINKEDLIST_H
#define LINKEDLIST_H

#include "Node.h"

#include <string>
using namespace std;

class LinkedList {
    private:
        Node * head;
        Node * tail;
        void insertFront(string val);
        void insertEnd(string val);
    public:
        // constructor
        LinkedList() { head = NULL; tail = NULL; }

        // accessors
        void print(void);
        void printReverse(void);

        // mutators
        void alphaInsert(string val);
        void remove(string val);
};

#endif

This definition did not change much. The class will keep two pointers to the list, one pointing to the front of the list, and one pointing to the end of the list. Both of these pointers will be NULL if the list is empty.

There are a few new routines here so we can play with this new kind of linked list.

Modifications to the LinkedList methods

We have no other changes to make to the Node class, but our methods in the LinkedList class need to be worked on.

Constructor

The basic constructor is not changed, except for the need to initialize the new tail pointer.

LinkedList::LinkedList() {name = ""; head = NULL; tail = NULL; }
Insert at front of the list

With the addition of the new pointer, inserting at the front is a bit more complicated. Here, we need to consider a few more cases:

insert in the front of the list:
 
void LinkedList::insertFront(string val) {
    if (head == NULL) {
        head = new Node(val,NULL,NULL);
        tail = head;
    } else {
        Node * ptr = new Node(val,head,NULL);
        head->setPrev(ptr);
        head = ptr;
    }
}

You should draw a few pictures to make sure you can follow what is going on here. We are trying to maintain two chains here. So we have to make sure both are consistent when we get done. Each case takes thinking to make sure you have it right! Don’t get discouraged if you break a few lists as you learn - we all have!

Remove item

This method is another that gets a bit more complicated. Again, we need to carefully think through all the possibilities. Let’s try to write them down before we build the code:

  • List is empty - no work to do
  • List has one node
  • Item is not in the list - no work to do
  • Item is in the list - delete the item and make the list empty
  • List has two or more nodes
  • Item is not in the list - no work
  • Item is in the list
  • Item is in front of the list
  • Item is at the end of the list
  • Item is in the middle of the list

Each case needs to be thought through individually.

Here is the final remove routine:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
void LinkedList::remove(string val) {
    Node * currentNode, * nextNode, * prevNode;

    // check if list is empty
    cout << "deleting " << val << endl;
    
    // see if the list is empty
    if( head == NULL) return;

    // no? Got work to do
    currentNode = head;
    prevNode = NULL;
    nextNode = head->nextPtr();

    // see if list has one item
    if(nextNode == NULL) {
        cout << "deleting only node in list" << endl;
        delete currentNode;
        head = tail = NULL;
        return;
    }

    // is the item in front?
    if( head->getName() == val) {
        cout << "removing " << val << " from list front" << endl;
        delete head;
        head = nextNode;
        head->setPrev(NULL);
        return;
    }
    // general case, find the string in the next node
    while(currentNode != NULL) {
        nextNode = currentNode->nextPtr();
        cout << "testing " << currentNode->getName() << endl;
        if(currentNode->getName() == val) {
            nextNode=currentNode->nextPtr();
            prevNode = currentNode->prevPtr();
            cout << "deleting " << currentNode->getName() << endl;
            if( nextNode == NULL) {
                // deleting the last item in the list
                delete currentNode;
                prevNode->setNext(NULL);
                return;
            }

            delete currentNode;
            // hook the old list to the last list
            prevNode->setNext(nextNode);
            nextNode->setPrev(prevNode);
            return;
        }
        currentNode = nextNode;
    }
    cout << "looks like " << val << " was not in the list" << endl;
}

It looks complicated, but it is commented to help show what each section does, and has a few output lines so you can see how it works.