.. _linked-lists:

Linked Lists
############

.. include:: /header.inc

.. vim:filetype=rst spell:

As we have seen in our work so far, building flexible places to store program data can get complicated. With the introduction of pointers, we have a way to ask the system for a chunk of memory while the program is running, and use that memory to store things. That chunk of memory will not have a name, but it will have an address, and we can store that address in a pointer variable! This concept is called "dynamic storage".

With dynamic data storage, we at last have the ability to build containers as the program is running, freeing us from compile-time constraints.

What is a linked list?
**********************

Suppose we build a container to hold a single piece of data, like a simple integer. Further, suppose we need to store a lot of these integers while our program runs, and we will not know how many until the program runs! What kind of data container would you build?

The first thought is probably just a big array. That may work, unless the number of integers gets bigger than the array was set up to hold!

Now suppose I want to store another piece of information along with that integer, maybe even a lot of pieces of information - would an array still work? Sure, we just build an object that holds all the required data and create an array of these objects. No problem!

Now, suppose we need to rearrange this data many times as the program runs. The code we have studied so far will require copying data from one place to another in the array to get this job done. Could we avoid all of the copying?

Yes! We can build the data containers for each object dynamically, and let each container hold a pointer to the next object in the list of containers we create.

A Linked List is just a sequence of objects connected by pointers that are stored within each object. These pointers point to the next object in the list.
The list has a sequential nature to it. We will need one pointer variable that holds the address of the first object in the list, and we can use that pointer to find that object. After we find the first object, we can find the next object by looking inside the first object for the pointer that leads to the next object. When we get to the last object in the list, this internal pointer will be set to ``nullptr``, meaning there is no next object! Here is what it looks like:

.. image:: images/LinkedList.png
   :align: center

With a bit of careful coding, we can make this data structure grow and shrink as needed while our program runs.

Advantages of linked lists
**************************

There are a number of reasons why programmers like linked lists:

* No set limit on how big (or small) they can be
* Easy (well, not exactly hard) to add items to the list (even in the middle)
* Easy (well, again not that hard) to delete items from the list

Disadvantages of linked lists
*****************************

However, there are some drawbacks:

* Not as easy to get to a particular item in the list (no index)
* Pointer management is difficult to do right!

In spite of these problems, you do need to know how to build data structures like this, so let's give it a try.

Building a Linked List
**********************

Let's start our exploration of linked lists by building a simple application that manages a set of names we get from the user. Here is a main program to get us started:

:main.cpp:

.. literalinclude:: code/ex1/main.cpp
   :linenos:
   :language: c

This simple code prompts the user for a name, then loops until the entered name is "quit". Here is a sample run:

.. code-block:: text

   Enter name ('quit' to end):fred
   inserting fred
   Enter name ('quit' to end):barney
   inserting barney
   Enter name ('quit' to end):betty
   inserting betty
   Enter name ('quit' to end):wilma
   inserting wilma
   Enter name ('quit' to end):quit
   Press any key to continue . . .

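
The ``main.cpp`` pulled in above is not reproduced in this text. As a rough sketch only, the prompt-and-loop logic it describes might look like the following. The function name ``readNames`` and the stream parameters (used so the loop can be exercised without a keyboard) are assumptions made for illustration, not part of the course code.

.. code-block:: cpp

   #include <iostream>
   #include <string>

   // Hypothetical helper (not from the course code): prompts on `out`,
   // reads names from `in`, and stops at "quit" or at end of input.
   // Returns the number of names accepted.
   int readNames(std::istream &in, std::ostream &out) {
       std::string name;
       int count = 0;
       while (true) {
           out << "Enter name ('quit' to end):";
           if (!(in >> name) || name == "quit") break;
           out << "inserting " << name << std::endl;
           ++count;
       }
       return count;
   }

Calling ``readNames(std::cin, std::cout)`` from ``main`` would give an interactive session along the lines of the sample run above.
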

Building nodes
**************

Now that we have a working program, we need to build a container to hold the names we enter. What kind of objects will we need? Let's look at some terminology first:

* A Node is a basic container that holds data and a link to another node
* A Link is just a pointer to a node
* A Head pointer points to the first node in the list

The node is the most important object in this list gadget. We need to create a node for each piece of data we want to place in our linked list. So, one class should set up nodes. We also need another object to manage the list. This is exactly like the problem we faced in the poolball problem. We used one class to act like a poolball, and another class to manage the collection of poolballs. So, we will have a second class to manage the entire list.

We will need to build these two classes to store a linked list of names:

Node Class
==========

.. literalinclude:: code/ex2/Node.h
   :linenos:
   :language: c

Node objects will store a single string for the name, and a single pointer to the next node in the list. The class also includes two constructors, one for a simple empty node, and one that will be used to initialize the instance variables.

Building the Linked List class
******************************

Now that we have a node defined, we can set up the management class:

Linked List class
=================

.. literalinclude:: code/ex2/LinkedList.h
   :linenos:
   :language: c

We need a pointer to keep track of our series of nodes, and we use the head pointer to do just that. We have provided a constructor which creates a new (empty) linked list, and methods to add items to the list and print the list out. We need to write these last two methods before we can run our program and store the names.

Inserting names in the list
===========================

The first of these routines will add a new name to the front of the current list:

:Insert method:

.. 
literalinclude:: code/ex2/LinkedList.cpp
   :linenos:
   :language: c
   :lines: 8-10

What is going on here? The ``insert`` method is supposed to add the new name to the front of the list. Before we can do that, we need a place to store the new name.

Creating Dynamic Storage
------------------------

The C++ operator ``new`` does the job of creating new memory space for us. The operator is followed by the name of an existing data type, or the name of a class you have created. The memory needed to store one object of the designated data type (or class) is set up in a memory area called the ``heap``, and you get the address of that location back. All you need to do is store that address somewhere.

.. code-block:: c

   Node * tmp;
   tmp = new Node();    // creates one empty Node using the default constructor

Deleting Dynamic Storage
------------------------

If you need to get rid of a chunk of memory you previously created with ``new``, you use the ``delete`` operator:

.. code-block:: c

   delete tmp;          // returns the allocated storage to the heap

Storing the Address
-------------------

Line 9 in the ``insert`` method above actually stores the address, and does so in a safe way. We took the current pointer to the head of the list and passed it into the Node constructor so that when the ``new`` operator did its magic, the new node was properly initialized with the old list pointer stored inside. At that point, taking the address that ``new`` gives us and storing it in our ``head`` variable reconnects ``head`` to the front of the new list.

You need to be careful in these kinds of operations, or things can go wrong quickly! What would happen if we used the following code:

.. code-block:: c

   Node * hook;
   hook = new Node();
   head = hook;
   std::cout << head->name << std::endl;

Remember the rope analogy? In the third line above, you cut off the old rope, and lost whatever list you used to have connected there.
One way to avoid losing the list is to make sure you have another pointer variable pointing to the old list when you replace the address in ``head`` with a new address.

.. code-block:: c

   Node * hook;
   Node * save = head;   // hang onto the old list first
   hook = new Node();
   head = hook;          // head now points to the new node
   head->next = save;    // reattach the old list behind the new node

Accidentally losing the address of some data structure constructed in dynamic memory is a common mistake, and one you need to think about as you work with pointers!

Draw a picture and make sure you see how this worked! What happens if the list is empty, or if it has other names in the list?

Printing out the list
*********************

The pattern you will see in this routine is very common in linked data structures:

.. literalinclude:: code/ex2/LinkedList.cpp
   :linenos:
   :language: c
   :lines: 12-18

Remember that the funny ``->`` notation means "follow the link, then look inside the box for this inner container".

With these two routines in place, we can set up our program:

:main.cpp:

.. literalinclude:: code/ex2/main.cpp
   :linenos:
   :caption: main.cpp

We have created a ``LinkedList`` variable to keep track of the new list. The constructor for this class makes sure this new variable is pointing to an empty list. Next, we read the names from the user as before, but now we insert them into the linked list. If all goes well, we print the list out. This is what you should see:

sample run
==========

.. code-block:: text

   Enter name ('quit' to end):fred
   inserting fred
   Enter name ('quit' to end):wilma
   inserting wilma
   Enter name ('quit' to end):barney
   inserting barney
   Enter name ('quit' to end):betty
   inserting betty
   Enter name ('quit' to end):quit
   betty
   barney
   wilma
   fred
   Press any key to continue . . .

Notice the order we see here. Since the insert method always inserts at the front of the list, we will see the items last to first.

Hiding Details
**************

There is one significant problem with the code as we have shown it so far.
All of the internal details about how things are working are visible to users, and they can (and will) break things if they misuse these variables. It would be better to hide the details behind the ``private`` barrier!

Here is our new ``main.cpp``:

.. literalinclude:: code/ex3/main.cpp
   :linenos:
   :language: c

Here are the new class header files needed to implement these changes:

.. literalinclude:: code/ex3/Node.h
   :linenos:
   :language: c

And the management class header:

.. literalinclude:: code/ex3/LinkedList.h
   :linenos:
   :language: c

Updating the implementation files is an easy change, so that will not be shown here.

Creating an ordered list
************************

Could we make a smarter insert method, like one that keeps the names in alphabetical order? Let's see:

Alphabetic insert
=================

.. code-block:: c

   void LinkedList::alphaInsert(string val) {
   }

Now we need to work out how to insert a node at a particular place. This is a bit complicated, so we will work it out in small steps! What are the possibilities we might need to consider?

Empty list
----------

First, the list might be empty. This one is easy: we insert the name into the empty list. We can tell if it is empty by checking the value of the head pointer:

add to an empty list
--------------------

.. code-block:: c

   if (head == nullptr) insert(val);

Item goes in front
++++++++++++++++++

Next, the item could go in front of the first item. Again, we can use the insert method to do this. We can combine these two cases to get:

add to front of a possibly empty list
-------------------------------------

.. code-block:: c

   if (head == nullptr || val < head->getName()) insert(val);

Make sure you see how this expression works. Remember that C++ evaluates a logical expression built with ``||`` or ``&&`` from left to right, as far as necessary to get the answer and no further.
So, if the head pointer is ``nullptr``, the first part of the expression will return a true value, and we know that the entire expression is true without evaluating the rest of the expression. In this case, this is good, since evaluating the second part would dereference a null pointer and most likely crash! This is called a short-circuit expression: we stop evaluating before we get into trouble. Again, this is a common pattern in working with pointers.

General case
++++++++++++

Now, we need to work out the hard part. In general, the item will need to go after one item and in front of a second item. We will make sure the item does not go in front of the first node by dealing with that possibility before we get to this part of the code.

Since we will need to hang onto multiple nodes here, we will need more than one pointer variable. Let's set up two, one for the current node, and one for the next node. Here is a loop that can work through all the nodes in the list keeping these two pointers set right:

Walking through the list with two pointers
------------------------------------------

.. code-block:: c

   Node *currentNode, *nextNode;
   currentNode = head;
   while (currentNode != nullptr) {
       nextNode = currentNode->getNext();
       // ... work with the two pointers here ...
       currentNode = nextNode;
   }

Do you see how it works? We make sure the current node is not ``nullptr`` as the loop control - stopping after we try to go past the last node in the list. We set the ``nextNode`` pointer to the value in the next variable. This will be the next node in the list, or ``nullptr`` when the ``currentNode`` is the last item in the list.

Now, inside the loop, we have our two pointers. However, we have two possibilities to consider. The general case happens when both pointers are not ``nullptr``, meaning we have two nodes in the list. We need to check to see if the new name goes between these two nodes in this case.
In the second case, the ``nextNode`` pointer is ``nullptr``, meaning we are at the end of the list. We will hook the new name after this node in this case.

End of list case
================

.. code-block:: c

   if (nextNode == nullptr) {
       Node *newNode = new Node(val,nullptr);
       currentNode->setNext(newNode);
       return;
   }

Here, we create our new node and hook it to the end of the list. We make sure that the list still has a proper ``nullptr`` at the end by setting this up in the call to the node constructor. Notice how we bail out of the routine when we find the place for the new node. We will use that scheme in the rest of the code as well.

We can shorten this up like so:

End of list case (shortened)
============================

.. code-block:: c

   if (nextNode == nullptr) {
       currentNode->setNext(new Node(val,nullptr));
       return;
   }

This avoids needing another variable and still keeps the list in proper shape.

At last, we get to the general case. Here we will hook the new node in between the two other nodes, all the while making sure we do not let go of any of the rope!

general case
============

.. code-block:: c

   if (val < nextNode->getName()) {
       Node *newNode = new Node(val,nextNode);
       currentNode->setNext(newNode);
       return;    // bail out, or the loop would insert the name again
   }

This code checks to see if the name goes in front of the next node, and inserts it between the two nodes if so. Here is the complete routine:

:LinkedList.cpp:

.. literalinclude:: code/ex4/LinkedList.cpp
   :linenos:
   :language: c

And this is how it works:

sample run
==========

.. code-block:: text

   Enter name ('quit' to end):fred
   inserting fred
   Enter name ('quit' to end):barney
   inserting barney
   Enter name ('quit' to end):wilma
   inserting wilma
   Enter name ('quit' to end):betty
   inserting betty
   Enter name ('quit' to end):quit
   barney
   betty
   fred
   wilma
   Press any key to continue . . .

While this one was much harder to implement, it demonstrates the kind of thinking you need to do to keep these linked data structures under control.

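
Because the complete routine is pulled in from ``code/ex4/LinkedList.cpp`` rather than printed here, the following is a self-contained sketch of how the pieces above might fit together. The ``Node`` and ``LinkedList`` classes below are simplified stand-ins, and the destructor and the ``names`` helper are additions for this sketch only; the real ex4 sources may differ. The two in-loop cases are folded into one test, since ``new Node(val, nextNode)`` already stores ``nullptr`` when ``nextNode`` is ``nullptr``.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Hypothetical stand-in for the chapter's Node class.
   class Node {
   public:
       Node(const std::string &val, Node *nxt) : name(val), next(nxt) {}
       std::string getName() const { return name; }
       Node *getNext() const { return next; }
       void setNext(Node *nxt) { next = nxt; }
   private:
       std::string name;
       Node *next;
   };

   // Hypothetical stand-in for the chapter's LinkedList class.
   class LinkedList {
   public:
       LinkedList() : head(nullptr) {}
       ~LinkedList() {                       // release every node we allocated
           while (head != nullptr) {
               Node *doomed = head;
               head = head->getNext();
               delete doomed;
           }
       }
       LinkedList(const LinkedList &) = delete;             // sketch: no copying
       LinkedList &operator=(const LinkedList &) = delete;

       // Insert at the front of the list.
       void insert(const std::string &val) { head = new Node(val, head); }

       // Insert so that the names stay in ascending order.
       void alphaInsert(const std::string &val) {
           // Empty list, or the new name belongs before the first node.
           if (head == nullptr || val < head->getName()) {
               insert(val);
               return;
           }
           Node *currentNode = head;
           while (currentNode != nullptr) {
               Node *nextNode = currentNode->getNext();
               // End of list, or the new name fits between the two nodes.
               if (nextNode == nullptr || val < nextNode->getName()) {
                   currentNode->setNext(new Node(val, nextNode));
                   return;
               }
               currentNode = nextNode;
           }
       }

       // Helper added for this sketch: copy the names out in list order.
       std::vector<std::string> names() const {
           std::vector<std::string> out;
           for (Node *p = head; p != nullptr; p = p->getNext())
               out.push_back(p->getName());
           return out;
       }
   private:
       Node *head;
   };

Inserting fred, barney, wilma and betty in that order through ``alphaInsert`` leaves the list holding barney, betty, fred, wilma, matching the sample run above.
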

Removing a name from the list
*****************************

To make our ``LinkedList`` class more useful, we need to add a method that removes items from the list. The code for this routine needs to find the item (which may not be in the list) and then unhook it. Here is the new prototype:

.. literalinclude:: code/ex5/LinkedList.h
   :lines: 19
   :language: c

Once again, we have several cases to consider in writing this method's implementation. Here is the outer structure:

.. literalinclude:: code/ex5/LinkedList.cpp
   :linenos:
   :lines: 48-50,78
   :language: c

The list is empty
=================

If the list is empty, we have nothing to do:

.. literalinclude:: code/ex5/LinkedList.cpp
   :linenos:
   :lines: 51-52
   :language: c

Item is in front
================

If the item is in the front of the list, the procedure is pretty simple again:

.. literalinclude:: code/ex5/LinkedList.cpp
   :linenos:
   :lines: 54-60
   :language: c

General case
============

Once again, we need to set up a loop that walks down the list finding the item in question. We want to keep track of the previous node, so we search for a match using the ``nextNode`` pointer:

.. literalinclude:: code/ex5/LinkedList.cpp
   :linenos:
   :lines: 62-80
   :language: c

There is one trap to watch for here. As with the alphabetic insert logic we discussed above, we might have reached the end of the list when we check for the name in the next node. In that case, ``nextNode`` will be ``nullptr``. If it is, that means we have reached the end of the line and the name we are after must not be in the list - so we have nothing to do. We can check for this before we compare strings.

Here is the entire implementation file:

.. literalinclude:: code/ex5/LinkedList.cpp
   :linenos:

Here is a modification to our main code to test this:

.. 
literalinclude:: code/ex5/main.cpp

And here is what we get running this code:

sample run
==========

.. code-block:: text

   $ ./ex5
   Enter name ('quit' to end):fred
   Enter name ('quit' to end):wilma
   Enter name ('quit' to end):barney
   Enter name ('quit' to end):dino
   Enter name ('quit' to end):quit
   barney
   dino
   fred
   wilma
   removing fred from the list
   barney
   dino
   wilma
   name linda is not in the list
   barney
   dino
   wilma
   removing barney from list front
   dino
   wilma
   removing wilma from the list
   dino

Now, we are making progress. We have a set of routines that will let us build a list of names, add new names and delete old names. Pretty good. But, can we do better?

Doubly linked lists
*******************

In the previous example, our linked list is one way. That means we can only walk the list in one direction, from the first node to the last node. What if we want to be able to walk both ways? Can we do that?

Sure, just maintain two pointers in each node - one to hook to the node after the current node, and another to hook to the node before the current node. Our class could handle the list this way with the user of our linked list not even knowing we had new features in the class.

Here is the new Node class:

:Node.h:

.. literalinclude:: code/ex6/Node.h
   :linenos:
   :language: c

.. note::

   Since the code for the methods in the Node class is so simple, it is common to just add that code into the header file and eliminate the implementation file. We are doing this here, so delete the ``Node.cpp`` file if it exists. Remember to update your Makefile if you do this.

And our new linked list class:

:LinkedList.h:

.. literalinclude:: code/ex6/LinkedList.h
   :linenos:
   :language: c

This definition did not change much. The class will keep two pointers to the list, one pointing to the front of the list, and one pointing to the end of the list. Both of these pointers will be ``nullptr`` if the list is empty. There are a few new routines here so we can play with this new kind of linked list.

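
Since the ex6 headers are included by reference, here is a minimal sketch of what a doubly linked node might look like, together with a backward walk that starts at the tail. The constructor argument order (value, next, prev) is inferred from calls such as ``Node(val, head, nullptr)`` elsewhere in this chapter, and ``getPrev`` and ``namesBackward`` are names assumed for this illustration; the real ``Node.h`` may differ.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Hypothetical doubly linked node (a stand-in for code/ex6/Node.h).
   class Node {
   public:
       Node(const std::string &val, Node *nxt, Node *prv)
           : name(val), next(nxt), prev(prv) {}
       std::string getName() const { return name; }
       Node *getNext() const { return next; }
       Node *getPrev() const { return prev; }
       void setNext(Node *nxt) { next = nxt; }
       void setPrev(Node *prv) { prev = prv; }
   private:
       std::string name;
       Node *next;   // node after this one, or nullptr at the tail
       Node *prev;   // node before this one, or nullptr at the head
   };

   // Walk from the tail back to the head, collecting names in reverse order.
   std::vector<std::string> namesBackward(const Node *tail) {
       std::vector<std::string> out;
       for (const Node *p = tail; p != nullptr; p = p->getPrev())
           out.push_back(p->getName());
       return out;
   }

The backward walk is the mirror image of the forward one: it starts at the tail pointer and follows ``prev`` links until it reaches ``nullptr``.
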

Modifications to the LinkedList methods
=======================================

We have no other changes to make to the Node class, but our methods in the ``LinkedList`` class need to be worked on.

Constructor
-----------

The basic constructor is not changed, except for the need to initialize the new tail pointer.

.. code-block:: c

   LinkedList::LinkedList() {
       name = "";
       head = nullptr;
       tail = nullptr;
   }

Insert at front of the list
+++++++++++++++++++++++++++

With the addition of the new pointer, inserting at the front is a bit more complicated. Here, we need to consider a few more cases:

:insert in the front of the list:

.. code-block:: c

   void LinkedList::insertFront(string val) {
       if (head == nullptr) {
           head = new Node(val,nullptr,nullptr);
           tail = head;
       } else {
           Node * ptr = new Node(val,head,nullptr);
           head->setPrev(ptr);
           head = ptr;
       }
   }

You should draw a few pictures to make sure you can follow what is going on here. We are trying to maintain two chains here, so we have to make sure both are consistent when we get done. Each case takes thinking to make sure you have it right! Don't get discouraged if you break a few lists as you learn - we all have!

Remove item
===========

This method is another that gets a bit more complicated. Again, we need to carefully think through all the possibilities. Let's try to write them down before we build the code:

* List is empty - no work to do
* List has one node

  * Item is not in the list - no work to do
  * Item is in the list - delete the item and make the list empty

* List has two or more nodes

  * Item is not in the list - no work
  * Item is in the list

    * Item is in front of the list
    * Item is at the end of the list
    * Item is in the middle of the list

Each case needs to be thought through individually. Here is the final remove routine:

.. 
literalinclude:: code/ex6/LinkedList.cpp
   :linenos:
   :lines: 80-134

It looks complicated, but it is commented to help show what each section does, and has a few output lines so you can see how it works.

Let's see this code in action:

.. command-output:: make clean
   :cwd: code/ex6

.. command-output:: make
   :cwd: code/ex6

.. command-output:: make run
   :cwd: code/ex6
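
For readers without the ex6 sources at hand, here is a compact sketch of one way the remove cases listed earlier could be handled. It is not the ex6 routine: it uses a plain ``struct Node`` instead of the chapter's class with accessors, returns a ``bool`` instead of printing messages, and the ``forward`` and ``backward`` helpers exist only so the two chains can be checked. All of these are assumptions made for illustration. The point it tries to show is that every case reduces to fixing the pointer on each side of the node being removed, with ``head`` or ``tail`` standing in when a neighbor is missing.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Hypothetical stand-in for the chapter's doubly linked Node class.
   struct Node {
       std::string name;
       Node *next;
       Node *prev;
   };

   class LinkedList {
   public:
       LinkedList() : head(nullptr), tail(nullptr) {}
       ~LinkedList() {                       // release every remaining node
           while (head != nullptr) {
               Node *doomed = head;
               head = head->next;
               delete doomed;
           }
       }
       LinkedList(const LinkedList &) = delete;             // sketch: no copying
       LinkedList &operator=(const LinkedList &) = delete;

       void insertFront(const std::string &val) {
           Node *ptr = new Node{val, head, nullptr};
           if (head == nullptr) tail = ptr;   // first node is also the last
           else head->prev = ptr;
           head = ptr;
       }

       // Remove the first node holding val; returns false if it is not found.
       bool remove(const std::string &val) {
           Node *ptr = head;
           while (ptr != nullptr && ptr->name != val) ptr = ptr->next;
           if (ptr == nullptr) return false;            // empty list or no match

           if (ptr->prev == nullptr) head = ptr->next;  // item was in front
           else ptr->prev->next = ptr->next;

           if (ptr->next == nullptr) tail = ptr->prev;  // item was at the end
           else ptr->next->prev = ptr->prev;

           delete ptr;
           return true;
       }

       // Helpers for this sketch: copy the names out in each direction.
       std::vector<std::string> forward() const {
           std::vector<std::string> out;
           for (Node *p = head; p != nullptr; p = p->next) out.push_back(p->name);
           return out;
       }
       std::vector<std::string> backward() const {
           std::vector<std::string> out;
           for (Node *p = tail; p != nullptr; p = p->prev) out.push_back(p->name);
           return out;
       }
   private:
       Node *head;
       Node *tail;
   };

Removing the only node in a list takes both the "in front" and the "at the end" branches, which leaves ``head`` and ``tail`` set to ``nullptr`` again. Comparing ``forward()`` against the reverse of ``backward()`` after each removal is a cheap way to confirm that both chains stayed consistent.
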