..  _linked-lists:

Linked Lists
############

..  include::   /header.inc
..  vim:filetype=rst spell:

As we have seen in our work so far, building flexible places to store program
data can get complicated. With the introduction of pointers, we have a way to
ask the system for a chunk of memory while the program is running, and use that
memory to store things. Obviously, that chunk of memory will not have a name,
but it will have an address, and we can store that in a pointer variable! This
new concept is called "dynamic storage".

By introducing the concept of dynamic data storage, we at last have the ability
to build containers as the program is running, freeing us from compile time
constraints. 

What is a linked list?
**********************

Suppose we build a container to hold a single piece of data, like a simple
integer. Further, suppose we need to store a lot of these integers while our
program runs, and we will not know how many until the program runs! What kind
of data container would you build?

The first thought is probably just a big array. That may work, unless the
number of integers gets bigger than you set the array up to hold!

Now suppose I want to store another piece of information along with that
integer, maybe even a lot of pieces of information - would an array still work?

Sure, we just build an object that holds all the required data and create an
array of these objects. No problem!

Now, suppose we need to rearrange this data many times as the program runs. The
code we have studied so far will require copying data from one place to another
in the array to get this job done. Could we avoid all of the copying?

Yes! We can build the data containers for each object dynamically, and let each
container hold a pointer to the next object in the list of containers we
create.

A Linked List is just a sequence of objects connected by pointers that are
stored within each object. These pointers point to the next object in the list.
The list has a sequential nature to it. We will need one pointer variable that
holds the address of the first object in the list, and we can use that pointer
to find that object. After we find the first object, we can find the next
object by looking inside the first object for the pointer that is tied to the
next object. When we get to the last object in the list, this internal pointer
will be set to nullptr meaning there is no next object!

Here is what it looks like:

..  image:: images/LinkedList.png
    :align: center


With a bit of careful coding, we can make this data structure grow and shrink
as needed while our program runs.

Advantages of linked lists
**************************

There are a number of reasons why programmers like linked lists:

    * No set limit on how big (or small) they can be
    * Easy (well not exactly hard) to add items to the list (even in the middle)
    * Easy (well, again not that hard) to delete items from the list

Disadvantages of linked lists
*****************************

However, there are some drawbacks

    * Not as easy to get to a particular item in the list (no index)
    * Pointer management is difficult to do right!

In spite of these problems, you do need to know how to build data structures
like this, so let's give it a try:

Building a Linked List
**********************

Let's start off on our exploration of linked lists by building a simple
application that manages a set our names we get from the user. Here is a main
program to get us started:


:main.cpp:

..  literalinclude:: code/ex1/main.cpp
    :linenos:
    :language: c


This simple code prompts the user for a name then loops until the entered name
is "quit". Here is this code in action:

Here is a sample run:

..  code-block:: text

    Enter name ('quit' to end):fred
    inserting fred
    Enter name ('quit' to end):barney
    inserting barney
    Enter name ('quit' to end):betty
    inserting betty
    Enter name ('quit' to end):wilma
    inserting wilma
    Enter name ('quit' to end):quit
    Press any key to continue . . .

Building nodes
**************

Now that we have a working program, we need to build a container to hold the
names we enter. What kind of objects will we need?

Let's look at some terminology first:

    * A Node is a basic container with data and a link to another node in it
    * A Link is just a pointer to a node.
    * A Head pointer points to the first node in the list

The node is the most important object in this list gadget. We need to create a
node for each piece of data we want to place in our linked list. So, one class
should set up nodes. We also need another object to manage the list. This is
exactly like the problem we faced in the poolball problem. We used one class to
act like a poolball, and another class to manage the collection of poolballs.
So, we will have a second class to manage the entire list.

We will need to build these two classes to store a linked list of names:

Node Class
==========

..  literalinclude::    code/ex2/Node.h
    :linenos:
    :language: c

Node objects will store a single string for the name, and a single pointer to
the next node in the list. The class also includes two constructors, one for a
simple empty node, and one that will be used to initialize the instance
variables.

Building the Linked List class
******************************

Now that we have a node defined, we can set up the management class:

Linked List class
=================

..  literalinclude::    code/ex2/LinkedList.h
    :linenos:
    :language: c

We need a pointer to keep track of our series of nodes, and we use the head
pointer to do just that. We have provided a constructor which creates a new
(empty) linked list, and methods to add items to the list, and print the list
out. We need to write these last two methods before we can run our program and
store the names.

Inserting names in the list
===========================

The first of these routines will add a new name to the front of the current
list:

:Insert method:

..  literalinclude::    code/ex2/LinkedList.cpp
    :linenos:
    :language: c
    :lines: 8-10

What is going on here? The ``insert`` method is supposed to add the new name to
the front of the list. Before we can do that, we need a place to store the new
name.

Creating Dynamic Storage
------------------------

The C++ operator ``new`` does the job of creating new memory space for us. The
operator is followed by the name of an existing data type, or the name of a
class you have created. The memory needed to store one object from the
designated data type (or class) is set up in a memory area called the ``heap``,
and you get the address of that location back. All you need to do is store that
address somewhere. 

..  code-block:: c

    Node * tmp;
    tmp = new Node();   // creates one bare (uninitialized) Node object

Deleting Dynamic Storage
------------------------

If you need to get rid of a chunk of memory you previously created with
``new``, you use the ``delete`` operator:

..  code-block:: c

    delete tmp;         // returns the storage allocated to the heap    

Storing the Address
-------------------

Line 9 in the ``insert`` method above actually stores the address, and does so
in a safe way. We took the current pointer to the head of the list and passed
it into the Node constructor so that when the ``new`` operator did its magic,
the new node was properly initialized with the old list pointer stored inside.
At that point taking the new address ``new`` gives us and storing that into our
``head`` variable reconnects ``head`` to the front of the new list. 

You need to be careful in these kinds of operations, or things can go wrong
quickly!

What would happen if we used the following code:

..  code-block:: c

    Node * hook;
    hook = new Node();
    head = hook;
    std::cout << head->name << std::endl;

Remember the rope analogy? In the third line above, you cut off the old rope,
and lost whatever list you used to have connected there. One solution to that
is to make sure you have another pointer variable pointing to the old list when
you replace the address in ``head`` with a new address.

..  code-block:: c

    Noder * hook;
    Node * save = head;
    hook = new Node();
    save = head;
    head = hook;
    head->next = save;

Accidentally losing the address of some data structure constructed in dynamic
memory is a common mistake, and one you need to think about as you work with
pointers!  Draw a picture and make sure you see how this worked!  What happens
if the list is empty, or if it has other names in the list?

Printing out the list
*********************

The pattern you will see in this routine is very common in linked data
structures.:

..  literalinclude::    code/ex2/LinkedList.cpp
    :linenos:
    :language: c
    :lines: 12-18

Remember that the funny "->" notation means "follow the link, then look inside
the box for this inner container". 

With these two routines in place, we can set up our program:

:main.cpp:

..  literalinclude:: code/ex2/main.cpp
    :linenos:
    :caption: main.cpp


We have created a ``LinkedList`` variable to keep track of the new list. The
constructor for this class makes sure this new variable is pointing to an empty
list. Next, we read the names from the user as before, but now we insert them
into the linked list. If all goes well, we print the list out. This is what you
should see:

sample run
==========

..  code-block:: text

    Enter name ('quit' to end):fred
    inserting fred
    Enter name ('quit' to end):wilma
    inserting wilma
    Enter name ('quit' to end):barney
    inserting barney
    Enter name ('quit' to end):betty
    inserting betty
    Enter name ('quit' to end):quit
    betty
    barney
    wilma
    fred
    Press any key to continue . . .


Notice the order we see here. Since the insert method always inserts at the
front of the list, we will see the items last to first.

Hiding Details
**************

There is one significant problem with the code as we have shown it so far. All
of the internal details about how things are working are visible to users, and
they can (and will) break things if they misuse these variables. It would be
better to hide the details behind the ``private`` barrier!

Here is our new ``main.cpp``:

..  literalinclude::    code/ex3/main.cpp
    :linenos: 
    :language: c

Nere are the new class header files needed to implement these changes

..  literalinclude::    code/ex3/Node.h
    :linenos:
    :language: c

And the Management class header

..  literalinclude::    code/ex3/LinkedList.h
    :linenos:
    :language: c

Creating the implementation files is an easy change, so that will not be shown here


Creating an ordered list
************************

Could we make a smarter insert method, like one that keeps the names in
alphabetical order? Let's see:

Alphabetic insert
=================

..  code-block:: c

    void LinkedList::alphaInsert(string val) {
    }

Now we need to work out how to insert a node at a particular place. This is a
bit complicated, so we will work it out in small steps!

What are the possibilities we might need to consider?

Empty list
----------

First, the list might be empty. This one is easy, we insert the name into the
empty list. We can tell if it is empty by checking the value of the head node:

add to an empty list
--------------------

..  code-block:: c

    if (head == nullptr) insert(val);

Item goes in front
++++++++++++++++++

Next, the item could go in front of the first item. Again, we can use the
insert method to do this. We can combine these two cases to get:

add to front of a possibly empty list
-------------------------------------

..  code-block:: c


    if (head == nullptr || val < head->getName()) insert(val)

Make sure you see how this expression works. Remember that C++ evaluates any
logical expression as far as necessary to get the answer and no further. So, if
the head pointer is nullptr, the first part of the expression will return a true
value, and we know that the entire expression is true without evaluating the
rest of the expression. In this case, this is good, since evaluating the second
part would generate a reference through a nullptr pointer and crash!

This is called a short circuit expression, we stop evaluating (short circuit
-er- cause the evaluator to stop) before we get into trouble. Again, this is a
common pattern in working with pointers.

General case
++++++++++++

Now, we need to work out the hard part. In general, the item will need to go
after one item and in front of a second item. We will make sure the item does
not go in front of the first node by dealing with that possibility before we
get to this part of the code. Since we will need to hang onto multiple nodes
here, we will need more than one pointer variable. Let's set up two, one for
the current node, and one for the next node. Here is a loop that can work
through all the nodes in the list keeping these two pointers set right:

Walking through the list with two pointers
------------------------------------------

..  code-block:: c

    Node *currentNode, *nextNode;

    currentNode = head;
    while (currentNode != nullptr) {
        nextNode = currentNode->getNext()
        ...
        currentNode = nextNode;
    }

Do you see how it works? We make sure the current node is not ``nullptr`` as the loop
control - stopping after we try to go past the last node in the list. We set
the ``nextNode`` pointer to the value in the next variable. This will be the next
node in the list, or ``nullptr`` when the ``currentNode`` is the last item in the list.

Now, inside the loop, we have our two pointers. However, we have two
possibilities to consider. The general case happens when both pointers are not
i``nullptr``, meaning we have two nodes in the list. We need to check to see if the new
name goes between these two nodes in this case. 

In the second case, the ``nextNode`` pointer is ``nullptr``, meaning we are at the end of
the list. We will hook the new name after this node in this case.

End of list case
================


..  code-block  c

    if (nextNode == nullptr) {
        Node *newNode = new Node(val,nullptr);
        currentNode->setNext(newNode);
        return;
    }

Here, we create our new node and hook it to the end of the list. We make sure
that the list still has a proper nullptr at the end by setting this up in the call
to the node constructor. Notice how we bail out of the routine when we find the
place for the new node. We will use that scheme in the rest of the code as
well.

We can shorten this up like so:

End of list case
================

..  code-block:: c

    if (nextNode == nullptr) {
        currentNode->setNext(new Node(val,nullptr));
        return;

This avoids needing another variable and still keeps the list in proper shape.

At last, we get to the general case. Here we will hook the new node in between
the two other nodes all the while making sure we do not let go of any of the
rope!

general case
============

..  code-block c

    if (val < nextNode->getName()) {
        Node *newNode = new Node(val,nextNode);
        currentNode->setNext(newNode);
    }

This code checks to see if the name goes in front of the next node, and inserts
it between the two nodes if so.

Here is the complete routine:

:LinkedList.cpp:

..  literalinclude::    code/ex4/LinkedList.cpp
    :linenos:
    :language: c

And this is how it works:

sample run
==========

..  code-block:: text


    Enter name ('quit' to end):fred
    inserting fred
    Enter name ('quit' to end):barney
    inserting barney
    Enter name ('quit' to end):wilma
    inserting wilma
    Enter name ('quit' to end):betty
    inserting betty
    Enter name ('quit' to end):quit
    barney
    betty
    fred
    wilma
    Press any key to continue . . .

While this one was much harder to implement, it demonstrates the kind of
thinking you need to do to keep these linked data structures under control.

Removing a name from the list
*****************************

Now, we can deal with the removal of an item from the list. The code for this
routine needs to find the item (which may not be in the list) and then unhook
it. Once again, we have a number of cases to consider.

Remove an item from the list
****************************

To make our ``LinkedList`` class more useful, we need to add a method that removes
items from the list. Here is the new prototype:

..  literalinclude::    code/ex5/LinkedList.h
    :lines: 19
    :language: c


We have several situations to consider in writing this method's implementation.
Here is the outer structure:

..  literalinclude::    code/ex5/LinkedList.cpp
    :linenos:
    :lines: 48-50,78
    :language: c

The list is empty
=================

If the list is empty, we have nothing to do:

..  literalinclude::    code/ex5/LinkedList.cpp
    :linenos:
    :lines: 51-52
    :language: c

Item is in front
================

If the item is in the front of the list, the procedure is pretty simple again:

..  literalinclude::    code/ex5/LinkedList.cpp
    :linenos:
    :lines: 54-60
    :language: c

General case
============

Once again, we need to set up a loop that walks down the list finding the item
in question. We want to keep track of the previous node, so we search for a
match using the nextNode pointer:

..  literalinclude::    code/ex5/LinkedList.cpp
    :linenos:
    :lines: 62-80
    :language: c

We have a problem here. As with the alphabetic insert logic we discussed above,
we might have reached the end of the list when we check for the name in the
next node. In that case, ``nextNode`` will be nullptr. If it is, that means we have
reached the end of the line and the name we are after must not be in the list -
so we have nothing to do. We can check for this before we compare strings

Here is the entire implementation file:

..  literalinclude::    code/ex5/LinkedList.cpp
    :linenos:

Here is a modification to our main code to test this:

..  literalinclude::    code/ex5/main.cpp

And here is what we get running this code:

sample run
==========

..  code-block:: text

    $ ./ex5
    Enter name ('quit' to end):fred
    Enter name ('quit' to end):wilma
    Enter name ('quit' to end):barney
    Enter name ('quit' to end):dino
    Enter name ('quit' to end):quit
    barney
    dino
    fred
    wilma
    removing fred from the list
    barney
    dino
    wilma
    name linda is not in the list
    barney
    dino
    wilma
    removing barney from list front
    dino
    wilma
    removing wilma from the list
    dino

Now, we are making progress. We have a set of routines that will let us build a
list of names, add new names and delete old names. Pretty good. But, can we do
better?  

Doubly linked lists
*******************

In the previous example, our linked list is one way. That means we can only
walk the list in one direction, from the first node to the last node. What if
we want to be able to walk both ways? Can we do that?

Sure, just maintain two pointers in each node - one to hook to the node after
the current node, and another to hook to the node before the current node. Our
class could handle the list this way with the user of our linked list not even
knowing we had new features in the class.

Here is the new Node class:


:Node.h:

..  literalinclude::    code/ex6/Node.h
    :linenos:
    :language: c

..  note::

    Since the code for the methods in the Node class was so simple, it is
    common to just add that code into the header file and eliminate the
    implementation file. We are doing this here, so delete the ``Node.cpp`` file if
    it exists. Remember to update your Makefile if you do this.

And our new linked list class:

:LinkedList.h:

..  literalinclude::    code/ex6/LinkedList.h
    :linenos:
    :language: c

This definition did not change much. The class will keep two pointers to the
list, one pointing to the front of the list, and one pointing to the end of the
list. Both of these pointers will be nullptr if the list is empty.

There are a few new routines here so we can play with this new kind of linked
list.  

Modifications to the LinkedList methods
=======================================

We have no other changes to make to the Node class, but our methods in the
``LinkedList`` class need to be worked on.

Constructor
-----------

The basic constructor is not changed, except for the need to initialize the new
tail pointer.

..  code-block:: c

    LinkedList::LinkedList() {name = ""; head = nullptr; tail = nullptr; }

Insert at front of the list
+++++++++++++++++++++++++++

With the addition of the new pointer, inserting at the front is a bit more
complicated. Here, we need to consider a few more cases: 

:insert in the front of the list:

..  code-block:: c

    void LinkedList::insertFront(string val) {
        if (head == nullptr) {
            head = new Node(val,nullptr,nullptr);  
            tail = head;
        } else {
            Node * ptr = new Node(val,head,nullptr);
            head->setPrev(ptr);
            head = ptr;
        }
    }

You should draw a few pictures to make sure you can follow what is going on
here. We are trying to maintain two chains here. So we have to make sure both
are consistent when we get done. Each case takes thinking to make sure you have
it right! Don't get discouraged if you break a few lists as you learn - we all
have!  

Remove item
===========

This method is another that gets a bit more complicated. Again, we need to
carefully think through all the possibilities. Let's try to write them down
before we build the code:

    * List is empty - no work to do
    * List has one node
    * Item is not in the list - no work to do
    * Item is in the list - delete the item and make the list empty
    * List has two or more nodes
    * Item is not in the list - no work
    * Item is in the list
    * Item is in front of the list
    * Item is at the end of the list
    * Item is in the middle of the list

Each case needs to be thought through individually.

Here is the final remove routine:

..  literalinclude::    code/ex6/LinkedList.cpp
    :linenos:
    :lines: 80-134

It looks complicated, but it is commented to help show what each section does,
and has a few output lines so you can see how it works.

Let's see this code in action:

..  command-output::    make clean
    :cwd: code/ex6

..  command-output::    make
    :cwd: code/ex6

..  command-output::    make run
    :cwd: code/ex6