Asymptotic analysis of a few linked list variations
While the concept of a linked list is straightforward, it's worth noting that there are a few variations on the concept. None is indisputably better than the others; instead, they each present different tradeoffs, with some using additional memory and complexity in order to guarantee that certain operations run faster. So when we want to implement a linked list, we have a decision to make: Which variation should we implement? In order to answer that question, we first have to understand what the tradeoffs are; it's time to do asymptotic analysis on each variation, so we can understand the situations in which each one excels.
Singly-linked list with head
The simplest variation of linked list is what you might call a singly-linked list with head. Its name implies its two most important qualities:
- The list is singly-linked, meaning that every node, in addition to storing one data element, "links" to a single node, the one that follows it in the list.
- The only node to which there is a direct pointer is the head (i.e., the first one). To reach any other node, you get there by working your way down the list a node at a time, since each node can tell you where the next one is.
A singly-linked list with head looks something like this:

    head -> [a | next] -> [b | next] -> [c | next] -> null
Now, let's do some asymptotic analysis of a set of operations you might like to perform on a singly-linked list with head, so we can see where it excels and where it falters. We'll assume that the list contains n elements at the time the operation is performed.
- Add an element to the front of the list. This requires only four steps: (1) create a new node, (2) point the new node's next to where head points, (3) fill in the node's data, (4) point head to the new node. It makes no difference how many nodes there are; it's always the same four steps. So we would say that this operation would take Θ(1) time to run.
- Remove the first element from the list. This requires only a constant number of steps (though some of the details differ a bit from one programming language to another): (1) point head to the second node (i.e., the first node's "next"), (2) destroy the node containing the first element. Again, it makes no difference how many nodes there are, since only the first two nodes are touched, so this also takes Θ(1) time.
- Get the first element in the list. No matter how long the list is, we simply need to follow the head pointer to the node it points to, then fetch the data from that node. This operation takes Θ(1) time.
- Get the ith element in the list. Finding an element in a known location in the list still requires obtaining a pointer to the corresponding node. But the only node to which we have a direct link is the first one; to get to any other node, we have to walk down the list, one node at a time. So, getting the ith element in the list requires us to take i steps through the list. This operation takes Θ(i) time. Since any valid index i < n, we could also say that it takes O(n) time, though this would be a somewhat weaker statement to make, albeit a true one. (This is our first example of where there are multiple variables in play. Choosing the "right" one to use is a matter of what argument you're trying to make. In this case, though, Θ(i) is a stronger statement because it cuts right to the core of the issue: The length of time it takes to get a particular element in the list is a function of how far down the list you have to go to get there.)
- Add an element to the end of the list. To add a new node to the end of the list, we first have to get to the end of the list. Once we have a pointer to the last node, we can then: (1) create a new node whose "next" points to null (since nothing will follow it), (2) point the formerly last node's next to the new node, (3) fill in the new node's data. But even though there are a constant number of steps required to add the new node once we've gotten to the last one, getting there requires us to walk down the whole list, so this, in total, takes Θ(n) time.
- Remove an element from the end of the list. Removing the last node of the list, similarly, requires getting to the end of the list, though it's important to note that we actually need to obtain a pointer to the second-to-last node (because this is the one whose "next" field has to change). In general, finding the second-to-last node is still a linear function of how long the list is, so this takes Θ(n) time.
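The operations above can be sketched in code. Below is a minimal Python sketch under the stated assumptions; the class and method names are illustrative, not from any particular library:

```python
class Node:
    """One node: a data element plus a link to the node that follows it."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

class SinglyLinkedList:
    """Singly-linked list with only a head pointer."""
    def __init__(self):
        self.head = None

    def add_to_front(self, value):
        # Θ(1): create a node whose next is the current head, then repoint head.
        self.head = Node(value, self.head)

    def remove_from_front(self):
        # Θ(1): repoint head to the second node; only the first two nodes are touched.
        self.head = self.head.next

    def get(self, i):
        # Θ(i): walk i links from the head to reach the ith node.
        node = self.head
        for _ in range(i):
            node = node.next
        return node.data

    def add_to_end(self, value):
        # Θ(n): we must walk the whole list to reach the last node first.
        if self.head is None:
            self.head = Node(value)
            return
        node = self.head
        while node.next is not None:
            node = node.next
        node.next = Node(value)

    def remove_from_end(self):
        # Θ(n): we need the second-to-last node, whose "next" must change.
        if self.head.next is None:
            self.head = None
            return
        node = self.head
        while node.next.next is not None:
            node = node.next
        node.next = None
```

Notice how the two end-of-list operations are the only ones containing a loop over the list; that loop is exactly where the Θ(n) cost comes from.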
We see, generally, that the head pointer can allow us to very quickly manipulate the front of the list, but that it is much more expensive to manipulate the end. One way to solve that problem might be to keep track of where the end is at all times.
Singly-linked list with head and tail
A singly-linked list with head and tail is similar to a singly-linked list with head, and its name again focuses on its most significant qualities:
- The list is singly-linked, meaning that every node, in addition to storing one data element, "links" to a single node, the one that follows it in the list.
- There are now two nodes to which there are direct pointers: the head (i.e., the first one) and the tail (i.e., the last one). To reach any other node, you get there by working your way down the list one node at a time, starting with the head.
Pictorially, a singly-linked list with head and tail looks like this:

    head -> [a | next] -> [b | next] -> [c | next] -> null
                                        ^
    tail -------------------------------+
To compare this variation to the singly-linked list with head, we can analyze the same set of operations we did before, and see which ones (if any) got better and which (if any) got worse.
- Add an element to the front of the list, Remove the first element from the list, and Get the first element in the list. None of these three operations changes significantly. The only difference is that there's one extra step, which is to modify the tail pointer in the cases where we're adding to an empty list or removing the only element from a one-element list. Otherwise, the operations are identical to the same ones in a singly-linked list with head, and all of them will still run in Θ(1) time.
- Get the ith element in the list. Nothing changes substantively here, either. Having a tail pointer gives us no real advantage, because it can only allow us to reach the last node — the last node's "next" pointer will always point to null, since no node will ever follow it (or it wouldn't be the last one!). So the analysis here is the same, as well: Θ(i) is probably the strongest statement we can make, with O(n) being a somewhat weaker one.
- Add an element to the end of the list. This is where a singly-linked list with head and tail really shines. The first step in adding an element to the end of the list is getting there; if we have a tail pointer, we can get there in a constant amount of time, no matter how long the list is. From there, creating a node and adjusting the links — including pointing the tail pointer to the newly-added node — is also a constant number of steps. So, in total, this now runs in Θ(1) time.
- Remove an element from the end of the list. It would seem that this would be another operation where a tail pointer would confer a benefit. But we have to think carefully about this. The first step in removing the last element from the list is getting to the second-to-last node, since this is the one whose "next" pointer will need to change. A tail pointer doesn't help with that; if you follow the tail pointer, you've gone too far, and there's no way to get back! So this operation will still require walking all the way down the list, and will still take Θ(n) time.
So, all in all, not very much changed. One operation — adding to the end of the list — got substantially better, and nothing really got any worse. But it's a surprisingly small benefit for what seemed like it might be a really worthwhile change. And, in truth, the change is worthwhile if you know that adding to the end is one of the key operations you need. But, generally, this leaves us feeling a bit underwhelmed. And it should be noted that we gave something up in the deal: Some of the constant-time operations got a bit slower (because of the need to manipulate the tail pointer) and a bit more complex (for the same reason).
In general, we wouldn't expect anything to really help with the Get the ith element in the list operation. A fundamental quality of a linked list is that you have to walk from one node to the next, because there's no rhyme or reason to where the nodes are located in memory; just because you know you want the 10th node doesn't give you any insight into where it will be. But, at the very least, it seems like the Remove an element from the end of the list operation could be better. And there's one important tweak that can get us there.
Doubly-linked list with head and tail
A doubly-linked list with head and tail is a slight rethinking of what we've seen so far; in particular, it adds the notion of a second link between nodes.
- The list is doubly-linked, meaning that every node can tell you where two other nodes are: the one following it and the one preceding it.
- Direct pointers to both the head (i.e., the first node) and the tail (i.e., the last node) are provided.
We might draw a doubly-linked list with head and tail this way (each node now stores both a "previous" and a "next" pointer; the first node's previous and the last node's next point to null):

    head -> [a] <-> [b] <-> [c] <- tail
Once again, we'll do an asymptotic analysis of the same set of operations, so we can understand what this new idea buys us (if anything).
- Add an element to the front of the list, Remove the first element from the list, and Get the first element in the list. Once again, there is no significant change in any of these operations. There is a difference: the necessity of updating "previous" pointers in each node as we make changes to them. But this adds only a constant number of additional steps, so it doesn't affect the growth rate. All of these operations will still run in Θ(1) time.
- Get the ith element in the list. There is a slight change here, actually, if we also were to keep track of the size of the list separately. If we knew how many nodes there were, we would know whether the ith element was closer to the front or to the end. Since every node links us in either direction, this would allow us to walk to the appropriate node either by starting from the beginning and walking forward, or by starting from the end and walking backward, whichever is more advantageous. So, in total, the time spent will be determined by whichever takes less time. We could encapsulate all of that nicely by saying that this operation now takes Θ(min(i, n − i)) time, which is to say the minimum amount of time required to walk forward from the beginning or backward from the end.
- Add an element to the end of the list. Just like a singly-linked list with head and tail, the presence of a tail pointer solves the hard part quickly: getting to the end of the list. There is an extra step (initializing a "previous" pointer in the new node), but this doesn't change the overall analysis; the operation still runs in Θ(1) time.
- Remove an element from the end of the list. This is where one of the big wins arises when we make the list doubly-linked. The first step in removing the last element is to obtain a pointer to the second-to-last node; we can now do this in exactly two steps, by following the tail pointer to the last node, then following its "previous" pointer to the second-to-last node. From there, a constant number of steps would remove the last node and "delink" it. So, now, this operation runs in Θ(1) time.
It's important to note that, while we got some benefit, we did make a couple of additional tradeoffs. We've spent some extra memory here, because every node is going to need to store an additional pointer; that may or may not be significant, depending on how much memory is available to us, but is something to consider. We've also made the various constant-time operations a little bit slower, because there's an additional pointer to check or manipulate. And, finally, we've made our linked list implementation even more complex than it was before.
Choosing the appropriate variation, in practice, is going to be a matter of understanding the actual problem we're trying to solve, determining which operations we're going to need in order to solve it, and selecting the simplest variation that performs well enough in the ways that we need it to. We'll see a few practical examples of this soon.
Circular variants
For each of the above variants of linked lists, another option involves what you might call circularity. A circular linked list is one in which the end connects back to the beginning — and, in the case of a doubly-linked circular list, the beginning connects back to the end. In other words, in a circular linked list, the next pointer in the last node points to the first node (i.e., the same node the head pointer points to); if there is a previous pointer in each node, the previous pointer of the first node points to the last node.
This doesn't make an appreciable difference in the analyses we did above, but does allow some lower-level implementation gains, such as the ability to never go "too far" in a singly-linked list (i.e., starting from anywhere, you can get anywhere) and the ability to traverse the same list multiple times. Those can be small-time wins, or they can form the basis of a superior implementation for a more complex data structure; more on that later.
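The "starting from anywhere, you can get anywhere" property can be demonstrated with a short sketch. This is a hypothetical Python example of a circular singly-linked list, in which the traversal stops not at null (there is no null link) but when it arrives back where it started:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def make_circular(values):
    """Build a circular singly-linked list; the last node's next
    points back to the first node, closing the loop."""
    head = Node(values[0])
    node = head
    for v in values[1:]:
        node.next = Node(v)
        node = node.next
    node.next = head  # the last node links back to the first
    return head

def collect_from(start):
    """Starting from *any* node, visit every node exactly once by
    walking until we come back around to where we started."""
    result = [start.data]
    node = start.next
    while node is not start:
        result.append(node.data)
        node = node.next
    return result
```

Starting the traversal from the second node still visits all three elements, just in rotated order, which is exactly the property a plain (non-circular) singly-linked list lacks.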