Merge Sort

In some ways a singly-linked list is similar to a tape file. Each is a linear arrangement of data. If you want to read one, you have to start at the beginning and continue sequentially. It should come as no surprise, then, that a sorting algorithm originally designed for tapes can also work for linked lists.

The following description is based on a classical algorithm called the three tape sort (because you can use it with three tape drives). At first we will concentrate on explaining how it works. Then we will mention some simple tricks to make it more efficient.

Three Tape Sort

In the example, we start with a linked list of numbers, in no particular order: [unsorted list]
First we break up the list into two new lists. We simply march through the list, alternately transferring each node to one or the other of the new lists. Each of the new lists is still completely unsorted.

[two unsorted lists]
Now we merge the two lists, taking two nodes at a time, one from each list. We compare each pair of nodes and add them to a third list in the appropriate sequence. The resulting list is a series of two-node chunks, each of which is sorted:

[list sorted pairwise]
There is one node left over, without a partner. Such leftovers may occur at any stage, and the logic must accommodate them.

Now we divide this list back into two separate lists. Again we march though the list, alternately transferring nodes to one list or the other. This time, however, we transfer two nodes at a time, keeping the chunks which we created in the previous merge sort:

[two lists sorted pairwise]
Again we merge the two lists back into one. This time, instead of merging two nodes at a time, we merge two pairs of nodes, keeping each output foursome in order. It is important that the match-merge logic respect the boundaries between successive pairs. The resulting list is a series of four-node chunks, each one in order:

[list sorted by fours]
You can imagine the tape drives spinning as we continue with the next stage, casting alternate foursomes into one list or the other:

[two lists sorted by fours]
Again we merge the lists. This time we merge groups of four into groups of eight:

[list sorted by eights]
Once more we split into two lists:

[two lists sorted by eights]
After one more merge, our list is completely sorted:

[sorted list]
We repeat this cycle as many times as we need to. The number of passes is roughly the base-2 logarithm of the number of nodes. The length of each pass is equal to the number of nodes. Hence the running time is proportional to n * log n, where n is the number of nodes.

Optimizations

The above algorithm works without alteration for either linked lists or tape files. Nevertheless, a list is not a tape, and we don't have to treat it like one. We can take advantage of the fact that the linked list exists entirely within memory.

Comparison to Insertion Sort

A merge sort incurs a fair amount of overhead just keeping track of itself. A simpler algorithm, such as an insertion sort, may provide better performance with short lists. With long lists, however -- where performance is most likely to matter -- a merge sort will outperform an insertion sort by a wide margin.
[home]COBOL Home [crane]Data Structures [books]Further Reading