Sorting a Linked List
One of the favorite pastimes of computer scientists is to devise
sorting algorithms. Any good textbook will describe several popular
algorithms: bubble sort, Shell sort, heapsort, quicksort, and more.
However, most texts describe these algorithms in terms of arrays. It
is not always obvious how to apply them to a linked list. In addition,
some algorithms (such as quicksort) are recursive, and hence awkward to
implement in COBOL.
These pages describe two approaches which are well suited to linked
lists in COBOL. The insertion sort is easier to implement; the merge
sort is more complicated but also more efficient for long lists.
Insertion Sort
The simplest way to sort a linked list is to build it sorted in the
first place. Add the nodes one at a time. As you add each one,
scan the list to find the right position. Once you find the node which
should follow the new node, you have to back up to find the
node which should precede the new node.
With doubly linked lists it's easy to back up.
With singly-linked lists it's a bit more awkward. You can start over
from the beginning, searching for the node which points to the node you
already found. However, it's easy to avoid this second pass. On the
first pass, maintain two pointers: one pointing to the node you're
comparing to the new node, and another pointing to the node just behind
that one.
If the list already exists in an unsorted state, then temporarily
transfer it all to a second list -- you can perform this transfer just
by exchanging a few pointers. Then build it back, one node at a time.
For an insertion sort, the number of key comparisons is roughly
proportional to the square of the number of nodes. For a modest number
of nodes, that's probably as efficient as it needs to be. For a large
number of nodes, consider using a more efficient algorithm such as a
merge sort.
Merge Sort
A merge sort uses the same kind of match-merge logic we have come to
know and love in ordinary batch processing. You've probably coded it
at least a dozen times. You read two files sorted by the same key, at
each step reading whichever file is behind the other one, or both if
they match. When you reach end-of-file on one file, you keep reading
the other one.
In ordinary batch processing, of course, both files have to start out
already sorted. In a merge sort we start from scratch. We slosh the
data back and forth among several lists. At each pass, we merge small
chunks of sorted data into larger chunks. When there's only one chunk
left, we're done. The number of comparisons is proportional to
n * log n, where n is the number of nodes.
This approach is described in greater detail
elsewhere.
Other Approaches
-
Allocate an array of pointers, initializing each pointer to one of the
nodes in your list. Use your favorite algorithm to sort this array,
comparing the pointers according the contents of the corresponding
nodes. Then rebuild the list according to the sorted pointers.
- Instead of inserting nodes into an ordered list, insert them into
an ordered tree. For long lists a tree offers
performance comparable to that of a merge sort. For best results,
however, you may need to write some complicated logic to keep the tree
reasonably balanced. This extra work is probably not worth it unless
you can also use the tree for efficient searches.
-
Write the contents of your nodes to a sortfile, and use the COBOL SORT
verb. Rebuild the list from the sorted file. This technique may be the
easiest of all to code, and performance may be acceptable if you only
have one list to sort. For sorting many lists repeatedly, performance
is likely to be abysmal.
Pointer Traps
Data Structures
Further Reading
COBOL Home