Linked Lists

Linked lists are the simplest kind of data structure involving pointers. Besides being useful in their own right, they are a good forum for introducing the use of COBOL pointers.

Consequently these pages will discuss the examples in excruciating detail. Once you grasp the principles involved, you can apply them to other data structures, where our discussion will largely skip the low-level details.

Overview

In a simple linked list, each member of the list -- i.e. each node -- carries not only whatever data the application needs, but also a pointer to the next node. Somewhere we need a pointer to the first one, anchoring the entire list. It is often useful -- but never necessary -- to keep a pointer to the last node as well. The resulting arrangement (with three nodes in this case) looks like this:

[linked list] Each box represents a node, and each arrow represents a pointer. Some pointers are stored in the nodes, where the corresponding arrows originate. Others (FIRST-P and LAST-P) reside somewhere else, such as WORKING-STORAGE. Since the the last node in the list carries a NULL pointer, the corresponding arrow points to a special symbol for NULL.

We could have omitted LAST-P. If we didn't have it, we could always find the last node by traversing the list from the beginning. For certain operations (mainly appending to the end of the list), LAST-P merely provides a useful shortcut, especially for long lists. If the application doesn't need such an operation, there is no reason to maintain LAST-P.

On the other hand, FIRST-P is essential. Without it we could never find any of the nodes on the list (except the last one, if we had LAST-P). Any box with no arrows pointing to it is lost forever. This principle becomes important when we start rearranging the list: as we juggle various pointers we must take care not to lose a node.

If we do maintain LAST-P, it should be consistent with FIRST-P. That is, they should both be either NULL or not NULL. If only one of them is NULL (except temporarily while we are in the middle of a pointer-juggling operation), then our list is corrupt.

The above diagram also introduces a useful naming convention: the name of a pointer variable ends in '-P'.

The Example

In our example, we will manage a list of names and telephone numbers. We will keep a collection of pointers in WORKING-STORAGE:
 01  TN-POINTERS.
     05  FIRST-P          USAGE IS POINTER VALUE NULL.
     05  LAST-P           USAGE IS POINTER VALUE NULL.
     05  NEW-P            USAGE IS POINTER VALUE NULL.
     05  DEAD-P           USAGE IS POINTER VALUE NULL.
     05  CURR-P           USAGE IS POINTER VALUE NULL.
     05  PREV-P           USAGE IS POINTER VALUE NULL.  
We have already met FIRST-P and LAST-P. The others we will use as temporary variables for various purposes.

The LINKAGE SECTION will contain the following entries:

 01  TN-NODE.
     05  TN-NEXT-P        USAGE IS POINTER.
     05  TN-WTN           PIC X(10).
     05  TN-NAME          PIC X(15).

 01  ALT-TN-NODE.
     05  ALT-TN-NEXT-P    USAGE IS POINTER.
     05  ALT-TN-WTN       PIC X(10).
     05  ALT-TN-NAME      PIC X(15).
We will use ALT-TN-NODE whenever we need two different nodes at the same time -- for instance, when we add a node and must connect it to the others.

In the PROCEDURE DIVISION, we assume the existence of two utility paragraphs:

For now we will not worry about how the allocation and deallocation operations work.

Operations

Given this infrastructure, there are a number of operations we might want to perform: Each of these operations (except for sorting) turns out to be fairly trivial. By combining them as needed, you can do pretty much anything you'll ever need to do with a linear structure.

Nevertheless, for performance reasons, other data structures may be more appropriate in a given case. If your application does a lot of insertions and deletions in the middle of a list, a doubly-linked list may work better. For large numbers of nodes a skip list or an ordered tree will provide faster searches, comparable to a SEARCH ALL in conventional Cobol.

Special Cases

Make sure that each operation works correctly in three situations: In order to avoid coding for these special cases, some people prefer to maintain a dummy node at each end of the list. These dummy nodes serve as placeholders but contain no meaningful data. Any node with real data is therefore somewhere in the middle, and never requires special treatment as an end node.

This approach can simplify addition and deletion, but may make other operations (such as concatenation) more complicated. The decision to use dummy nodes depends partly on personal taste and partly on the operations you need to implement.

Caveats

The sample code was invented for the examples and has not been compiled or tested. It may well contain typos or other bugs.

Even if they are bug-free, the samples are not the only way to code the operations illustrated. Many minor variations are possible. Pay less attention to the details of coding than to the principles involved. The diagrams are perhaps more important than the sample code.


[crane]Data Structures [chip]Allocating memory [hand]COBOL Pointers [books]Further Reading
[home]COBOL Home