Data Structures in COBOL

Life with Arrays

If you're a COBOL programmer, you already know about arrays (or tables in official COBOL jargon). If you're like a lot of COBOL programmers, arrays are the only kind of data structure you've ever used, or even heard of. And you've grown accustomed to their shortcomings: These shortcomings are usually not a problem. When they become a problem, we may be able to devise workarounds that are not too onerous. The fact remains: arrays are not always the best way to structure our data.

Linear Tyranny

Compilers can implement arrays easily because arrays are linear, like memory. If the program is positioned at a particular spot and you need to go to the next spot, all the compiler has to do is add a constant to an address. The two spots are related by being physically adjacent in memory.

While this principle is simple and efficient, it is also restrictive. An array element can be adjacent to only two other elements: the one in front and the one in back. If two adjacent elements cease to be related, you have to move one or both of them somewhere else, and perhaps move other things around as well.

When physical adjacency is not enough, you need some other way to indicate that element A is related to element B. Element A must contain some data item which refers explicitly to element B (or vice versa, or both).

In other words, A can carry a pointer to B. If you have A, then you have a pointer to B. If you have a pointer to B, then you can find B.

Definition: a pointer (in the general sense) is any scrap of data that enables you to access the thing it's pointing to.

Ways to Implement Pointers

A pointer (in the general sense) may take any of a variety of forms, depending on the context: Usually the word pointer is used in a more specific sense: an address in memory. In this sense, we can declare a COBOL pointer by using the clause USAGE IS POINTER.

However, it is possible (and sometimes useful) to build data structures using other kinds of pointers.

Pointer Properties

Pointers have two important properties: Pointers are useful, but they are not free. To begin with, they occupy storage in memory. More importantly, they are notoriously error-prone. Misuse of pointers can create bizarre bugs which are difficult to track down.

Data Structures

Definition: a data structure is a collection of related data elements, together with a set of operations which reflect the relationships among the elements.

This definition is artfully vague so as to include arrays. But you already know about arrays. We will not discuss them further except as a way to implement other data structures. If we are willing to exclude arrays, we can devise another, more succint definition:

A data structure is a collection of data elements connected by pointers.

Some data structures, like arrays, are linear. Anything you might do with a linear data structure you could do with an array, though perhaps awkwardly:

Other data structures are inherently non-linear, because they contain branches. You cannot perform the equivalent operations with an array (except by implementing a data structure within the array, using subscripts in the place of pointers):

Since linked lists are the most basic type of data structure, these pages discuss them in some detail. Once you understand how linked lists work, you can look upon other data structures as extensions of the same ideas.
[home]COBOL Home [trap]Pointer Traps [books]Further Reading