Data Structures in COBOL
Life with Arrays
If you're a COBOL programmer, you already know about arrays (or
tables in official COBOL jargon). If you're like a lot of
COBOL programmers, arrays are the only kind of data structure you've
ever used, or even heard of. And you've grown accustomed to their
shortcomings:
- You have to hard-code the size of an array. If you're not sure
how big is big enough, you just have to make it humongous, and keep
your fingers crossed.
- It's awkward to rearrange an array. If you want to add something in
the middle, or delete something, you have to move a bunch of other stuff
around.
- Unless you indulge in a lot of confusing redefinitions, a given
array can only hold one kind of item. If that kind of item can have
multiple instances of something else, you may need a two-level array.
- An array is inherently linear, and the real world sometimes isn't.
For example, an array could readily contain the names of your ancestors
in the male line, for, say, 10 generations, in chronological order. But
if you want to include your mother's relations, and your aunts and
uncles and cousins, the array becomes a muddled mess.
These shortcomings are usually not a problem. When they become a
problem, we may be able to devise workarounds that are not too onerous.
The fact remains: arrays are not always the best way to structure our
data.
Linear Tyranny
Compilers can implement arrays easily because arrays are linear, like
memory. If the program is positioned at a particular spot and you need
to go to the next spot, all the compiler has to do is add a constant to
an address. The two spots are related by being physically adjacent in
memory.
While this principle is simple and efficient, it is also restrictive.
An array element can be adjacent to only two other elements: the one
in front and the one in back. If two adjacent elements cease to be
related, you have to move one or both of them somewhere else, and
perhaps move other things around as well.
When physical adjacency is not enough, you need some other way to
indicate that element A is related to element B. Element A must contain
some data item which refers explicitly to element B (or vice versa, or
both).
In other words, A can carry a pointer to B. If you have A,
then you have a pointer to B. If you have a pointer to B, then you
can find B.
Definition: a pointer (in the general sense) is any scrap of data that
enables you to access the thing it's pointing to.
Ways to Implement Pointers
A pointer (in the general sense) may take any of a variety of forms,
depending on the context:
- An address in memory;
- A subscript into an array;
- An offset from the beginning of a file;
- A record number;
- A key for a database or indexed file.
Usually the word pointer is used in a more specific sense: an
address in memory. In this sense, we can declare a
COBOL pointer by using the clause USAGE IS
POINTER.
However, it is possible (and sometimes useful) to build data structures
using other kinds of pointers.
Pointer Properties
Pointers have two important properties:
- By allowing a data element to carry more than one pointer, you can
represent relationships which are as complex as they need to be.
- Because physical adjacency no longer matters, you can rearrange
a set of connected elements just by juggling a few pointers.
Pointers are useful, but they are not free. To begin with, they occupy
storage in memory. More importantly, they are notoriously error-prone.
Misuse of pointers can create bizarre bugs which are difficult to track
down.
Data Structures
Definition: a data structure is a collection of related data elements,
together with a set of operations which reflect the relationships
among the elements.
This definition is artfully vague so as to include arrays. But you
already know about arrays. We will not discuss them further except as a
way to implement other data structures. If we are willing to exclude
arrays, we can devise another, more succint definition:
A data structure is a collection of data elements connected by pointers.
Some data structures, like arrays, are linear. Anything you might do
with a linear data structure you could do with an array, though perhaps
awkwardly:
Other data structures are inherently non-linear, because they contain
branches. You cannot perform the equivalent operations with an array
(except by implementing a data structure within the array,
using subscripts in the place of pointers):
Since linked lists are the most basic type of data structure, these
pages discuss them in some detail. Once you understand how linked lists
work, you can look upon other data structures as extensions of the same
ideas.
COBOL Home
Pointer Traps
Further Reading