Trees

A tree is a branching structure arranged in a hierarchy. A node may be connected to three or more other nodes, of which one is the parent (above it in the hierarchy) and the others are children (below it in the hierarchy).

One of the nodes, called the root, has no parent. If a node has any children it is called an internal node. Otherwise it is called a leaf node. Children of the same parent might be called siblings.

We are mixing metaphors here -- parents and children, roots and leaves. Think of it as a family tree.

The botanical metaphor becomes awkward when we start drawing diagrams. By convention, we normally draw the tree upside down, with the root at the top:

[simple binary tree] This example shows a simple binary tree, meaning that each node has no more than two children. C is the root node. It is the parent of its children E and F. A and G are siblings of each other, as are several other pairs. A, R, B, W, and Y are leaf nodes, and the others are internal nodes.

In other kinds of trees a parent may have more than two children, or there may be restrictions on the branching pattern. There are many types of trees for specialized purposes. These pages will not attempt to cover the subject in any detail.

Implementing a Tree

As with other kinds of data structures, we can use pointers to represent the connections among adjacent nodes. A node in a binary tree might look like this:
 01  BINARY-TREE-NODE.
     05  NODE-PARENT-P            USAGE IS POINTER.  
     05  NODE-LEFT-CHILD-P        USAGE IS POINTER.
     05  NODE-RIGHT-CHILD-P       USAGE IS POINTER.
     05  NODE-DATA                PIC X(30).
In this example, each node carries a pointer to each of its immediate neighbors. Thus each connection is represented twice, by a pointer at each end. For the root, the parent node is NULL. For a leaf node, the child pointers are both NULL.

In a given case we may be able to eliminate the redundancy. If we traverse the tree only from the root down, we only need the child pointers. If we traverse it only from the bottom up, we only need the parent pointer.

Trees as Ordered Lists

Sometimes a tree is just a way of representing an ordered sequence. We could rearrange the nodes of the earlier example into an ordered tree:

[ordered binary tree] Alphabetically, the letter in each node is between the letters in the children. The left child is first, then the parent, then the right child. Thus G is between C and Y; Y is between R and Z. R has no left child, but it comes before its right child W.

Each node is in the proper position not only with respect to its immediate children, but with respect to all their descendants. Thus G comes after C and all of C's descendants: A, B, E, and F. Likewise G comes before all of Y's descendants.

When used this way, a tree is a pseudo-linear structure. An ordered linked list could support all the same operations. The differences lie in (1) complexity of the code and (2) performance.

Searching an Ordered Tree

Suppose we are looking for the W node. We start at the root: We had to inspect only four nodes to find the one we were looking for. If we had searched a linked list, we would have had to inspect seven nodes.

Okay, so we stacked the deck by looking for a letter near the end of the alphabet. If we had looked for A, the linked list would have been faster.

Still, on average the tree provides faster searches. For a linked list, the average number of steps is half the number of nodes. For an ordered binary tree, the average number of steps is roughly the base 2 logarithm of the number of nodes (provided that the tree is balanced -- see below). For a large number of nodes, the difference is substantial. A binary tree beats a linked list in the same way that a SEARCH ALL beats a SEARCH.

Building an Ordered Tree

When we add a node, we first do a search to find out where it belongs. We may be able to add it as a leaf node, without rearranging anything else. In general, however, we may need to add an internal node, rearranging nearby nodes in the process.

Suppose we want to add an H node to our example. Here are two ways to do it (and there are others):

[H inserted] [H inserted again]
The first one involves only a little rearrangement, but leaves us with a deeper tree. Now it will take an extra step to find W. The second one leaves the depth unchanged, but involves more rearrangement. G used to be the root, and now it is a leaf.

There are various algorithms for rearranging trees. All of them are more complicated than an insertion into a linked list.

Balancing Trees

Depending on how we build an ordered tree, we might come up with something like this:

[spindly tree] This tree will perform little better than a linked list. There are ways to balance a tree so that it isn't so spindly or lopsided, but don't even try to figure out for yourself how to do it. Find a good textbook on data structures and algorithms.

You might build the tree once, balance it once, and then leave it alone. If the tree is subject to continual additions and deletions, you might prefer to balance it on the fly as needed.

Trees as Hierarchies

In an ordered tree, the connection between a parent node and a child node is just a gimmick for representing a sequence. The parent and child are not related in any other sense. We are free to sever the connection whenever we balance the tree or otherwise rearrange it.

In other kinds of trees such a connection may be more significant. For example, a tree might represent the reporting structure of a company:

[org chart] Larry is the CEO, and the root node. His subordinates are Jim, Chuck, and Sally. Bob and Sue work for Jim, and Bill and Marcia work for Sally; they are leaf nodes.

Here the tree represents a hierarchy in the real world. The connection between a parent node and a child node reflects the relationship between a boss and an underling. Since the tree is not designed for efficient searches, we don't need to balance it, nor can we.


[home]COBOL Home [crane]Data Structures [books]Further Reading