Table-Driven Finite State Machines

One way to implement a finite state machine (or "FSM") is to encode the rules in an array -- or in COBOL terms, a WORKING-STORAGE table. In response to any given event, each state consults a look-up table to identify the corresponding action.

With this approach the code in the PROCEDURE DIVISION can be simple and compact, because most of the intelligence lies in WORKING-STORAGE. For the same reason, however, the WORKING-STORAGE table requires a lot of tedious typing, even if it does not occupy much memory.

Encoding the Events

In the most obvious and simple-minded implementation, the table would have an entry for every possible combination of state and input character. Since a character may have 256 possible values (more in some architectures), such a table would require a lot of typing even for a very small FSM. Even if we restrict ourselves to a smaller character set -- say, alphanumerics and a few specific punctuation marks -- the tables would be too big to be practical unless we used an automated tool to generate the code.

The simplest way to compress the table is to divide the character set into a small number of categories. For example, during lexical analysis we can probably treat all letters as equivalent, and all digits as equivalent. With additional categories for blank spaces, end-of-line, a few specific punctuation characters, and a garbage category for other characters, we may need no more than a dozen entries for each state.

With a little ingenuity we may be able to compress the table further by exploiting whatever redundancies we can find. For smallish FSMs, however, it's probably not worth it.

Encoding the Actions

Every action must include a transition to the next state (even if it's the same state). Ordinarily the table will directly encode the identity of the next state. The FSM must also have a way to recognize terminal states, either by a table entry or by special treatment in the PROCEDURE DIVISION.

In some FSMs each state reads the next input character. If so, this input may be hard-coded into the PROCEDURE DIVISION. If not, the table may specify which states read a character and which do not. Likewise many states, but not necessarily all, will add the current character to some kind of buffer.

Other actions depend on the peculiarities of the FSM. A terminal state may do something with the token it has recognized, or issue an error message. A non-terminal state might apply some kind of test -- e.g. check the length of the token accepted so far.

One approach is for the table to carry symbols referring to paragraphs. The FSM will look up the symbol by state and event, and then perform the corresponding paragraph.

(If we were programming in C or C++ we might do the same thing more directly by storing function pointers in the table. In C++ we could also store pointers or references to functors. However, these pages are directed to COBOL programmers.)

Sometimes the various actions comprise different combinations of a few sub-actions. If so, the table can encode a flag for each sub-action. The FSM performs whichever actions are turned on for a given state and event.

Indexing Schemes

If you assign a non-zero number to each state and to each category of event, then the FSM can use those numbers as subscripts to access a two-dimensional table directly. Since zero is not a valid subscript (again we are talking COBOL here, not C), you can give it a special meaning. In particular: if the table says that the next state is number zero, that means that there is no next state; you have reached a terminal state.

There may be cases where it is not feasible to use subscripts in this manner. If you have to use the SEARCH verb, then use the SEARCH verb, but performance may suffer.


See Also:

[gears]Finite State Machines blockParsing [books]Further reading
[home]COBOL Home