Automated Finite State Machines

It isn't hard to hand-code a small finite state machine (or "FSM"), but for larger FSMs, the manual approach is almost unmanageable. There are too many states and events to keep track of. Since a general-purpose programming language like COBOL or C is on a lower level of abstraction than the FSM itself, it is difficult to grasp how an FSM works (or fails to) by looking at the code.

Specialized tools can make larger FSMs more manageable.

Regular Expressions

Many tools in the UNIX world are based on regular expressions -- a set of conventions for defining patterns of characters. We are all familiar with the use of wild-cards for specifying file names in MS-DOS. Regular expressions provide a much more powerful way of matching text strings to a pattern.

For example, the regular expression [a-z]*\.[0-9]$ matches any sequence of zero or more lower case letters, followed immediately by a period, a digit, and the end of the line.

From C or C++ you can call the regcomp() function to compile a regular expression. Internally, this function (in a typical implementation, at least) builds a representation of an FSM. Then you can call the regexec() function to search for a matching string within a given fragment of text. These functions are not defined as part of Standard C, but they are widely available.

Using these and related functions as building blocks, the UNIX world has developed powerful tools such as grep, awk, and lex. If similar functions are not already available for MVS, it would probably be possible to find public-domain versions, port them, and call them somehow from COBOL.

However, there is a major complication: the difference in character sets between IBM and the rest of the world. For example, the regular expression [A-Z] normally matches any upper-case letter. In EBCDIC, however, the alphabetic characters are not contiguous. Code designed for ASCII will result in some surprises when applied to EBCDIC.

lex and yacc

lex is a code generator for automating lexical analysis. Its input is a specialized language, based in part on the use of regular expressions, for specifying lexical rules. Based on these instructions it generates C code for a lexical analyzer.

yacc is another code generator -- the name is an acronym for Yet Another Compiler Compiler. It generates C code for further analysis of a stream of tokens. When used together, lex and yacc can quickly produce most of the front end for a non-trivial compiler.

The Free Software Foundation provides near-equivalents of lex and yacc, called flex and bison, respectively.

Commercial Packages

One tool, called Libero, generates FSMs in COBOL (or any of several other languages). The developer specifies the states, actions, and transitions using a specialized language.

Libero is available for Unix, VMS, OS/2, MS-DOS, and various versions of Windows. An MVS version is not yet available. Contact iMatix for details.


[gears]Finite State Machines blockParsing [books]Further reading
[home]COBOL Home [iMatix]iMatix (home of Libero)