A typical compiler may be divided into three components: the front end, the intermediate representation, and the back end.
The front end reads the source code and passes it through a series of transformations: lexical analysis, syntactical analysis, and semantic analysis. Finally it constructs the intermediate representation.
The intermediate representation is a complex data structure which embodies the meaning and intent of the source code. At this point the details of language and syntax no longer matter. Stripped of all aspects of mere physicality, the intermediate representation is a pristine abstraction, shimmering unseen in iridescent groves of silicon.
The back end transforms the intermediate representation back into a physical form, namely object code or assembly language. It must be specific for the target machine, but it need not be specific for the source language. Different front ends may share the same back end.
These pages will ignore the intermediate representation and the back end, concentrating instead on the three phases of the front end.
The output of lexical analysis is a stream of tokens.
A syntactical analyzer may use an FSM, whose events would represent tokens rather than individual characters.
For programming constructs which may nest arbitrarily, such as many kinds of expressions, a technique called recursive descent is popular. The analyzer calls itself recursively for each level of nesting.
To represent expressions, a typical syntactical analyzer will build a type of tree called a parse tree. Each internal node of the tree represents an operator, each of whose child nodes is an operand. For nested expressions, an operand may be another operator with operands of its own.
JOCKEY employs a more simplistic approach called predictive parsing. Most of the time we knows what kind of token to expect. If we see the tag '/TN', for example, then the very next token had better be a telephone number. So we call a specialized FSM designed to extract telephone numbers.
Consider the example given above, where the syntax indicates a subscripted variable. The semantic analyzer would consult a list of declared variables, find the appropriate offset in memory, and verify that the variable has the right number of subscripts. It would also identify the variable named within the parentheses, and verify that it is suitable for use as a subscript. It might even issue an error or a warning if the literal subscript is not within bounds.
Finally the semantic analyzer integrates all the fragments of syntax to build the intermediate representation.
In particular, don't take the example too seriously. It's just a plausible guess about how a hypothetical COBOL compiler might work.
Further reading