Rewriting Spaghetti Code

You've inherited a horrible ugly COBOL program with dozens or hundreds of GO TOs and PERFORM THRUs. It has many obscure quirks peculiar to the application. People have patched it for years, and by now nobody knows how it works, or even if it works.

You have several options:

Rewriting Incrementally

You don't have to understand the entire program -- just a little bit at a time. Wherever you see a safe chance to untangle a little piece of code, do it. The more you untangle, the easier it will be to untangle the rest.

Through much of this process you don't have to understand the program at all. Mechanical rearrangements of the code can systematically transform tangled logic into equivalent structured logic.
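
As a rough illustration of such a mechanical rearrangement, here is a minimal sketch. It is not taken from any real program: the program name, the paragraph names, and the data items (ACCT-BALANCE, OVERDRAWN-FLAG) are all invented for the example. The same decision is coded twice, once with GO TOs and a PERFORM THRU range, and once as a plain IF / ELSE.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNTANGLE.
      * Hypothetical demo: every name here is invented for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ACCT-BALANCE        PIC S9(5) VALUE -25.
       01  OVERDRAWN-FLAG      PIC X     VALUE SPACE.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM OLD-CHECK THRU OLD-CHECK-EXIT.
           DISPLAY 'OLD: ' OVERDRAWN-FLAG.
           MOVE SPACE TO OVERDRAWN-FLAG.
           PERFORM NEW-CHECK.
           DISPLAY 'NEW: ' OVERDRAWN-FLAG.
           STOP RUN.
      * Original style: the branch is expressed with GO TOs.
       OLD-CHECK.
           IF ACCT-BALANCE < 0
               GO TO OLD-OVERDRAWN.
           MOVE 'N' TO OVERDRAWN-FLAG.
           GO TO OLD-CHECK-EXIT.
       OLD-OVERDRAWN.
           MOVE 'Y' TO OVERDRAWN-FLAG.
       OLD-CHECK-EXIT.
           EXIT.
      * Rearranged: the same decision as a single IF / ELSE.
       NEW-CHECK.
           IF ACCT-BALANCE < 0
               MOVE 'Y' TO OVERDRAWN-FLAG
           ELSE
               MOVE 'N' TO OVERDRAWN-FLAG
           END-IF.
```

Both DISPLAY statements should show the same flag value for any given balance. Note the assumption that makes the rewrite safe: nothing else in the program branches to OLD-OVERDRAWN or performs a different range of these paragraphs. In real spaghetti code you would have to check that before collapsing them.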

Of course, it is possible for COBOL to be structured in a formal sense -- but poorly structured. As you wrestle with the code you can become familiar with small pieces of it, recognize what the code is trying to do, and further massage it to achieve a good style. Step by step, your goal is to make the code look normal.

(So what does "normal" mean? Well, er, um, it means "the way it ought to be." In practice it means, "the way I would have done it.")

Rewriting a little at a time is easy to fit into your schedule. At each stage of the rewrite you still have a working program which is the logical equivalent of the original. You can put it into production and return to it later, when you have time or opportunity. Even a partially rewritten program will be less bad than the original mess.

Discovering Bugs

As you continue your campaign to make the code look normal, you will likely reach an impasse. At some point, you may be unable to make the code look normal without changing its behavior.

At that point you have probably discovered a bug. Study the code carefully, verify that it doesn't make sense, determine what it was intended to do, and then fix it. Adjust your test plan so that your regression testing uses the new code as a baseline instead of the old.

In one of my early rewrite efforts, before I was as careful as I am today, I rearranged some code to reflect its evident intent. To my dismay, regression testing showed that the new program behaved differently from the original, even though my code looked perfectly sound. Eventually I discovered that I had fixed an unrecognized bug by accident.

Discipline

The obvious danger of the incremental approach is that your changes will have unintended consequences. While there is no substitute for being careful to the point of paranoia, certain measures will reduce the risks:
  1. Establish a test plan from the beginning, so that you can compare the output of the old and new versions.
  2. Test early and often.
  3. Make your changes in the smallest possible increments. Stare at the code and determine whether each change, considered separately, will preserve the behavior of the original in all circumstances.
  4. Transform the code in a systematic sequence of stages. Each stage makes subsequent stages easier and safer, even if it makes the code uglier in the short run.

The first three points are fairly obvious, and require no further comment. The last one is tougher. In fact it is the heart of the problem: How do we transform the code without breaking it?

These pages do not attempt to define a rigorous algorithm, such as would be needed by the people who write code-structuring tools. They do outline a series of stages to be applied manually.

Caveat

The techniques described on these pages are based mostly on my own experience, but some of them are theoretical. For example, I have never had the misfortune to encounter an ALTER statement. Having read the manual and thought about it for a while, I think I know what to do with one, but I've never tried it.

If my recommendations are flawed, please let me know. Send me your tips, your tricks, and your war stories (and see the Guidelines for Contributors).

Another Viewpoint

Tony Cahill has written an article about restructuring COBOL code. His approach emphasizes the transformation of paragraphs into subprograms or nested programs, which communicate through explicit parameters and, where necessary, global variables. I wouldn't usually go that far, but the article is worth reading. It also contains references to a number of technical articles on similar topics.