BabelBuster
by Jonathan M. Baccash (jbaccash@princeton.edu)

Submitted to the Computer Science department, Princeton University, in
partial satisfaction of the thesis requirement. Spring, 2001.
Adviser: Andrew W. Appel.

BabelBuster translates preprocessed C source code into English. It can
also translate the English back into C.

Enclosed are two folders, ckit and code.

The ckit folder contains ckit version 1.0, slightly modified to fix a bug
in parsing large integer literals. ckit is a C front end written in SML
that translates C source code (after preprocessing) into abstract syntax
represented as a set of SML datatypes. It can also pretty-print the
abstract syntax tree as C. I downloaded it from the web. See:
http://cm.bell-labs.com/cm/cs/what/smlnj/doc/ckit/index.html

The code folder contains the SML code and test cases that I wrote. To use
it, go to the code directory. At the shell prompt, type "build". (If you
are using Windows, type "buildc2e", and when that build has finished, type
"builde2c".) When the program has finished compiling, you can use the
programs c2e and e2c to convert from C to English and from English to C,
respectively.

Usage:

    c2e source.c [out.e]
    e2c source.e [out.c]

If you are familiar with ML, you can also run the conversions from an sml
session. To do this, go to the code directory and start a new sml session.
At the prompt, type "CM.make();". When CM has finished building the
program, the main structure introduced into the environment is
BabelBuster. It has the following signature:

    structure BabelBuster :
    sig
      (* Print translation. Argument is an input file name. *)
      val cte : string -> unit
      val etc : string -> unit
      val ctc : string -> unit
      val ete : string -> unit

      (* Print translation. Arguments are input file name,
         output file name. *)
      val cte2 : (string * string) -> unit
      val etc2 : (string * string) -> unit
      val ctc2 : (string * string) -> unit
      val ete2 : (string * string) -> unit

      (* Print translation. Used for export. *)
      val c2e : (string * string list) -> OS.Process.status
      val e2c : (string * string list) -> OS.Process.status

      (* Pretty print C file *)
      val c2c : (string * string list) -> OS.Process.status
    end

An additional Test structure is also introduced into the environment. It
is used for testing purposes.

Known limitations:

- The English is usually pretty good, but could obviously be better.
- Does not translate preprocessor macros.
- Some trivial information is lost in translation.
- Unary plus is removed.
- Preincrements and predecrements whose results are unused are sometimes
  turned into postincrements and postdecrements.
- Nested struct/union definitions are converted to top-level definitions.
- Anonymous structs/unions/enums are given names.
- Does not preprocess the C file. This should be done separately.
- Code comments are removed.
- Hex and octal literals are replaced with decimal integers.
- Other trivial lost information.
- To use c2e or e2c, the current directory must be babel-buster/code.
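
To make the limitations above concrete, here is a small, hypothetical input file that contains several of the constructs listed. The file name (limits.c), the function name limitations_demo, and all identifiers are assumptions made for illustration; they are not part of the BabelBuster distribution or its test cases. The comments only restate what the list above says may happen and do not show actual BabelBuster output. The file includes no headers, so it is already in preprocessed form, and it needs no main because it is meant only as translator input.

```c
/* limits.c: hypothetical input exercising constructs that the
   limitations list describes as lossy. These comments would themselves
   be dropped, since code comments are removed. */

/* Nested struct definition: listed as converted to a top-level
   definition. */
struct outer {
    struct inner {
        int x;
    } in;
    int y;
};

/* Anonymous enum: listed as being given a name. */
enum { RED, GREEN, BLUE };

int limitations_demo(void)
{
    struct outer o;
    int mask = 0x1F;   /* hex literal, decimal value 31 */
    int mode = 0755;   /* octal literal, decimal value 493 */
    int i = 0;

    o.in.x = +mask;    /* unary plus: listed as removed */
    o.y = mode;
    ++i;               /* result unused: may be turned into i++ */

    /* 31 + 493 + 1 + 2 (BLUE) */
    return o.in.x + o.y + i + BLUE;
}
```

None of these changes should affect what the program computes: limitations_demo returns 527 whether the literals are written in hex, octal, or decimal, so one way to check a round trip through c2e and e2c would be to compile both versions and compare results.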