Syntax-Directed Translation for Javac: Modified Javac to compile a new language (C-Minus-Minus).
- In this project I modified Javac (`openjdk-6-src-b22-28_feb_2011`) to build a compiler-compiler using Syntax-Directed Translation (SDT).
- I added a new construct to the Javac parser called `grammar`, representing a full-fledged Syntax-Directed Definition, which maps the productions of a CFG to their rules.
- You can find the new `grammar` construct here in the javac source code
- I then used that `grammar` to build a compiler for a simple language called C-Minus-Minus (short: CMM) [CMM specs pdf] [CMM samples] [my CMM compiler source code] that translates it to a proprietary assembly language called `BASS` (BASS documentation + examples).
- I even patched a bug in Javac (enum parsing was broken) that was in that build (`openjdk-6-src-b22-28_feb_2011`) [official download link]
- Within my CMM compiler, the language is defined in the CMMGrammar.
- NOTE: You are seeing this right: this `.java` file does not contain a `class`, `interface`, etc., but instead defines a `public grammar CMMGrammar`.
- That is why I had to make modifications to Javac! 😊
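To make the idea of a Syntax-Directed Definition concrete, here is a minimal sketch in plain Java. It does not use the `grammar` construct and is not taken from the CMM compiler; the names `SddSketch`, `Node`, `RULES`, and `eval` are hypothetical, and it assumes a current JDK (8 or later) rather than the modified JDK 6 build. Each production name is paired with a rule that computes a value for a tree node from the values of its children.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SddSketch {
    // A tree node: the production that created it, its children, and a literal value.
    static final class Node {
        final String production;
        final List<Node> children = new ArrayList<>();
        final int value; // only meaningful for "Num" leaves

        Node(String production, int value, Node... kids) {
            this.production = production;
            this.value = value;
            for (Node k : kids) children.add(k);
        }
    }

    // Hypothetical rule table: production name -> rule computing the node's value.
    static final Map<String, Function<Node, Integer>> RULES = new HashMap<>();
    static {
        RULES.put("Num", n -> n.value);
        RULES.put("Add", n -> eval(n.children.get(0)) + eval(n.children.get(1)));
        RULES.put("Mul", n -> eval(n.children.get(0)) * eval(n.children.get(1)));
    }

    // Look up the rule for the node's production and run it.
    static int eval(Node n) {
        Function<Node, Integer> rule = RULES.get(n.production);
        if (rule == null) throw new IllegalStateException("no rule for " + n.production);
        return rule.apply(n);
    }

    static Node num(int v) { return new Node("Num", v); }

    public static void main(String[] args) {
        // Tree for 2 + 3 * 4
        Node tree = new Node("Add", 0, num(2), new Node("Mul", 0, num(3), num(4)));
        System.out.println(eval(tree)); // prints 14
    }
}
```

The point of a dedicated `grammar` construct is to let the compiler author write only the production-to-rule pairs and leave the surrounding plumbing (the table and the dispatch in this sketch) to the tool.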
The CMMCompiler uses the CMMGrammar to compile everything in a few simple steps.
- The CMMParser builds the AST, based on the Grammar rules
- The parser internally calls the Scanner which generates a token stream
- When the AST is ready, we start running through it and execute the grammar-defined rules.
- The `grammar` can define an arbitrary number of `grammarpasses`, and the CMMGrammar defines two, each with its own production rules (code to be executed when matching the left-hand-side patterns in the AST): `prep0` to prepare everything and `gen` to generate code (using the Gen class)
- The CMMGrammar is only 724 lines long (including comments!). Our modification to the Javac parser then takes that and converts it into a regular Java class before letting the Javac code generator worry about the rest.
- NOTE: The equivalent `CMMGrammar` class emitted by our modified Javac has over 5000 lines, almost 7 times as many, and does not contain any comments!
- I then used this compiler to successfully compile these 7 sample CMM programs to BASS assembly code
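The scanner, parser, and two-pass flow described above can be sketched in plain Java as follows. This is an illustration under assumptions, not the CMM compiler: the mini-language (assignments with `+`), the class `PipelineSketch`, the methods `scan`, `prep`, `gen`, and `compile`, and the `PUSH`/`LOAD`/`ADD`/`STORE` mnemonics are all made up for this example and are not CMM or BASS. It assumes a current JDK (8 or later).

```java
import java.util.*;
import java.util.regex.*;

public class PipelineSketch {
    // AST: an expression is a number, a variable, or an addition; a statement is an assignment.
    static abstract class Expr {}
    static final class Num extends Expr { final int v; Num(int v) { this.v = v; } }
    static final class Var extends Expr { final String name; Var(String n) { name = n; } }
    static final class Add extends Expr { final Expr l, r; Add(Expr l, Expr r) { this.l = l; this.r = r; } }
    static final class Assign { final String target; final Expr value; Assign(String t, Expr v) { target = t; value = v; } }

    // Scanner: turns source text into tokens (characters it does not recognize are skipped).
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_]\\w*|\\d+|[=+;]").matcher(src);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Parser: recursive descent. program := (IDENT '=' expr ';')*   expr := term ('+' term)*
    static final class Parser {
        private final List<String> toks;
        private int pos = 0;
        Parser(List<String> toks) { this.toks = toks; }

        List<Assign> program() {
            List<Assign> out = new ArrayList<>();
            while (pos < toks.size()) {
                String target = next();
                expect("=");
                Expr value = expr();
                expect(";");
                out.add(new Assign(target, value));
            }
            return out;
        }
        private Expr expr() {
            Expr e = term();
            while (pos < toks.size() && toks.get(pos).equals("+")) { pos++; e = new Add(e, term()); }
            return e;
        }
        private Expr term() {
            String t = next();
            if (Character.isDigit(t.charAt(0))) return new Num(Integer.parseInt(t));
            if (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_') return new Var(t);
            throw new IllegalStateException("unexpected token " + t);
        }
        private String next() {
            if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
            return toks.get(pos++);
        }
        private void expect(String s) {
            if (!next().equals(s)) throw new IllegalStateException("expected " + s);
        }
    }

    // Pass 1 ("prep"): walk the AST and give every assigned variable a storage slot.
    static Map<String, Integer> prep(List<Assign> prog) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (Assign a : prog) slots.putIfAbsent(a.target, slots.size());
        return slots;
    }

    // Pass 2 ("gen"): walk the AST again and emit instructions for a made-up stack machine.
    static List<String> gen(List<Assign> prog, Map<String, Integer> slots) {
        List<String> code = new ArrayList<>();
        for (Assign a : prog) {
            genExpr(a.value, slots, code);
            code.add("STORE " + slots.get(a.target));
        }
        return code;
    }
    private static void genExpr(Expr e, Map<String, Integer> slots, List<String> code) {
        if (e instanceof Num) {
            code.add("PUSH " + ((Num) e).v);
        } else if (e instanceof Var) {
            Integer slot = slots.get(((Var) e).name);
            if (slot == null) throw new IllegalStateException("undefined variable " + ((Var) e).name);
            code.add("LOAD " + slot);
        } else {
            Add add = (Add) e;
            genExpr(add.l, slots, code);
            genExpr(add.r, slots, code);
            code.add("ADD");
        }
    }

    // Full pipeline: scan, parse, then run both passes over the finished AST.
    static List<String> compile(String src) {
        List<Assign> ast = new Parser(scan(src)).program();
        return gen(ast, prep(ast));
    }

    public static void main(String[] args) {
        for (String line : compile("x = 1 + 2; y = x + 3;")) System.out.println(line);
    }
}
```

For the input `x = 1 + 2; y = x + 3;` this prints `PUSH 1`, `PUSH 2`, `ADD`, `STORE 0`, `LOAD 0`, `PUSH 3`, `ADD`, `STORE 1`, one instruction per line. Splitting the work into a preparation pass and a generation pass means the second pass can rely on every variable already having a slot, which is the same reason a grammar might define a pass like `prep0` before `gen`.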
You can find the initial proposal and 2011 final report in this folder. NOTE: The report was not very well written (I loved coding those thousands of lines, but report writing I did not enjoy back then...)