Simple Compiler Techniques: The Beginning

1. Overview

The sc.c is a simple compiler, which converts the infix notation into prefix or postfix notation. This program does not use lex (lexical analyzer), nor yacc (syntax analyzer), but instead it contains its own functions for this specific tasks. This program uses the syntax-directed translation to build the parse tree and afterwards it uses depth-first traverse to emit the "code".

2. Usage

./sc [prefix|infix|postfix|tree]

 $ gcc sc.c -o sc
 $ ./sc postfix
 1 * (2+3) + 4/abc;

As the result you will get the postfix notation:

 1 2 3 + * 4 abc / +

You can dump the parse tree by using the 'tree' switch, for example:

 $ ./sc tree
 1 * (2+3) + 4/abc;

will give you the following output:

 id      left    right   key
 0009    0005    0008    +
 0005    0001    0004    *
 0001    NIL     NIL     1
 0004    0002    0003    +
 0002    NIL     NIL     2
 0003    NIL     NIL     3
 0008    0006    0007    /
 0006    NIL     NIL     4
 0007    NIL     NIL     abc

which is the representation of this visible form:

[diagram]

3. Technique

Initial specification (left recursion):

 start → stmt eof
  stmt → expr ; stmt
       | ε
  expr → expr + term
       | expr - term
       | term
  term → term * fact
       | term / fact
       | fact
  fact → ( expr )
       | id
       | number

To eliminate left recursion, use this simple rule of transformation:

  A → A α | β

into:

  A → β R
  R → α R | ε

For instance:

  expr → expr + term
  expr → expr - term
  expr → term

transforms into right recursion:

  expr → term R
     R → + term R
     R → - term R
     R → ε

Now, after the transformation, this is the final grammar used by sc.c:

 start → stmt eof
  stmt → expr ; stmt
       | ε
  expr → term rE
    rE → + term rE
       | - term rE
       | ε
  term → fact rF
    rF → * fact rF
       | / fact rF
       | ε
  fact → ( expr )
       | id
       | number

Data structure used by the parse tree:

 struct node_struct
 {
   struct node_struct *left;
   struct node_struct *right;
   unsigned long node_id;

   enum node_type type;
   union {
     int number;
     SYMBOL *symbol;
     enum opcode_type opcode;
   } v;
 };

Other Articles