I. Code Generation
 A. overview
------------------------------------------
         OVERVIEW OF CODE GENERATION

 .. ASTs...-> [ Static Analysis ]
                     |
                     | IR
                     v
              [ Code Generation ]
                     |
                     | Machine Code
                     |
                     v
               Virtual Machine
                   Execution

The IR (= Intermediate Representation)
    records


------------------------------------------
  1. IR (Intermediate Representation)
         What kind of information is needed from a name's use
            in order to generate code?
         Should the parser create the lexical address of a name's use
            during paring?
         Is the symbol table unchanging (immutable)?
------------------------------------------
            IR TREES

An IR is a tree structure,


Helps in modularizing compilers
and code generation

   WITHOUT IR               WITH IR

Java ------>x86       Java        ->x86
     \/ |||                \     /
   C ------>MIPS         C \\   /-->MIPS
     \\/ ||                 ->IR
 C++ ------>Sparc      C++ //   \-->Sparc
      \\\/                 /     \
  C# ------>A1          C#/       \>A1

------------------------------------------
------------------------------------------
          OUR CHOICES FOR AN IR

To keep things simple, we will use
   a modified AST type as an IR

Parser:
   - records
   - provides


Static analysis:
   - records


------------------------------------------
  2. General strategy
------------------------------------------
   GENERAL STRATEGY FOR CODE GENERATION

Don't try to optimize!


Follow the grammar of


------------------------------------------
------------------------------------------
       FOLLOWING THE GRAMMAR

Code resembles the grammar that


When


------------------------------------------
     How does this relate to the parser?
     Why is this useful?
 B. Translation target: code sequences
------------------------------------------
         TARGET: CODE SEQUENCES

Need lists of machine code

Why?


------------------------------------------
        Why are code sequences needed?
------------------------------------------
     REPRESENTING CODE SEQUENCES IN C

#include "instruction.h"

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;

// machine code instructions
typedef struct code_s {
    code_seq next;
    bin_instr_t instr;
} code;

------------------------------------------
 C. Designing Code Sequences
  1. Overall strategies
------------------------------------------
 STRATEGIES FOR DESIGNING CODE SEQUENCES

Work backwards


------------------------------------------
------------------------------------------
       EXPRESSION EVALUATION

Example: (E1 + E2) - (E3 / E4).


Constraints:
 - Expressions have a result value
 - Binary operations (+, -, *, /)
   in the SRM
   need 2 registers

Where should the result be stored?

  Can it be a register?


------------------------------------------
  2. use of registers
------------------------------------------
       USE OF REGISTERS

What if the register is already in use?
   e.g., $v0 for expression's value
     consider   x := y + z

Strategies:
 - use a different register


 - save and restore


------------------------------------------
------------------------------------------
     GENERAL STRATEGY FOR EXPRESSIONS

Each expression's value goes


To operate on an expression's value
   in a register r:


------------------------------------------
  3. Background on SRM instructions
------------------------------------------
      BACKGROUND: SRM INSTRUCTIONS

 ADD s,t,d   "GPR[d] = GPR[s]+GPR[t]"
 SUB s,t,d   "GPR[d] = GPR[s]-GPR[t]"
 MUL s,t     "HI,LO = GPR[s]*GPR[t]"
 DIV s,t     "HI = GPR[s] % GPR[t]"
              and "LO = GPR[s] / GPR[t]"

 LW b,t,o    "GPR[t] = memory[GPR[b]+4*o]"
 SW b,t,o    "memory[GPR[b]+4*o] = GPR[t]"
ADDI s,t,i   "GPR[t] = GPR[s]+sgnExt(i)


How to move value from r1 to r2?


What limitations on immediate operands?


What if the literal doesn't fit?


------------------------------------------
        If the numbers are small enough, where is the result of a
        multiplication as a 32 bit integer located: HI or LO?
        How would you move the value in register r1 to register r2?
        Are there limitations on the immediate operands for ADDI?
        What can you do if you want a constant value that doesn't fit?
 D. Literal Table
------------------------------------------
            LITERAL TABLE IDEA

- Store literal values in


- Keep mapping from


- Initialize


------------------------------------------
------------------------------------------
  LITERAL TABLE IN EXPRESSION EVALUATION

Idea for code for numeric expression, N:

    1. Look up N in global table,
    2. Receive N's 


    3. generate a load instruction
         into 


    4. 


------------------------------------------
     What's our goal for expression code?
------------------------------------------
    LITERAL TABLE AND BOF DATA SECTION

How to get the literals into memory
   with the assumed offsets?


------------------------------------------
 E. Activation Record (AR) Layout
    Where should constants and variables for a block be stored?
------------------------------------------
   LAYOUT OF AN ACTIVATION RECORD

Must save SP, FP, static link, RA
   and registers $s0-$s7

Can't have offset of static link
    at a varying offset from FP

Layout 1:

  FP --> [  saved       SP        ]
         [    registers FP        ]
         [            static link ]
         [              RA        ]
         [              $s0       ]
         [              ...       ]
         [              $s7       ]
         [ local constants        ]
         [      ...               ]
         [ local variables        ]
         [      ...               ]
         [ temporary storage      ]
   SP -->[       ...              ]


Layout 2:

         [      ...               ]
         [ local variables        ]
         [      ...               ]
   FP -->[ local constants        ]
         [  saved       SP        ]
         [    registers FP        ]
         [            static link ]
         [              RA        ]
         [              $s0       ]
         [              ...       ]
         [              $s7       ]
         [ temporary storage      ]
   SP -->[       ...              ]


Advantages of layout 1:


Advantages of layout 2:


------------------------------------------
       What are the advantages of layout 1?
      What are the advantages of layout 2?
      Any disadvantages?
      Which layout should we use?
 F. Compiling Expressions
  1. deciding where to start
    What are the simplest cases for expressions?
------------------------------------------
        TRANSLATING EXPRESSIONS

Abstract syntax of expressions in PL/0

  E ::= E1 o E2 | x | n
  o ::= + | - | * | /


Simplest cases are:


------------------------------------------
  2. example translations
   a. numeric literals
------------------------------------------
 TRANSLATION SCHEME FOR NUMERIC LITERALS


------------------------------------------
------------------------------------------
  TRANSLATION SCHEME FOR VARIABLE NAMES
           (AND CONSTANTS)


------------------------------------------
   b. binary operations
------------------------------------------
TRANSLATION SCHEME FOR BINARY OPER EXPRS

Goal is to get to an instruction like


Example: E1 - E2


------------------------------------------
    So, for E1 - E2 what needs to be done?
 G. Declarations
   Where are constants and variables stored?
------------------------------------------
 TRANSLATION SCHEME FOR PL/0 DECLARATIONS

   const c = n;
   var x;

When do blocks start executing?


What should be done then?


How do we know how much space to allocate?


How to initialize constants?


How to initialize variables?


------------------------------------------
      When are blocks executed in PL/0?
      When starting to execute a block, what should be done?
      Which should be allocated first: constants or variables?
      How do we know how much space to allocate?
      How to initialize constants?
      How to compute the value of the constant?
      How to initialize variables?
 H. Statements
  1. Basic Statements
   What are the base cases in the grammar for statements?
------------------------------------------
 TRANSLATION SCHEME FOR BASIC STATEMENTS


    skip

    x := E


    read x


    write E


------------------------------------------
            For testing, want to know: What are the simplest cases?
            In general, can the "levels outwards" part of
                the lexical address be determined when the variable
                is declared?
            Does the same thing work for constants?
                Should we write a character with code E or the digits of E?
 I. Conditions
  1. Overall conditions
------------------------------------------
        GRAMMAR FOR CONDITIONS

<condition> ::= odd <expr>
              | <expr> <rel-op> <expr>
<rel-op> ::= = | <> | < | <= | > | >=

So the code recursion structure is?


Code looks like:


------------------------------------------
         What should these functions return?
   a. Relational operator conditions
------------------------------------------
     RELATIONAL OPERATOR CONDITIONS

<condition> ::= <expr> <rel-op> <expr>

A design  for conditions:

 Goal: put true of false on top of stack
       for the value of the condition

 Consider E1 <> E2

  [Evaluate E1 to top of stack]
  [Evaluate E2 to top of stack]
  [pop top of stack (E2's value) into $at]
  [pop top of stack (E1's value) into $v0]
  # jump past 2 instrs,
  # if GPR[$v0]!=GPR[$at]
  BNE $v0, $at, 2
  # put 0 (false) in $v0
  ADD $0, $0, $v0
  # jump over next instr
  BEQ $0, $0, 1
  # pub 1 (true) in $v0
  ADDI $0, $v0, 1
  # now $v0 has the truth value
  [code to push $v0 on top of stack]

 Consider E1 >= E2
  [Evaluate E1 to top of stack]
  [Evaluate E2 to top of stack]
  [pop top of stack (E2's value) into $at]
  [pop top of stack (E1's value) into $v0]
  SUB $v0, $at, $v0      # $v0 = E1 - E2
  # jump past 2 instrs,
  # if GPR[$v0]>=GPR[$at] # if E1-E2 >= 0
  BGEZ reg, $at, 2        # skip  2 instrs
  # put 0 (false) in reg
  ADD $0, $0, reg
  # jump over jext instr
  BEQ $0, $0, 1
  # pub 1 (true) in reg
  ADDI $0, reg, 1

------------------------------------------
          What would work for =?
          What would you do for < ?
------------------------------------------
     CODE FOR BINARY RELOP CONDITIONS

// file ast.h
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;


// file gen_code.c

// Requires: reg != $at
// Generate code for evaluating condAST into reg
// Modifies when executed: reg, $at
code_seq gen_code_relop_cond(
              rel_op_condition_t condAST,
              reg_num_type reg)
{


}
------------------------------------------
 J. Control Flow Statements (Compound Statements)
     Why is it useful to write the base cases first?
------------------------------------------
 ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

S ::= begin { S }
    | if C S1 S2
    | while C S

So what is the code structure?


Code looks like:


  begin S1 S2 ... end


  if C S1 S2


  while C S


------------------------------------------
II. Code Generation for Procedures
------------------------------------------
      SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
     Why?


   - knowing exactly where each starts
     Why?


Another issue:
   - sending the right static link

------------------------------------------
        What static link does a called procedure need?
 A. Where to store code for procedures?
   Where do we put the code sequences for each procedure?
------------------------------------------
       WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:


------------------------------------------
       How would you implement each?
       Which layout makes the most sense?
 B. how to find each procedure's starting address?
------------------------------------------
      NESTED PROCEDURES ARE A PROBLEM

  procedure A;
    procedure B;
      begin # B's body code...
            call A # ...
            # ...
      end
  begin
     # A's body code
     call B # ...
     # ...
  end

If lay out the code as

   [ code for A ]
   [ code for B ]

How do we know the address of B
    to compile the call to B?


What about the other direction?


------------------------------------------
------------------------------------------
   RECURSIVE PROCEDURES, SIMILAR PROBLEM

  procedure R;
    begin
      # R's body code ...
      call R
      # ...
    end

Before storing code for R,
  how do we know where it starts?


------------------------------------------
------------------------------------------
        MUTUAL RECURSION
        
  procedure O;
    begin # O's body code...
      call E
      # ...
    end

  procedure E;
    begin
      # E's body code ...
      call O
      # ...

One of these must before the other in
  the code area of the VM...


------------------------------------------
      No matter which of O or E is put first,
         how is the call to the second one to know
         where the second one starts?
  1. solutions
------------------------------------------
       SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
  1. Generate code for each procedure
     (+ store offsets in symbol table,
      + layout procedure code in memory)
  2. Gather table of addresses
     (map from names to addresses,
      using offsets and beginning address)
  3. Patch up code addresses for calls
     (+ output code)

[Lazy evaluation, labels]:
  1. Generate code for each procedure
     with calls to labels
     (+ store or update
        labels in symbol table)
  (+ output code)
------------------------------------------
 C. Multiple Passes as a Solution
------------------------------------------
      GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Solution idea:
  1. Compile all procedure code
     (now know how big each procedure is)
  2. Lay out procedure code in memory
     (now know where each starts)
  3. Change each call instruction


------------------------------------------
        What would a progrm need to do to change all the call instructions?
 D. Labels as a Solution
------------------------------------------
         GENERAL SOLUTION: LABELS

Use "labels" to allow


Term "label" is from assembly language

    ;  ...
    jmp L
    ; ...
    L: ; ...

------------------------------------------
------------------------------------------
        APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

 (1) Use multiple passes
       a. Generate code with labels
       b. Lay out memory for procedures
          (determine starting addresses)
       c. Change labels to addresses

     advantages:


     disadvantages:


 (2) Use shared mutable data (lazy eval.)
       a. labels are unique placeholders,
          shared by all uses (calls)
       b. when address is determined,
          update the placeholder
          (and all uses are updated)

     advantages:


     disadvantages:


------------------------------------------
  1. label data structure for lazy evaluation
------------------------------------------
    LABEL DATA STRUCTURE FOR LAZY EVAL

// file label.h

#include "machine_types.h"

typedef struct {
    bool is_set;
    address_type byte_addr;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Set the address in the label
extern void label_set(label *lab,
                      address addr);

// Is lab set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the address in lab.
extern
address_type label_read(label *lab);

------------------------------------------
 E. exercise
III. Code Generation Demo
 A. context
------------------------------------------
             CONTEXT

VM changes (from HW4's VM):


// words are ints or floats
typedef enum {int_type, float_type} word_type;

// words for this machine
typedef struct word_s {
    word_type type_tag;
    union word_u {
	int i;
	float f;
    } data;
} word;

  No MOD instruction
  Changed to RND instruction (rounding)

FLOAT Language changes

  <stmt> ::= '{' <var-decls>
                 <stmt> { <stmt> } '}'
           | ...

where

  <var-decls> ::= { <var-decl> }
  <var-decl> ::= float <ident>
               | bool <ident>
              
------------------------------------------
        If the FLOAT language only has floats and bools,
           does it make sense to have a round expression?
 B. Enhanced ASTs as an IR
      What changes to the ASTs are needed for new syntax
         and for code generation?
------------------------------------------
    ENHANCED ASTS AS AN IR

See ast.h

Changes to ASTs:

// S ::= begin { VD } { S }
typedef struct {
    AST_list vds;
    AST_list stmts;
} begin_t;

------------------------------------------
  1. identifier uses
------------------------------------------
       IDENTIFIER USES IN ASTs

// E ::= x
typedef struct {
    // name of a constant or variable
    const char *name;
    // set during static analysis,
    // includes info for lexical addr
    id_use *idu;
} ident_t;

// S ::= read x
typedef struct {
    AST *ident;
} read_t;

// S ::= assign x E
typedef struct {
    AST *ident;
    AST *exp;
} assign_t;

------------------------------------------
------------------------------------------
     ID_USE STRUCTURES

typedef struct {
    id_attrs *attrs;
    unsigned int levelsOutward;    
} id_use;

------------------------------------------
        Why track attributes in the AST?
  2. id attributes (id_attrs)
------------------------------------------
           ID ATTRIBUTES

// attributes of idents
typedef struct {
    file_location file_loc;
    var_type vt;  // type
    // offset from beginning of scope
    unsigned int loc_offset;
} id_attrs;

// where:

typedef enum {float_t, bool_t} var_type;

------------------------------------------
 C. scopes and symbol tables
    Where do nested scopes happen in this language?
  1. scope checking begin statements
------------------------------------------
void scope_check_beginStmt(AST *stmt)
{
    symtab_enter_scope();  // <*******
    scope_check_varDecls(
        stmt->data.begin_stmt.vds);
    AST_list stmts
         = stmt->data.begin_stmt.stmts;
    while (!ast_list_is_empty(stmts)) {
	scope_check_stmt(
               ast_list_first(stmts));
	stmts = ast_list_rest(stmts);
    }
    symtab_leave_scope(); // <********
}

------------------------------------------
 D. Writing a Code Generator
------------------------------------------
          GENERATING CODE

Done in the file gen_code.c

  - Functions arranged to walk the ASTs
  - All return a code_seq

 Useful files: ast.h
                 id_attrs.h
                   id_use.h
               code.h
------------------------------------------
        What does a function in gen_code get as an argument?
------------------------------------------
        STEPS FOR CODE GENERATION

 1. start with the base cases


 2. Write simplest tests possible


 3. Design code sequences
     for the nonterminals involved


 4. Write code for each node of the AST


 5. Test it:
     a. check the output machine code

     b. check the VM's execution

------------------------------------------
 E. Examples of generating code
  1. expressions
   a. design
------------------------------------------
              EXPRESSIONS

What are the base cases?


A very simple test:


What code sequence do we want?


------------------------------------------
    And what are the inductive cases?
        What instruction can we use to allocate space for x?
        How should we get the fp for x on top of the stack?
        How should we get the value of 1 on top of the stack
            LIT 1

        So, that is our code sequence:
               INC 1
               PBP 0
               LIT 1
               STO 0
        Q: Anything else we are missing in this code?
        Anything else we are missing in this code?
   b. coding it
------------------------------------------
         GENERATING CODE

Where does execution start?


What is the AST for our program?


Where do we generate those code sequences?


Let's write it!


------------------------------------------
   c. more expression cases
   d. binary expressions
 F. statements
     Any more base cases for statments in FLOAT?
     What are the inductive cases?
    i. begin-statements without var decls
    ii. read and write statements
    iii. if statements
 G. nested scopes
------------------------------------------
            NESTED SCOPES

Example in FLOAT:

# $Id$
float x;
{ float y;
  { float z;
    z = 0;
    y = 1;
    x = 2;
  }
}

What kind of code sequence for this?


------------------------------------------
  What should be done to enter a scope?
  What should be done to exit a scope?