UP | HOME

Compiler Project
Compilers
COP-3402

Table of Contents

Compiler overview

Source to process

Compiler input and output

(Diagram)

Input language (SimpleIR) -> Compiler (CodeGen) -> Output language (x86 assembly)

Example run of the compiler

x := x + 2
# load x and 2
      mov	-16(%rbp), %rax
      mov	$2, %rbx
# perform addition, i.e., rax = rax + rbx
      add	%rbx, %rax
# store result in x
      mov	%rax, -16(%rbp)

SimpleIR

Our intermediate representation.

Example: C program

int main(int argc, char **argv) {
  int x;
  int retval;
  x = read_int();
  retval = print_int(x);
  return 0;
}

Example: SimpleIR

function main
localVariables argc argv x retval
parameters argc argv
x := call read_int
x := x * 2
retval := call print_int x
return 0
end function

Compilation units

  • One file is one compilation
  • One compilation unit defines one function

Minimum SimpleIR

function main
return 0
end function
  • function, return, and end function must appear in each SimpleIR program
  • Function takes a single name
  • Return takes an integer literal or a variable name

Defines a new function called main.

With parameters

function main
localVariables argc argv x y
parameters argc argv
# instructions go here
return 0
end function
  • localVariables declares all variable, including parameters
  • parameters are a subset of localVariables

For instance, in C, parameters to the function are local variables. In SimpleIR you need to declare all local variables first, include all parameters. Then specify separately which local variables are the parameters.

Statements

Assignment

x := 2
y := x
  • Sets the value of a variable to
  • Use the := symbol
  • Assignment to either
    • an to integer literal or
    • another variable name

Arithmetic

x := x + 2
  • A single operation, no arithmetic expressions
  • Always assigns the result to a variable name

Function calls

x := call print_int x
  • Works like a C function call
  • No parentheses
  • List arguments after the function name

Branching

ifgoto x > 0 goto endlabel
x := x * -1
endlabel:
  • No structured code, more like assembly
  • goto for unconditional branches, ifgoto for conditional
  • Define labels as targets of branching

Pointers

t1 := &x
t2 := *t1
*t1 := 11
  • Ampersand gets address
  • t2 := *t1 dereferences t1 and gets its value
  • *t1 dereferences t1 and assigns its value

Example program: exponents

function exponent
localVariables base exp result
parameters base exp
result := 1
top:
if exp <= 0 goto endlabel
result := result * base
exp := exp - 1
goto top
endlabel:
return result
end function

SimpleIR Grammar

grammar SimpleIR;

unit: function;

function: 'function' functionName=NAME localVariables? parameters? statement* returnStatement end;

localVariables: 'localVariables' variables=NAME+;

parameters: 'parameters' formals=NAME+;

returnStatement: 'return' operand=(NAME | NUM);

end: 'end' 'function';

statement: assign | dereference | reference | assignDereference | operation | call | label | gotoStatement | ifGoto;

operation: variable=NAME ':=' operand1=(NAME | NUM) operatorKind=('+' | '-' | '*' | '/' | '%') operand2=(NAME | NUM);

assign: variable=NAME ':=' operand=(NAME | NUM);

dereference: variable=NAME ':=' '*' operand=NAME;

reference: variable=NAME ':=' '&' operand=NAME;

assignDereference: '*' variable=NAME ':=' operand=(NAME | NUM);

call: variable=NAME ':=' 'call' functionName=NAME actuals=NAME*;

label: labelName=NAME ':';

gotoStatement: 'goto' labelName=NAME;

ifGoto: 'if' operand1=(NAME | NUM) operatorKind=('=' | '!=' | '<' | '<=' | '>' | '>=') operand2=(NAME | NUM) 'goto' labelName=NAME;

NAME: [a-zA-Z_] ([a-zA-Z_] | [0-9])* ;
NUM: [0-9]+ ;

PLUS: '+' ;
MINUS: '-' ;
STAR: '*' ;
SLASH: '/' ;
PERCENT: '%' ;

EQ: '=' ;
NEQ: '!=' ;
LT: '<' ;
LTE: '<=' ;
GT: '>' ;
GTE: '>=' ;

WS:   [ \t\r\n]+ -> skip ;

COMMENT : '#' ~[\r\n]* -> skip ;

Implementation

Semantic actions

ANTLR

http://lab.antlr.org

  • Parsing framework
  • Define grammar
  • Write semantic actions

Listener model

Use some SimpleIR, e.g., assignment

Show syntax tree

Show translation specification

Finally show actual C++ code

Intel x86-64 assembly

Assembly file layout

  • data
    • Fixed size, global data section (bss section is zeroed out)
  • rodata
    • Immutable data, e.g., for string constants
  • text
    • Executable part

https://wiki.osdev.org/ELF

  • Assembly file converted to an exectuable
  • Loader copies executable segments into memory

x86 registers

Register name Description
%rax General-purpose (A for accumulator)
%rbx General-purpose (B for base)
%rdx General-purpose (D for data)
%rsp Stack pointer (SP), top of stack
%rbp Base pointer (BP), local variable access

https://wiki.osdev.org/CPU_Registers_x86

These are the only registers we'll need to know about for this class.

%rax is AT&T syntax, which is the default syntax used by gcc.

x86 operations

Variables, constants, pointers

Operation Description
mov Move data to/from registers and memory

Arithmetic

Operation Description
add Integer addition
sub Integer subtraction
imul Integer multiplication
idiv Integer division
cdq Convert double-word to quad-word (for division)

Control-flow

Operation Description
jmp Jump to a given instruction address
cmp Compare (used for conditional jumps)
je Jump if cmp was equal
jne Jump if cmp was not equal
jl Jump if cmp was less than
jle Jump if cmp was less than or equal
jg Jump if cmp was greater than
jge Jump if cmp was greater than or equal

Functions

Operation Description
push Push onto the stck
pop Pop from the stack
call Push the return address and jump (for functions)
ret Pop the return address and jump to it (for functions)

https://www.felixcloutier.com/x86/

These are the only ops we'll use in this class.

We will go over them in more detail for the parts of the SimpleIR that can use them.

x86 operand ordering

Destination operand is last, even for arithmetic operations.

Operation C-like syntax
mov %rbx, %rax rax = rbx
add %rbx, %rax rax = rax + rbx

This is AT&T syntax, used by gcc.

Example SimpleIR

function main
localVariables x y
x := 3
y := 5
return 0
end function

Local variable layout

(Diagram)

Diagram

  • Illustrate offsets and register for x and y
  • Point out that the local variables are basically in an array
  • Show how the %rbp register points to the first element (%rbp is like the array's name in C)
  • We start in the second slot, because other info used by the function will be in the first and second ones

eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

Symbol table

Record the memory offset of the variable

localVariables x y
Variable Offset
x -16
y -24

Variable assignment

x := 3
# assign 3 to x
mov	$3, %rax
mov	%rax, -16(%rbp)

Array accesses

-16(%rbp)   # rbp[-16]

Reminder about the array access

Diagram

  • Show each part of the operand
  • Relate it to a C array index

Assembly

        .file "stdin"
        .section .note.GNU-stack,"",@progbits
        .text
        .globl main
        .type main, @function
main:
        # prologue, update stack pointer
        pushq	%rbp # save old base ponter
        movq	%rsp, %rbp # set new base pointer
        push	%rbx # %rbx is callee-saved
        # allocate stack space for locals
        sub	$24, %rsp
        # assign 3 to x
        mov	$3, %rax
        mov	%rax, -16(%rbp)
        # assign 5 to y
        mov	$5, %rax
        mov	%rax, -24(%rbp)
        # set return value
        mov	$0, %rax
        # epilogue
        add	$24, %rsp
        pop	%rbx # restore %rbx
        pop	%rbp # restore old base pointer
        ret

Author: Paul Gazzillo

Created: 2025-10-23 Thu 00:44

Validate