Machine code generation
Lecture 18

Code generation overview

  • Recall: compiler takes source language and produces target language
    • Translate, not execute
  • Generates equivalent program in assembly
    • Each language construct has corresponding assembly code patterns

Assembly file layout

  • data
    • Fixed-size global data (the related bss section holds zero-initialized data)
  • rodata
    • Immutable data, e.g., for string constants
  • text
    • Executable part

https://wiki.osdev.org/ELF

What about local variables, malloc'ed data?

  • Local variables and heap-allocated data are stored in memory allocated at runtime
  • The running program (process) asks the OS for memory as needed, both at load time and at runtime

Using GDB

Use gdb to step through your simplec output program. First, install it with

sudo apt install gdb

Clone and install peda (https://github.com/longld/peda), a useful gdb assistant. Make sure you have already compiled your simplec program as shown in "Using your compiler" above. Then step through the program like so:

gdb a.out
# once inside of gdb
set disassembly-flavor att
b main # set a breakpoint at main
run # start the program.  it will wait at main
si # step through each assembly instruction
# continue stepping through to track the behavior

If you've downloaded and installed peda, you will see the assembly code, registers, and stack displayed after each step.

Use n or next instead of si to step over function calls, e.g., the input/output calls.

Function implementation

How do functions work?

  • Caller transfers control to callee function
  • Caller provides input values
  • Callee provides output value(s)
  • Execution resumes in caller once callee is finished

Function calls "freeze" state of caller

  • (Diagram)

How would you implement this with just assembly?

  • Save state on stack
  • Unconditional branch
  • Save return value
  • Another branch to go back to where we left off

"Nested" function calls freeze the state of many callers

  • (Diagram)

You can think of a recursive function as invoking a fresh instance of the function, rather than calling itself.

Stack frame (or activation record)

  • Holds all information needed to "freeze" state of function
    • Parameters and local variables
    • Return address
    • Pointer to the caller's stack frame (for nested calls)
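
One way to picture a frame's layout is to give every parameter, local, and temporary its own 8-byte slot below %rbp. The sketch below does that in Java. The class and method names are hypothetical and are not part of the template code. It assumes every value is 8 bytes wide and that the frame size is rounded up to a multiple of 16, which keeps %rsp 16-byte aligned at call instructions as the System V AMD64 ABI requires.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: assigns each variable an 8-byte slot below %rbp. */
class FrameLayoutSketch {
    // Variable name -> offset from %rbp, in the order the names were given.
    final Map<String, Integer> offsets = new LinkedHashMap<>();

    FrameLayoutSketch(List<String> names) {
        for (String name : names) {
            // The first slot is -8(%rbp), the next is -16(%rbp), and so on.
            offsets.putIfAbsent(name, -8 * (offsets.size() + 1));
        }
    }

    /** Bytes to subtract from %rsp, rounded up to a multiple of 16. */
    int frameSize() {
        int bytes = 8 * offsets.size();
        return (bytes + 15) / 16 * 16;
    }

    /** AT&T-syntax operand for a known variable, e.g. "-16(%rbp)". */
    String operand(String name) {
        return offsets.get(name) + "(%rbp)";
    }
}
```

With the seven names x, y, true, false, _t0, _t1, _t2, this gives a frame size of 64 and places _t2 at -56(%rbp), which agrees with function f in the example program later in these notes.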

Parameter passing

  • Registers and/or stack
  • Registers are faster, but limited in number
  • May need to save them before making call

Application binary interface (ABI)

  • Calling conventions and stack frame layout
    • How to pass parameters
    • Layout of data in the stack frame
    • How to return values
    • Caller and callee responsibilities
  • Architecture- and OS-dependent

Intel x86-64 support for functions

  • %rbp - base pointer points to the current function's stack frame
  • %rsp - stack pointer points to the top of the stack
  • push/pop - push to and pop from the stack (move data and update %rsp)
  • call - saves next instruction address (%rip) onto stack and branches to function's address
  • ret - pops the caller's next instruction address and branches to it

Recall that "points to" just means that the register holds an address
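
To make the behavior of call and ret concrete, here is a toy Java model. It is only a sketch: addresses are plain numbers rather than real machine addresses, the class name is hypothetical, and the caller of call() supplies the address of the next instruction instead of the model decoding instruction lengths.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of call/ret; addresses are plain numbers, not real machine addresses. */
class CallRetSketch {
    private final Deque<Long> stack = new ArrayDeque<>();
    long rip; // address of the instruction to execute next

    /** call: push the address of the instruction after the call, then branch. */
    void call(long target, long nextInstruction) {
        stack.push(nextInstruction);
        rip = target;
    }

    /** ret: pop the saved return address and branch to it. */
    void ret() {
        rip = stack.pop();
    }

    /** Number of return addresses currently saved, i.e. the call depth. */
    int depth() {
        return stack.size();
    }
}
```

Two nested calls leave two return addresses on the stack, and each ret resumes the most recent caller first, which is why a stack is the natural structure for function calls.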

Writing and calling ABI-compatible functions

  • Function definition
    • Prologue
    • Epilogue
  • Function call
    • Parameter passing
    • Return value
  • (Demo)
# https://github.com/longld/peda
# for C compile with -g for debug symbols
# run gdb
gdb example
b main # break at main
si # step instruction: steps one assembly instruction (instead of one source line)
info file # list sections and their addresses, e.g., to find rodata
x/8xb 0x0000555555556000 # print memory, 8 he(x) (b)ytes
x/i addr # print as instruction

# dump the full contents of every section (use objdump -t for the symbol table)
objdump -s test

Implementing TAC functions in assembly

Function definitions

  • Emit prologue
    • Save base pointer of caller's stack frame: push %rbp
    • Update base pointer to current function's stack frame: mov %rsp, %rbp
    • Allocate space on stack for locals and temps, e.g., sub $96, %rsp
  • Emit epilogue for each function
    • Release the frame by resetting the stack pointer to the base pointer: mov %rbp, %rsp
    • Restore the base pointer to the caller's stack frame: pop %rbp
    • Return: ret
      • Recall that this pops the return address from the stack and branches
  • Emit return instructions
    • Store result of expression in rax per ABI, e.g., mov -80(%rbp), %rax
    • Go to the epilogue, e.g., jmp _main_return
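
The three pieces above could be generated by helpers along these lines. This is a minimal sketch, not the template's ASMGen API: the class name, method names, and the choice to return lines as a list are all assumptions made for illustration. The _name_return label follows the convention used in the example output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers; not the template's ASMGen API. */
class FunctionEmitSketch {
    /** Function label plus the three prologue instructions. */
    static List<String> prologue(String name, int frameSize) {
        List<String> asm = new ArrayList<>();
        asm.add(name + ":");
        asm.add("push %rbp");                    // save the caller's base pointer
        asm.add("mov %rsp, %rbp");               // this frame starts at the current top of stack
        asm.add("sub $" + frameSize + ", %rsp"); // reserve space for locals and temps
        return asm;
    }

    /** RETURN: put the result in %rax, then jump to the shared epilogue. */
    static List<String> returnStmt(String name, String resultOperand) {
        List<String> asm = new ArrayList<>();
        asm.add("mov " + resultOperand + ", %rax");
        asm.add("jmp _" + name + "_return");
        return asm;
    }

    /** Epilogue: release the frame, restore the caller's %rbp, and return. */
    static List<String> epilogue(String name) {
        List<String> asm = new ArrayList<>();
        asm.add("_" + name + "_return:");
        asm.add("mov %rbp, %rsp");
        asm.add("pop %rbp");
        asm.add("ret");
        return asm;
    }
}
```

Routing every return through one labeled epilogue means the cleanup code is emitted once per function, no matter how many return statements the function contains.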

Function parameters and calls

  • Move parameters to registers or the stack per the ABI, e.g., mov -56(%rbp), %rsi
  • Make the call, e.g., call f
    • call pushes the return address (the address of the instruction following the call) onto the stack
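
As a sketch of the call sequence, the Java helper below loads up to six arguments into the integer argument registers of the System V AMD64 ABI (%rdi, %rsi, %rdx, %rcx, %r8, %r9), emits the call, and stores the result from %rax. The class and method names are hypothetical, and passing further arguments on the stack is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for PARAM/CALL; handles register-passed arguments only. */
class CallEmitSketch {
    // Integer argument registers, in order, under the System V AMD64 ABI.
    static final String[] ARG_REGS = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};

    static List<String> emitCall(String callee, List<String> argOperands, String resultOperand) {
        if (argOperands.size() > ARG_REGS.length) {
            throw new IllegalArgumentException("stack-passed arguments are not handled in this sketch");
        }
        List<String> asm = new ArrayList<>();
        for (int i = 0; i < argOperands.size(); i++) {
            // Copy the i-th argument from its stack slot into the i-th register.
            asm.add("mov " + argOperands.get(i) + ", " + ARG_REGS[i]);
        }
        asm.add("call " + callee);
        asm.add("mov %rax, " + resultOperand); // the return value arrives in %rax
        return asm;
    }
}
```

For example, emitCall("f", List.of("-24(%rbp)", "-56(%rbp)"), "-64(%rbp)") yields the same instructions as the call to f in the example's main, apart from the order in which the constant 100 is materialized.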

Example function call program

SimpleC

f(x, y) {
  return x + y;
}

main() {
  int x;
  int result;
  input x;
  result = f(x, 100);
  output result;
  return 0;
}

Intermediate code

[f
CONST _t0 1
ASSIGN true _t0
CONST _t1 0
ASSIGN false _t1
ADD _t2 x y
RETURN _t2
, main
CONST _t0 1
ASSIGN true _t0
CONST _t1 0
ASSIGN false _t1
INPUT x
PARAM x
CONST _t2 100
PARAM _t2
CALL _t3 f
ASSIGN result _t3
OUTPUT result
CONST _t4 0
RETURN _t4
]

Assembly code

.text
.globl f
.type f, @function
f:
        push    %rbp
        mov     %rsp, %rbp
        sub     $64, %rsp
        mov     %rdi, -8(%rbp)
        mov     %rsi, -16(%rbp)
        movq    $1, -40(%rbp)
        mov     -40(%rbp), %rax
        mov     %rax, -24(%rbp)
        movq    $0, -48(%rbp)
        mov     -48(%rbp), %rax
        mov     %rax, -32(%rbp)
        mov     -8(%rbp), %rax
        mov     -16(%rbp), %rcx
        add     %rcx, %rax
        mov     %rax, -56(%rbp)
        mov     -56(%rbp), %rax
        jmp     _f_return
_f_return:
        mov     %rbp, %rsp
        pop     %rbp
        ret
.text
.globl main
.type main, @function
main:
        push    %rbp
        mov     %rsp, %rbp
        sub     $80, %rsp
        movq    $1, -40(%rbp)
        mov     -40(%rbp), %rax
        mov     %rax, -8(%rbp)
        movq    $0, -48(%rbp)
        mov     -48(%rbp), %rax
        mov     %rax, -16(%rbp)
        call    input_int64_t@PLT
        mov     %rax, -24(%rbp)
        mov     -24(%rbp), %rdi
        movq    $100, -56(%rbp)
        mov     -56(%rbp), %rsi
        call    f
        mov     %rax, -64(%rbp)
        mov     -64(%rbp), %rax
        mov     %rax, -32(%rbp)
        mov     -32(%rbp), %rdi
        call    output_int64_t@PLT
        movq    $0, -72(%rbp)
        mov     -72(%rbp), %rax
        jmp     _main_return
_main_return:
        mov     %rbp, %rsp
        pop     %rbp
        ret
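
The listing above uses one fixed instruction pattern per TAC instruction, with every variable living in a stack slot. The Java sketch below reproduces the patterns for CONST, ASSIGN, and ADD. The class and method names are hypothetical, and the operands are assumed to be stack-slot strings such as -8(%rbp) that were already assigned elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-instruction patterns, mirroring the example output. */
class TacPatternSketch {
    /** CONST dst value: store an immediate straight into the stack slot. */
    static List<String> constant(String dst, long value) {
        List<String> asm = new ArrayList<>();
        // movq with an immediate only encodes sign-extended 32-bit values.
        asm.add("movq $" + value + ", " + dst);
        return asm;
    }

    /** ASSIGN dst src: memory-to-memory moves must go through a register. */
    static List<String> assign(String dst, String src) {
        List<String> asm = new ArrayList<>();
        asm.add("mov " + src + ", %rax");
        asm.add("mov %rax, " + dst);
        return asm;
    }

    /** ADD dst left right: load both operands, add, store the sum. */
    static List<String> add(String dst, String left, String right) {
        List<String> asm = new ArrayList<>();
        asm.add("mov " + left + ", %rax");
        asm.add("mov " + right + ", %rcx");
        asm.add("add %rcx, %rax"); // %rax = %rax + %rcx
        asm.add("mov %rax, " + dst);
        return asm;
    }
}
```

Other arithmetic instructions would likely follow the same load, operate, store shape as add, though instructions such as division have their own register requirements.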

Compiler project

Implement the rest of the machine code generator for three-address code.

You may use the template code. Please develop and use your own test cases. Ask any questions about details of the assembly code in class or in chat.

To get the repo ready, uncomment the ASMGen phase in the main driver and Makefile:

diff --git a/Compiler.java b/Compiler.java
index 1e88b68..feb9fc0 100644
--- a/Compiler.java
+++ b/Compiler.java
@@ -38,10 +38,10 @@ public class Compiler {
     String outputFile = inputFileNoExt + ".s";
     PrintWriter outfile = new PrintWriter(new FileWriter(outputFile));

-    // // Phase 5: Machine code gen.
-    // ASMGen asmgen = new ASMGen(outfile);
-    // System.err.println(codegen.functionlist);
-    // asmgen.gen(codegen.functionlist);
+    // Phase 5: Machine code gen.
+    ASMGen asmgen = new ASMGen(outfile);
+    System.err.println(codegen.functionlist);
+    asmgen.gen(codegen.functionlist);

     // Cleanup output file.
     outfile.close();
diff --git a/Makefile b/Makefile
index 4b4bdab..9a8c38a 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ SOURCE := \
        TAC.java \
        TACFunction.java \
        CodeGen.java \
-       # ASMGen.java
+       ASMGen.java

 CLASSES := $(SOURCE:%.java=%.class)

To run your compiler's output, assemble and link it with the I/O library, io.c.

java Compiler example.simplec
gcc -o example example.s io.c

You can then run your program with ./example.

Submission

Push the complete code generator to the main branch of your GitHub repository. Be sure that it builds with make from the root directory and can be run with java Compiler program.simplec.

Grading

The compiler will be checked for functional correctness on a suite of example SimpleC programs with known inputs and outputs. For the example program above, for instance, if we have fun.in1

203

which has known output fun.groundtruth1

303

we can test the compiler by checking its output against the known output:

./fun < fun.in1 > fun.out1
diff fun.groundtruth1 fun.out1
echo $?

diff should produce no output and the exit code $? should be zero. If this goes wrong, you might see something like this:

$ ./fun < fun.in1 > fun.out1
$ diff fun.groundtruth1 fun.out1
1c1
< 303
---
> 0
$ echo $?
1

Author: Paul Gazzillo

Created: 2022-03-21 Mon 13:50