Backend overview
Lecture 13

Compiler architecture refresher

Middle-end: designing an intermediate language

  • The intermediate language bridges the programming language and machine code

Considerations

  • Easier for the front-end to generate than machine code
  • Easier to translate to machine code than the source language
  • A convenient shared target for many source languages
  • Abstracts away common machine features, e.g., registers vs. memory, functions vs. branches
  • Enables common optimizations, e.g., instruction selection (multiplication vs. shift)
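
As a rough illustration of the multiplication vs. shift choice, the sketch below rewrites a multiplication by a power of two into a left shift on a made-up IR. The `IrInstr` and `ShiftRewrite` classes and the `mul`/`shl` opcode names are assumptions for this example only; they do not come from any particular compiler.

```java
// Hypothetical IR instruction of the form: dst = op src, constant
class IrInstr {
  final String op;
  final String dst;
  final String src;
  final long constant;

  IrInstr(String op, String dst, String src, long constant) {
    this.op = op;
    this.dst = dst;
    this.src = src;
    this.constant = constant;
  }

  @Override
  public String toString() {
    return dst + " = " + op + " " + src + ", " + constant;
  }
}

class ShiftRewrite {
  // Replaces a multiplication by a positive power of two with a left shift;
  // every other instruction is returned unchanged.
  static IrInstr rewrite(IrInstr in) {
    long c = in.constant;
    boolean powerOfTwo = c > 0 && (c & (c - 1)) == 0;
    if (in.op.equals("mul") && powerOfTwo) {
      // The shift amount is log2(c), i.e. the number of trailing zero bits.
      return new IrInstr("shl", in.dst, in.src, Long.numberOfTrailingZeros(c));
    }
    return in;
  }

  public static void main(String[] args) {
    System.out.println(rewrite(new IrInstr("mul", "t1", "x", 8)));
    System.out.println(rewrite(new IrInstr("mul", "t2", "x", 6)));
  }
}
```

Because the rewrite works on the intermediate language, it can be written once and shared by every source language and every back-end.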

Challenges

  • Preserving debugging information from source language
  • Supporting multiple language paradigms
    • OOP, functional, C-like, interpreted, etc.

Examples of intermediate languages

Low-Level Virtual Machine (LLVM) Intermediate Representation (IR)

  • Originally geared towards C-like systems languages
  • Popular target for many languages
    • C, C++, Objective-C, Swift, Rust, with compilers for Java, Haskell, etc.
  • Supports many back-ends
    • x86-64, ARM, MIPS, etc.

LLVM IR example

From https://llvm.org/docs/LangRef.html#br-instruction

Test:
  %cond = icmp eq i32 %a, %b
  br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret i32 1
IfUnequal:
  ret i32 0

Java Bytecode

  • Originally designed for Java
    • Typically interpreted by a virtual machine (JVM)
    • Some JVMs, e.g., HotSpot, compile bytecode to machine code on the fly (just-in-time compilation)
  • Other popular languages targeting the JVM
    • Scala, Clojure, etc.

Java Bytecode example

javap -c TypeChecker.class

public java.lang.Void visitProgram(SimpleCParser$ProgramContext);
  Code:
     0: aload_0
     1: aload_1
     2: invokevirtual #20                 // Method visitChildren:(Lorg/antlr/v4/runtime/tree/RuleNode;)Ljava/lang/Object;
     5: pop
     6: aconst_null
     7: areturn

Note that the bytecode has built-in OOP support, e.g., virtual method invocation (invokevirtual)

Common Intermediate Language (CIL)

  • Microsoft's intermediate language for its .NET framework
  • Target for several languages supported by Microsoft
    • C, C++, C#, Visual Basic, F#

Our intermediate language for SimpleC

  • Machine-like opcodes
    • Arithmetic, conditional branches
    • One operation per instruction
  • Unlimited temp variables (registers)
    • No register management needed
  • Support for function calls

Three-address code (TAC)

  • One "opcode" per instruction
  • A maximum of three addresses (registers) per operation
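
As a small illustration of the format, here is a sketch of how the statement `x = a + b * c` might be broken into three-address instructions. The `TacInstr` class, its field names, and the temporaries `t1` and `t2` are assumptions made for this example, not the course's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for one three-address instruction:
// one opcode plus at most three addresses.
class TacInstr {
  final String op;   // opcode name, e.g. "MULT"
  final String arg1; // destination
  final String arg2; // first operand
  final String arg3; // second operand; null when unused

  TacInstr(String op, String arg1, String arg2, String arg3) {
    this.op = op;
    this.arg1 = arg1;
    this.arg2 = arg2;
    this.arg3 = arg3;
  }

  @Override
  public String toString() {
    String text = op + " " + arg1 + " " + arg2;
    return arg3 == null ? text : text + " " + arg3;
  }
}

class TacExample {
  // Hand-lowered form of: x = a + b * c
  static List<TacInstr> lower() {
    List<TacInstr> code = new ArrayList<>();
    code.add(new TacInstr("MULT", "t1", "b", "c"));    // t1 = b * c
    code.add(new TacInstr("ADD", "t2", "a", "t1"));    // t2 = a + t1
    code.add(new TacInstr("ASSIGN", "x", "t2", null)); // x = t2
    return code;
  }

  public static void main(String[] args) {
    for (TacInstr instr : lower()) {
      System.out.println(instr);
    }
  }
}
```

Each nested subexpression gets its own temporary, which is why an unlimited supply of temporaries keeps code generation simple.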

TAC opcodes and their meaning

enum Op {
  CONST,  // arg1 is the destination; arg2 is a number literal
  ASSIGN, // arg1 is the destination; arg2 is a variable name
  INPUT,  // arg1 is the variable name to input
  OUTPUT, // arg1 is the variable name to output
  ADD,    // arg1 is the destination; arg2 and arg3 are the operands
  SUB,    // same operands as ADD
  MULT,   // same operands as ADD
  DIV,    // same operands as ADD
  PARAM, // arg1 is the parameter name
  CALL,  // arg1 is the destination; arg2 is the function name
  NOP,   // no arguments
  RETURN,// arg1 is the variable name to return
  LABEL, // arg1 is the name of the label
  GOTO,  // arg1 is the name of the label
  // ZE - operand is zero
  // NZ - operand is not zero
  GOTOZE,// arg1 is the name of the label; arg2 is the operand
  GOTONZ,// same operands as GOTOZE
  // EQ - operands are equal
  // NE - operands are not equal
  // LT | GT | LE | GE - operands are (less than | greater than | less than or equal | greater than or equal)
  GOTOEQ,// arg1 is the name of the label; arg2 and arg3 are the operands
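  GOTONE,// same operands as GOTOEQ
  GOTOLT,// same operands as GOTOEQ
  GOTOGT,// same operands as GOTOEQ
  GOTOGE,// same operands as GOTOEQ
  GOTOLE,// same operands as GOTOEQ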
}
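
To make the operand conventions above concrete, the following sketch interprets a small subset of the opcodes. The `MiniTac` and `Instr` classes, the subset chosen, and the sample program are assumptions for illustration; the course's actual tools may represent or execute TAC differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniTac {
  // Only a subset of the opcodes listed above.
  enum Op { CONST, ASSIGN, ADD, SUB, MULT, LABEL, GOTO, GOTOZE, GOTOLT, RETURN }

  static class Instr {
    final Op op;
    final String arg1, arg2, arg3; // unused arguments are null

    Instr(Op op, String... args) {
      this.op = op;
      this.arg1 = args.length > 0 ? args[0] : null;
      this.arg2 = args.length > 1 ? args[1] : null;
      this.arg3 = args.length > 2 ? args[2] : null;
    }
  }

  // Runs the program and returns the value of the first RETURN reached.
  static int run(List<Instr> code) {
    // First pass: record the position of every label.
    Map<String, Integer> labels = new HashMap<>();
    for (int i = 0; i < code.size(); i++) {
      if (code.get(i).op == Op.LABEL) labels.put(code.get(i).arg1, i);
    }
    Map<String, Integer> vars = new HashMap<>();
    int pc = 0;
    while (pc < code.size()) {
      Instr in = code.get(pc);
      pc++;
      switch (in.op) {
        case CONST:  vars.put(in.arg1, Integer.parseInt(in.arg2)); break;
        case ASSIGN: vars.put(in.arg1, vars.get(in.arg2)); break;
        case ADD:    vars.put(in.arg1, vars.get(in.arg2) + vars.get(in.arg3)); break;
        case SUB:    vars.put(in.arg1, vars.get(in.arg2) - vars.get(in.arg3)); break;
        case MULT:   vars.put(in.arg1, vars.get(in.arg2) * vars.get(in.arg3)); break;
        case LABEL:  break; // labels only mark a position
        case GOTO:   pc = labels.get(in.arg1); break;
        case GOTOZE: if (vars.get(in.arg2) == 0) pc = labels.get(in.arg1); break;
        case GOTOLT: if (vars.get(in.arg2) < vars.get(in.arg3)) pc = labels.get(in.arg1); break;
        case RETURN: return vars.get(in.arg1);
      }
    }
    throw new IllegalStateException("program ended without RETURN");
  }

  // Sample program: sum = 4 + 3 + 2 + 1, counting n down to zero.
  static List<Instr> sample() {
    List<Instr> code = new ArrayList<>();
    code.add(new Instr(Op.CONST, "n", "4"));
    code.add(new Instr(Op.CONST, "one", "1"));
    code.add(new Instr(Op.CONST, "sum", "0"));
    code.add(new Instr(Op.LABEL, "loop"));
    code.add(new Instr(Op.GOTOZE, "done", "n"));
    code.add(new Instr(Op.ADD, "sum", "sum", "n"));
    code.add(new Instr(Op.SUB, "n", "n", "one"));
    code.add(new Instr(Op.GOTO, "loop"));
    code.add(new Instr(Op.LABEL, "done"));
    code.add(new Instr(Op.RETURN, "sum"));
    return code;
  }

  public static void main(String[] args) {
    System.out.println(run(sample())); // prints 10
  }
}
```

Note how the conditional jumps take the label as arg1 and the tested operands after it, matching the comments in the opcode list.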

Learning about and implementing the back-end

  • Understanding the instruction set
  • Defining SimpleC constructs in terms of TAC instructions
  • Implementing code generation as a tree walking algorithm
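
The tree-walking approach can be sketched as follows for expressions, assignments, and a simple `if` statement. The AST classes (`Num`, `Var`, `Bin`, `Assign`, `If`), the `CodeGen` class, and the naming scheme for temporaries and labels are assumptions made for illustration; they are not the SimpleC parser's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-AST; a real front-end would build this from the parse tree.
abstract class Node {}
class Num extends Node { final int value; Num(int value) { this.value = value; } }
class Var extends Node { final String name; Var(String name) { this.name = name; } }
class Bin extends Node {
  final String op; // "ADD", "SUB", "MULT", or "DIV"
  final Node left, right;
  Bin(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
}
class Assign extends Node {
  final String target; final Node value;
  Assign(String target, Node value) { this.target = target; this.value = value; }
}
class If extends Node {
  final Node cond; final Node body;
  If(Node cond, Node body) { this.cond = cond; this.body = body; }
}

class CodeGen {
  final List<String> code = new ArrayList<>();
  private int temps = 0;
  private int labels = 0;

  private String newTemp() { return "t" + (++temps); }
  private String newLabel() { return "L" + (++labels); }

  // Visits an expression and returns the name of the variable holding its value.
  String genExpr(Node n) {
    if (n instanceof Num) {
      String t = newTemp();
      code.add("CONST " + t + " " + ((Num) n).value);
      return t;
    }
    if (n instanceof Var) return ((Var) n).name;
    Bin b = (Bin) n;
    String l = genExpr(b.left);  // children first,
    String r = genExpr(b.right);
    String t = newTemp();        // then one instruction for this node
    code.add(b.op + " " + t + " " + l + " " + r);
    return t;
  }

  void genStmt(Node n) {
    if (n instanceof Assign) {
      Assign a = (Assign) n;
      String v = genExpr(a.value);
      code.add("ASSIGN " + a.target + " " + v);
    } else if (n instanceof If) {
      If s = (If) n;
      String cond = genExpr(s.cond);
      String end = newLabel();
      code.add("GOTOZE " + end + " " + cond); // skip the body when cond is zero
      genStmt(s.body);
      code.add("LABEL " + end);
    } else {
      throw new IllegalArgumentException("unsupported statement");
    }
  }

  // Generates code for: if (x) y = a + b * 2;
  static List<String> demo() {
    CodeGen gen = new CodeGen();
    Node sum = new Bin("ADD", new Var("a"), new Bin("MULT", new Var("b"), new Num(2)));
    gen.genStmt(new If(new Var("x"), new Assign("y", sum)));
    return gen.code;
  }

  public static void main(String[] args) {
    for (String line : demo()) System.out.println(line);
  }
}
```

For the demo statement this emits `GOTOZE L1 x`, `CONST t1 2`, `MULT t2 b t1`, `ADD t3 a t2`, `ASSIGN y t3`, and `LABEL L1`, in that order.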

Author: Paul Gazzillo

Created: 2022-03-14 Mon 14:28