I. Supporting Subroutines in the ISA What is a subroutine? A. feature design to support subroutines ------------------------------------------ GOALS FOR SUBROUTINES subroutine = function or procedure (abstraction of expressions or commands) Want: ------------------------------------------ ------------------------------------------ WHAT A SUBROUTINE CALL DOES What are the steps in a subroutine call like: x = f(E1,E2) where E1 and E2 are expressions? ------------------------------------------ What happens in the machine to pass arguments? 1. SRM, a simplified RISC machine ------------------------------------------ SRM = SIMPLIFIED RISC MACHINE RISC = Reduced Instruction Set Computer (vs. CISC = Complex Instruction Set ...) SRM, for HW, based on MIPS processor Register-based instructions: ADD s,t,d is GPR[d] <- GPR[s] + GPR[t] ADDI s,t,i is GPR[t] <- GPR[s] + i Byte-addressible: LBU b,t,o is GPR[t] <- memory[GPR[b]+o] ------------------------------------------ Is the x86 a RISC or CISC design? ------------------------------------------ 32 REGISTERS IN THE SRM Num Notes 0 always 0 (can't write this!) 1 assembler's temporary 2 function result ($v0) 3 function result ($v1) 4 function argument ($a0) 5 function argument ($a1) 6 function argument ($a2) 7 function argument ($a3) 8-15 temporary ($t0,...,$t7) 16-23 temporary ($s0,...) 24-25 temporary ($t8, $t9) 26 (reserved for OS) 27 (reserved for OS) 28 global, static data ($gp) 29 stack pointer ($sp) 30 frame pointer ($fp) 31 return address ($ra) ------------------------------------------ ------------------------------------------ CALLS AND REGISTERS Call done by "Jump and Link" instruction LW $a0, x # load argument x LW $a1, y # load argument y JAL f $ra <- PC # save return addr PC <- f # jump to f ... JR $ra # jump to return addr ------------------------------------------ What complications arise for calls in a VM with registers? ------------------------------------------ WHO SAVES AND RESTORES REGISTERS? Problem: limited number of registers - caller might want some preserved - callee might change some So if registers are saved by: Callee, then might need to save Caller, then might need to save So, use a ------------------------------------------ ------------------------------------------ CALLING CONVENTION Agreement between all callers and callees Callee saves & restores: Caller saves & restores: ------------------------------------------ ------------------------------------------ IT'S A CONVENTION / AGREEMENT Saving and restoring not enforced ------------------------------------------ ------------------------------------------ VM FEATURES TO SUPPORT SUBROUTINES ------------------------------------------ B. design for statically-scoped subroutines 1. what is static scoping? ------------------------------------------ STATIC SCOPING def: In *static scoping*, each identifier x def: In *dynamic scoping*, each identifier x ------------------------------------------ Is there another way that identifiers could be found in a program? What kind of scoping is in C, C++, and Java? The Unix shell? 2. motivation for static scoping ------------------------------------------ MOTIVATION FOR STATIC SCOPING int incr = 1; int addOne(int y) { return y+incr; } int client() { int incr = 2; int z = addOne(3); // what is the value of z here? return z; } ------------------------------------------ What should addOne do? What should the value of z be? Do we want to be able to check programs when we write them? 3. block structure ------------------------------------------ BLOCK STRUCTURE def: A *block* is Usual Grammar: Example in C { int next = 3*x+1; next = next / 2; return next; } ------------------------------------------ ------------------------------------------ ADVANTAGES OF BLOCK STRUCTURE - Local storage - Control of names - Easier to extract procedures ------------------------------------------ 4. motivation for recursion Do you think recursion is hard to use and understand? ------------------------------------------ RECURSIVE DATA ==> RECURSIVE PROGRAMS A good rule of design: // file btree.h #include "Tdef.h" typedef struct treeNode { T value; struct treeNode *left, *right; } tree; // helper to compute maximum of its args int max(int a, int b) { return (a >= b) ? a : b; } ------------------------------------------ How should we write a program to find the depth of a tree? Is that better than using while loops and an explicit stack? ------------------------------------------ RECURSIVE GRAMMARS Grammar for statements: ::= ... | while () Structure of a parser: // typedef /* ... */ stmtTree; stmtTree *parseStatement() { /* ... */ parseWhileLoop(); /* ... */ } stmtTree *parseWhileLoop() { /* ... */ parseStatement(); /* ... */ } ------------------------------------------ Is it easy to follow what these routines are doing? Why are natural languages structured recursively? 5. design for subroutines ------------------------------------------ VM DESIGN FOR SUBROUTINES For each call: - storage for a subroutine's variables (local storage) organized as - storage for a single call is called an AR: def: An *activation record* (AR) ------------------------------------------ Why use a stack? ------------------------------------------ STACK ORGANIZATION In the code: P calls Q, Q calls R Initially: [ 0 ] Call of subroutine P: [ AR for P ] After P calls Q: [ AR for P ] [ AR for Q ] After Q calls R: [ AR for P ] [ AR for Q ] [ AR for R ] After R returns: [ AR for P ] [ AR for Q ] After Q returns: [ AR for P ] After P returns: ------------------------------------------ 6. stack implementation Since a computer's memory is like a big (1D) array, how should we implement the runtime stack? ------------------------------------------ STACK IMPLEMENTATION AR delimited by two indexes: - fp: - sp: Notes, assuming stack is word-addressed, and grows towards lower addresses (down) ------------------------------------------ Does the storage for a subroutine vary dynamically? ------------------------------------------ STACK IMPLEMENTATION: HEADER FILES Assume: - stack is byte addressed - stack grows down towards lower addresses // File: machine_types.h #ifndef _MACHINE_TYPES_H #define _MACHINE_TYPES_H // type of addresses typedef unsigned int address_type; // type of machine bytes typedef unsigned char byte_type; // type of machine words typedef int word_type; #define BYTES_PER_WORD 4 #endif // File: stack.h #ifndef _STACK_H #define _STACK_H #include #include #include "machine_types.h" // The MAX_STACK_HEIGHT must be // evenly divisible by BYTES_PER_WORD #define MAX_STACK_HEIGHT 2048 // Initialize the stack data structure extern void stack_initialize(); /* ... other extern declarations ... */ #endif ------------------------------------------ ------------------------------------------ STACK IMPLEMENTATION: STACK.C FILE /* $Id: stack.c,v 1.3 2023/09/08 ... */ #include #include #include #include "utilities.h" #include "stack.h" // size of the stack in words #define STACK_LEN (MAX_STACK_HEIGHT / BYTES_PER_WORD) // maximum index of a byte in the stack. // Note: this equals BYTES_PER_WORD * (STACK_LEN - 1) #define MAX_BYTE_INDEX (MAX_STACK_HEIGHT - BYTES_PER_WORD) // the stack's storage static word_type memory[STACK_LEN]; // first index of current AR, in bytes static int fp; // index of top element in current AR, in bytes static int sp; // the stack's invariant void stack_okay() { } // Initialize the stack data structure void stack_initialize() { } // Return the stack's total size, in bytes int stack_size() { } // Return the current AR's num. of bytes int stack_AR_size() { return fp - sp; } // Return the address of the base // of the current AR (fp value) address_type stack_AR_base() { } // Return the address of the top word // element in the current AR (sp value), // as a byte address address_type stack_top_AR_address() { } // Return the address of the top word // element in the current AR (sp value), // as a word address address_type stack_top_AR_word_addr() { } // Is the stack empty? bool stack_empty() { } // Is the stack full? bool stack_full() { } // Requires: BYTES_PER_WORD > j >= 0; // get the jth byte of the word v // (numbered from the right) static byte_type fetchByteFromWord( word_type v, int j) { return (v >> (j*8)) & 0x000000FF; } // Requires: !stack_full() // push a word on the stack // with sp becoming old(sp) - BYTES_PER_WORD void stack_push_word(word_type val) { } // Requires: n is evenly divisible // by BYTES_PER_WORD // Requires: (stack_size() + n) // < MAX_STACK_HEIGHT // Increase the size of the stack by n void stack_allocate_bytes(unsigned int n) { } // Requires: !stack_empty() // pop the stack and return the top word. // The size of the stack is // reduced by BYTES_PER_WORD. word_type stack_pop_word() { } // Requires: n is evenly divisible // by BYTES_PER_WORD // Requires: (stack_size() - n) >= 0 // Decrease the size of the stack by n bytes void stack_deallocate_bytes(unsigned int n) { } // Requires: !stack_empty() // pop the stack and return the top word. // The size of the stack is // reduced by BYTES_PER_WORD. word_type stack_pop_word() { } // Requires: !stack_empty() // return the top word without popping word_type stack_top_word() { } // translate a byte address to // a word address (put in word_addr) // and a byte offset (put in byte_offset) void stack_byte2word_address( int addr, int *word_addr, int *byte_offset) { } // Requires: BYTES_PER_WORD > j >= 0; // set the jth byte of the word v (numbered from the right) // to the given value bv static word_type setByteInWord(word_type x, int j, byte_type bv) { } // Requires: stack_top_AR_word_addr() // <= word_addr // Requires: word_addr < STACK_LEN // Requires: 0 <= byte_offset // Requires: byte_offset < BYTES_PER_WORD // set the byte_offset'th byte of // the stack's storage at word_addr to bv void stack_set_byte_at_word_offset( int word_addr, int byte_offset, byte_type bv) { } // Requires: stack_top_AR_address() // - BYTES_PER_WORD <= addr // Requires: addr < STACK_MAX_HEIGHT; // Set the byte at addr to bv void stack_set_byte(address_type addr, byte_type bv) { } // Requires: 0 <= word_addr // Requires: word_addr < STACK_LEN // Requires: 0 <= byte_offset // Requires: byte_offset < BYTES_PER_WORD // Return the byte_offset'th byte of // the stack's storage at word_addr byte_type stack_get_byte_at_word_offset( int word_addr, int byte_offset) { } // Requires: sp - BYTES_PER_WORD <= addr // Requires: addr < STACK_MAX_HEIGHT; // Return the byte at addr byte_type stack_get_byte(address_type addr) { } ------------------------------------------ C. Addressing for nested routines 1. Problem of addressing locals ------------------------------------------ HOW TO ADDRESS LOCAL VARIABLES? procedure p0(); var int x; procedure p1(); var int y; var int z; procedure p2(); var int a; begin a := a+x*y+z; end; begin # body of p1 # ... call p2; # ... call p0 # ... end; begin # body of p0 # ... call p1; # ... call p0 # ... end. ------------------------------------------ How many calls to p0 and p1 will be on the stack when the return from p2 is evaluated? At the return, how can the compiler find the locations of x and y? ------------------------------------------ THE PROBLEM Programming language features - subroutines - nesting of subroutines (Pascal, JS) - static scoping ==> absolute address of variables is hard to predict ------------------------------------------ Can we tell from the text of a program where a routine's ARs will be on the runtime stack? If we don't know where the AR will be on the runtime stack how can local variables in an AR be addressed? 2. Compiler response to solve the problem ------------------------------------------ COMPILER-BASED SOLUTION What can compiler know statically about local variable locations? What would we need to find exact location of a local variable? When can an AR be created that needs to know the base of the surrounding AR? Could we pass the base of the AR for the surrounding scope in a call? If each AR stores a (static) link to the AR of the surrounding scope, how can we address two layers out? 3? What information is needed to address a local variable in a surrounding scope? ------------------------------------------ Are all calls like that? a. Summary: two-part addresses ------------------------------------------ SUMMARY Compilers use two-part addresses, called *lexical addresses* that consist of: ------------------------------------------ ------------------------------------------ HOW TO ADDRESS LOCAL VARIABLES? procedure p0; var x; procedure f1; var y; var z; procedure p2; var a; procedure f3; begin # body of f3 # here --v call f1; call f3; x := a+(x*y)+z end; call f3; # body of p2 call p2; # body of p1 call f1; # body of f1 call p0. #body of p0 ------------------------------------------ Following the comment, what is the lexical address of x? What is the lexical address of z? What is the lexical address of a? 3. What is stored in an AR ------------------------------------------ INFORMATION STORED IN ARs What information is needed in an AR? ------------------------------------------ 4. VM stack operations for call/return ------------------------------------------ WORKING WITH ARs // Requires: the stack has enough room // to allocate a new AR void stack_call_prep( address_type static_link) { } // Requires: restored fp and sp values // satisfy stack's invariant // return given value from a subroutine extern void stack_restore_for_return() { } ------------------------------------------ What is sp now? Does the callee really need to save all of this? a. pictures ------------------------------------------ PICTURE OF CALL OPERATION Execution of stack_call(SL): PC: 16 fP: 464 sp: 448 468 | | 464 | | <- fp 460 | | 456 | | 452 | | 448 | arg5 | <- sp 444 | | 440 | | 436 | | 432 | | 428 | | 424 | | 420 | | 416 | | 412 | | 408 | | 404 | | 400 | | 396 | | 392 | | 388 | | ------------------------------------------ ------------------------------------------ PICTURE OF RETURN OPERATION PC: 200 fp: 444 sp: 384 468 | | 464 | | 460 | | 456 | | 452 | | 448 | arg5 | 444 | saved a0 | <- fp 440 | saved a1 | 436 | saved a2 | 432 | saved a3 | 428 | saved s0 | 424 | saved s1 | 420 | saved s2 | 416 | saved s3 | 412 | saved s4 | 408 | saved s5 | 404 | saved s6 | 400 | saved s7 | 396 | saved s7 | 392 | old static link | 388 | old fp | 384 | old ra | <- sp 380 | | | | ------------------------------------------ b. determining the static link to pass ------------------------------------------ WHAT STATIC LINK TO PASS? If routine R calls E, what static link is passed? ------------------------------------------ II. Other VM designs A. Designs seen previously ------------------------------------------ VM DESIGNS PREVIOUSLY SEEN 1-register machine == Tiny VM SRM (3 registers) == VM of Homework 1 ------------------------------------------ B. Simplified RISC Machine (SRM) ------------------------------------------ SIMPLIFIED RISC MACHINE (SRM) 32 registers Num. Usage Name(s) ===================================== 0 always 0 $0 1 assembler temp. $at 2,3 function results $v0, $v1 4-7 function args $a0 - $a3 8-15 temporaries $t0 - $t7 16-23 saved temporaries $s0 - $s7 24,25 temporaries $t8, $t9 26,27 reserved for OS 28 globals pointer $gp 29 stack pointer $sp 30 frame pointer $fp 31 return address $ra ------------------------------------------ ------------------------------------------ INSTRUCTIONS All fit in one word (32 bits) Register format: op:6 rs:5 rt:5, rd:5, shift:5 func:6 Immediate format: op:6 rs:5 rt:5 immed:16 Jump type: op:6 addr:26 Register Format Arithmetic and Logic (3 registers): 3 registers: ADD, SUB AND, BOR, NOR, XOR 2 registers: MUL, DIV 1 register: MFHI, MFLO Shifts (2 registers + shift): SLL, SRL Jump to register (1 register): JR Immediate Format Arithmetic (2 registers & immediate): ADDI Logical (2 registers & immediate): ANDI, BOI, XORI Comparison & jump (2 registers & offset): BEQ, BNE Comparisons to 0 (1 register & offset): BGEZ, BGTZ, BLEZ, BLTZ Load, Store name 1 register: Call, Return use static links ------------------------------------------