I. Overview ------------------------------------------ SYSTEMS SOFTWARE Helping people run programs Human -----> [ Translator ] Readable / | Input / | (Textual) / | / v v Object Library Code (Binary) (Binary) \ | \ | \ v \->[ Linker/ Loader ] | | v Process | | v [ OS + Computer ] | | v Computation ------------------------------------------ How would a debugger fit into this picture? What job or jobs does a translator have? A. Compiler vs. Interpreter ------------------------------------------ RECALL DEFINITIONS def: An *assembler* translates a low-level language that is close to machine code into machine code. def: A *compiler* translates a (high-level) language into a form that can be (more easily) executed by a computer. def: An *interpreter* executes a programming language directly, often without translating it into object code first ------------------------------------------ What are some examples of languages with interpreters? ------------------------------------------ COMPILER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v Object Code | v [ Linker/ Loader ] | v Process | v [ OS + Computer ] ------------------------------------------ ------------------------------------------ INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Interpreter ] | v [ OS + Computer ] ------------------------------------------ 1. advantages and disadvantages ------------------------------------------ ADVANTAGES AND DISADVANTAGES Compiler Advantages: Disadvantages: Interpreters Advantages: Disadvantages: ------------------------------------------ 2. hybrids of compilers and interpreters ------------------------------------------ HYBRID COMPILER/INTERPRETER PICTURE Source -----> [ Lexical Analyzer ] | v Token stream | v [ Parser ] | v Parse Tree (AST) + Symbol Table | v [ Static Analysis ] | v Parse Tree (AST) + Symbol Table | v [ Code Generator ] | v VM Code | v [ VM + JIT Compiler ] | v [ OS + Computer ] ------------------------------------------ ------------------------------------------ HYBRID ADVANTAGES AND DISADVANTAGES Advantages: Disadvantages: ------------------------------------------ B. compiler structure ------------------------------------------ STANDARD COMPILER ARCHITECTURE Source code (text) | v [ Lexer ] | v Stream of tokens | v [ Parser ] | v AST + Symbol Table | / | v v | [ Static Analyzer ] / | / v / Intermediate Rep. / | / v v [ Code Generator ] | v Instruction Rep. | v [ Optimizer ] | v Machine Code ------------------------------------------ 1. tokens a. definitions ------------------------------------------ TOKENS Represent distinct symbols in the input including punctuation and operators Typically, comments are ignored Reserved words: Keywords: White space delimits identifiers/ aNumber vs. a Number ------------------------------------------ b. data structures i. token types ------------------------------------------ TOKENS Defined in *.tab.h file produced by bison Example from the SRM assembler (asm.tab.h) #ifndef YYTOKENTYPE # define YYTOKENTYPE enum yytokentype { YYEMPTY = -2, YYEOF = 0, /* "end of file" */ YYerror = 256, /* error */ YYUNDEF = 257, /* "invalid token" */ eolsym = 258, /* eolsym */ identsym = 259, /* identsym */ unsignednumsym = 260, // unsignednumsym plussym = 261, /* "+" */ minussym = 262, /* "-" */ commasym = 263, /* "," */ /* ... */ straopsym = 303, /* "STRA" */ notropsym = 304, /* "NOTR" */ regsym = 305, /* regsym */ wordsym = 306 /* "WORD" */ }; typedef enum yytokentype yytoken_kind_t; #endif Examples: Input Token (number) ident identsym 34 unsignednumbersym + plussym ------------------------------------------ What would be the token types for the input WORD x = +24 wordsym identsym equalsym plussym unsignednumsym Q: How are reserved words represented? How are reserved words represented? 2. symbols and symbol table What part of a compiler should populate the symbol table? Why should the that be the tool? a. symbols ------------------------------------------ SYMBOLS What information should remembered for identifiers? ------------------------------------------ How should we track lexical levels? What would be a suitable data structure for a symbol? b. symbol table ------------------------------------------ SYMBOL TABLE A *symbol table* Does each scope have its own symbol table? What operations would a symbol table need? What data structure would be good? ------------------------------------------