I. Overview of Static Analysis A. Goals for Static Analysis for a Compiler ------------------------------------------ GOALS FOR STATIC ANALYSIS - Protect code generator from extra cases - Catch problems, to improve programmer productivity - Gather information to improve programs ------------------------------------------ What does static mean? Where would type checking fit in here? 1. Kinds of problems to catch: ------------------------------------------ PROBLEMS TO CATCH ------------------------------------------ How could these problems help the programmer? II. architecture A. Parser, Symbol Table, and Code Generation 1. tasks needed ------------------------------------------ WHAT A COMPILER NEEDS TO DO - static analysis e.g., type checking needs: - each name's type - code generation needs: - each var or const name's lexical addr - each procedure's starting address ------------------------------------------ Must we always type check statically? What information does the VM need to load and store from variables? What information does the VM need for procedure calls address of where the code for the procedure starts 2. data structures needed a. symbol table ------------------------------------------ SYMBOL TABLE Gives information about each name: def: a *symbol table* maps ------------------------------------------ ... - kind of name (const, var, procedure) - type (or at least size) - offset in its AR ... each (declared) name to its attributes. (An attribute is a property important to the compiler, such as kind of name, type, or size) Q: What kind of data structure would work to store this information? What kind of data structure would work to store this information? b. storage layout ------------------------------------------ STORAGE LAYOUT Compiler needs to track: memory layout within a scope: - what offset can store the next variable or constant? instruction layout: - what address to jump to? - for if-statements, while-loops ------------------------------------------ What kind of data structure could be used to store this layout? How would an if-then-else be processed in a VM? When generating code for an if-then-else, how can the compiler know the target addresses to jump to? c. summary ------------------------------------------ SUMMARY OF DATA NEEDS Symbol Table: maps names -> attributes (kind, size, etc.) Code Manager: - data allocated in a frame - instruction counts for pieces of code ------------------------------------------ For the symbol table, what operations are needed? For the code manager, what operations are needed? B. Basic architectural issues for static analysis and code generation ------------------------------------------ HOW SHOULD PARSER COMMUNICATE? Strategies: [Action-based] During parsing: - for nonterminal , action: - adds to symbol table - checks for errors - allocates and tracks storage - emits code [Tree-based] During parsing: - for nonterminal , action: - creates and returns an AST Walk tree to: - build symbol table - check for errors - allocate and track storage - (improve the tree's computation e.g., eliminate unneeded computation steps) - emit code ------------------------------------------ If we use a tree-based approach, then when and how does the symbol table get built? ------------------------------------------ ADVANTAGES AND DISADVANTAGES Action-based architecture: Tree-based: ------------------------------------------ III. ASTs A. What is an AST ------------------------------------------ ABSTRACT SYNTAX TREES (ASTs) def: an *abstract syntax tree* or AST is ------------------------------------------ ------------------------------------------ EXAMPLE AST AST for 3*4+2: ------------------------------------------ How does this compare to the parse tree for this we had earlier? B. Designing ASTs ------------------------------------------ DESIGNING ASTS Want to: - Represent all structure in programs - avoid unnecessary levels How to design them? ------------------------------------------ 1. abstract syntax ------------------------------------------ ABSTRACT SYNTAX A language's *abstract syntax* is a grammar that generates the same language and Allows: - ambiguity but think of it as ------------------------------------------ 2. example ------------------------------------------ EXAMPLE ABSTRACT SYNTAX FOR PL/0 B ::= CDs VDs PDs S CDs ::= { CD } CD ::= const CDefs CDefs ::= { CDef } CDef ::= x n VDs ::= { VD } VD ::= var xs xs ::= { x } PDs ::= { PD } PD ::= procedure x B S ::= AS | BS | IfS | WhS | RS | WS | SkS AS ::= assign x E BS ::= begin Ss Ss ::= { S } IfS ::= if C S1 S2 WhS ::= while C S RS ::= read x WS ::= write E SkS ::= skip C ::= OddC | ROC OddC ::= odd E ROC ::= E1 r E2 r ::= = | <> | < | <= | > | >= E ::= BOE | x | n BOE ::= E o E o ::= + | - | * | / where x in n in { X } means 0 or more X's ------------------------------------------ What happened to negation of numbers as expressions? C. Representing ASTs 1. In an OO language ------------------------------------------ ASTs IN AN OO LANGUAGE (C++/JAVA) For each rule of the form: X ::= kw1 A B | kw2 C D Use: an abstract class, named "X" subclasses of X: - kw1 with fields of types A and B - kw2 with fields of types C and D //Example (in Java): public abstract class AST { string filename; int line; } public abstract class Stmt extends AST {} public class Assign extends Stmt { Ident x; Expr e; public Assign(Ident x, Expr e) { this.x = x; this.e = e; } } public class If extends Stmt { Cond c; Stmt s1; Stmt s2; public If(Cond c, Stmt s1, Stmt s2) { this.c = c; this.s1 = s1; this.s2 = s2; } } // ... ------------------------------------------ 2. In C a. type tags (AST_type) ------------------------------------------ EXAMPLE IN C // file ast.h #include "file_location.h" // types of ASTs (type tags) typedef enum { block_ast, const_decls_ast, var_decls_ast, /* ... */ number_ast, ident_ast, empty_ast } AST_type; // common AST structure typedef struct { file_location *file_loc; AST_type type_tag; void *next; } generic_t; // ... continued... ------------------------------------------ b. struct types for each kind of nonterminal i. structs for declarations and statements ------------------------------------------ STRUCTS FOR EACH KIND OF NONTERMINAL #include "file_location.h" // B ::= CDs VDs PDs S typedef struct block_s { file_location *file_loc; AST_type type_tag; const_decls_t const_decls; var_decls_t var_decls; proc_decls_t proc_decls; stmt_t stmt; } block_t; // CDs ::= { CD } typedef struct { file_location *file_loc; AST_type type_tag; const_decl_t *const_decls; } const_decls_t; // VD ::= var x typedef struct { const char *name; } var_decl_t; // CD ::= const CDefs typedef struct const_decl_s { file_location *file_loc; AST_type type_tag; struct const_decl_s *next; const_defs_t const_defs; } const_decl_t; // CDefs ::= { CDef } typedef struct { file_location *file_loc; AST_type type_tag; const_def_t *const_defs; } const_defs_t; // CDef ::= ident number typedef struct const_def_s { file_location *file_loc; AST_type type_tag; struct const_def_s *next; ident_t ident; number_t number; } const_def_t; /* ... */ // forward declaration for expressions struct expr_s; /* ... */ // AS ::= x E typedef struct { file_location *file_loc; AST_type type_tag; const char *name; struct expr_s *expr; } assign_stmt_t; // BS ::= Ss typedef struct { file_location *file_loc; AST_type type_tag; stmts_t stmts; } begin_stmt_t; // IfS ::= if C S1 S2 typedef struct { file_location *file_loc; AST_type type_tag; condition_t condition; struct stmt_s *then_stmt; struct stmt_s *else_stmt; } if_stmt_t; /* ... */ // empty ::= typedef struct { file_location *file_loc; AST_type type_tag; } empty_t; // identifiers typedef struct ident_s { file_location *file_loc; AST_type type_tag; struct ident_s *next; // for lists const char *name; } ident_t; // (possibly signed) numbers typedef struct { file_location *file_loc; AST_type type_tag; const char *text; word_type value; } number_t; // tokens as ASTs typedef struct { file_location *file_loc; AST_type type_tag; const char *text; int code; } token_t; ------------------------------------------ ii. structs for conditions and expressions ------------------------------------------ STRUCTS FOR CONDITIONS AND EXPRESSIONS // kinds of conditions typedef enum { ck_odd, ck_rel } condition_kind_e; typedef struct { file_location *file_loc; AST_type type_tag; expr_t expr; } odd_condition_t; typedef struct { file_location *file_loc; AST_type type_tag; expr_t expr1; token_t rel_op; expr_t expr2; } rel_op_condition_t; // kinds of expressions typedef enum { expr_bin, expr_ident, expr_number } expr_kind_e; // forward declaration for expressions struct expr_s; // BOE ::= E1 o E2 // o ::= + | - | * | / typedef struct { file_location *file_loc; AST_type type_tag; struct expr_s *expr1; token_t arith_op; struct expr_s *expr2; } binary_op_expr_t; // E ::= BOE | x | n typedef struct expr_s { file_location *file_loc; AST_type type_tag; expr_kind_e expr_kind; union { binary_op_expr_t binary; ident_t ident; number_t number; } data; } expr_t; ------------------------------------------ c. the AST type as a tagged union ------------------------------------------ THE AST TYPE // file ast.h // The AST definition used by bison typedef union AST_u { generic_t generic; block_t block; const_decls_t const_decls; const_decl_t const_decl; const_defs_t const_defs; const_def_t const_def; var_decls_t var_decls; var_decl_t var_decl; idents_t idents; proc_decls_t proc_decls; proc_decl_t proc_decl; stmt_t stmt; assign_stmt_t assign_stmt; call_stmt_t call_stmt; begin_stmt_t begin_stmt; if_stmt_t if_stmt; while_stmt_t while_stmt; read_stmt_t read_stmt; write_stmt_t write_stmt; skip_stmt_t skip_stmt; stmts_t stmts; condition_t condition; rel_op_condition_t rel_op_condition; odd_condition_t odd_condition; expr_t expr; binary_op_expr_t binary_op_expr; token_t token; number_t number; ident_t ident; empty_t empty; } AST; ------------------------------------------ ------------------------------------------ CORRESPONDENCE BETWEEN TAGS AND UNION Whenever the type_tag field is X_ast then the data field's type is X_t and the union's subfield is named X ------------------------------------------ d. AST list type ------------------------------------------ LINKED LISTS OF ASTS // file ast.h // identifiers typedef struct ident_s { file_location *file_loc; AST_type type_tag; struct ident_s *next; // for lists const char *name; } ident_t; // ... // idents ::= { ident } typedef struct { file_location *file_loc; AST_type type_tag; ident_t *idents; } idents_t; ------------------------------------------ D. Building ASTs in a parser 1. example: following the grammar, like parseIfStmt ------------------------------------------ BUILDING ASTS USING RULES IN BISON /* ::= if then else */ ------------------------------------------ What would the rule in pl0.y look like for an if-statement? What are the types of the arguments to ast_if_stmt? What is the return type of ast_if_stmt? Where should the file location of the AST come from? 2. example: alternatives, like parseStmt ------------------------------------------ EXAMPLE WITH ALTERNATIVES: STATEMENTS stmt : assignStmt | callStmt | beginStmt | ifStmt | whileStmt | readStmt | writeStmt | skipStmt ; ------------------------------------------ What are the types of the ASTs on the right sides of these rules? What needs to be done to build the AST for the stmt itself? 3. example: loops, like parseBeginStmt ------------------------------------------ EXAMPLE WITH A LOOP: STMTS IN A BEGIN // ::= {; } stmts : stmt | stmts ";" stmt { $$ = ast_stmts($1,$3); } ; ------------------------------------------ How would the ASTs be constructed?