Type checking
Lecture 11
Table of Contents
- Informal type specification
- Compile-time vs. run-time behavior
- Choose your own adventure
- Typing of expressions
- Formalizing the type specification
- Type checker preparation
- Implementing type collection
- Implementing type checking
- Putting it all together
- Type-checking is a special kind of evaluator
- Type-checking is a proof of type correctness
- Compiler project
Informal type specification
Declarations
- Restrict symbols to a specific type for whole program
- Static typing
- No change to type at run-time
- Can determine at compile-time
    f() { }
    main() {
      int y;
      y = 2; // okay, because y and 2 are both int
      y = f; // not okay, because y is an int and f is a function
    }
What is a type?
How can we remember the types of symbols?
Scoping
- Symbols valid within a user-defined range
- Corresponds to functions (or compound statements) in our grammar
- Called lexical or static scoping
- No change to scoping at run-time
- Can determine scopes at compile-time
    f() {
      bool y;
      y = true; // okay, because y is a bool in f's scope
    }
    main() {
      int y;
      y = true; // not okay, because this is a different "y"
    }
What other scoping choices could we make?
Can we have the same symbol with different types somewhere in the program?
What happens with nested scopes?
How can we remember what symbol belongs to what scope at compile-time?
The symbol table
- Maintains mapping from symbols to types
- Maintains scoping information
- Namespacing
- Can we have functions, variables, structs, etc. of same name?
Example
- (In-class diagram)
Look at ASTs for the above examples
Maintain symbol table and scope stack
Cases
- One declaration and one usage (also show type error)
- One function in global scope and use in nested scope (also show type error)
- Variable shadowing in nested scopes (not used in SimpleC) (even if same type, shadowed variable retains value after nested scope exits)
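The cases above can be traced with a small sketch of a symbol table kept as a stack of scopes. This is only an illustration under assumptions: the class name ScopedSymbolTable and its methods are hypothetical, types are plain strings, and the project skeleton may organize its tables differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a stack of scopes, each mapping symbol names to type names.
class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        enterScope(); // the global scope
    }

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Declares a symbol in the innermost scope; returns false if it is already declared there.
    boolean declare(String name, String type) {
        return scopes.peek().putIfAbsent(name, type) == null;
    }

    // Searches from the innermost scope outward; returns null if the symbol is undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // iteration starts at the top of the stack
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.declare("f", "function");        // function in the global scope
        table.enterScope();                    // body of main
        table.declare("y", "int");
        System.out.println(table.lookup("y")); // int
        System.out.println(table.lookup("f")); // function (found in the enclosing scope)
        table.enterScope();                    // nested compound statement
        table.declare("y", "bool");            // shadows the outer y
        System.out.println(table.lookup("y")); // bool
        table.exitScope();
        System.out.println(table.lookup("y")); // int
        table.exitScope();
        System.out.println(table.lookup("y")); // null
    }
}
```

A type error in this picture is a lookup whose result does not match what the usage requires, for example assigning a "function" symbol to an "int" variable.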
Compile-time vs. run-time behavior
- Notice that these design choices are convenient for checking at compile-time
- Symbol never changes type
- Scope is based on grammatical structure (not control-flow)
- Can we have run-time-style typing in a compiled language?
- No type declarations?
- Change type in different executions of program?
- Access static scope after leaving function?
No type declarations: type inference
Change type in different program executions: subtyping is one way, e.g., subclasses, generics
Accessing static scopes after control-flow exits scope: closures (functions bundled with static scope's run-time variable bindings)
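As a rough illustration of the closure idea, here is a Java sketch (not SimpleC, which has no closures). The names ClosureSketch and makeCounter are made up for this example. The returned function keeps using the binding of count after makeCounter itself has returned.

```java
import java.util.function.IntSupplier;

class ClosureSketch {
    // Returns a function that still reads and updates `count` after makeCounter returns.
    static IntSupplier makeCounter() {
        // A one-element array is used because Java lambdas may only capture
        // effectively final locals; the array reference is final, its content is not.
        int[] count = {0};
        return () -> ++count[0];
    }

    public static void main(String[] args) {
        IntSupplier counter = makeCounter();
        System.out.println(counter.getAsInt()); // 1
        System.out.println(counter.getAsInt()); // 2
        IntSupplier other = makeCounter();      // a separate binding of count
        System.out.println(other.getAsInt());   // 1
    }
}
```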
Choose your own adventure
- Start implementing type-checking for declarations?
- Move on to the typing of expressions?
Typing of expressions
What is a type?
Set of values and operations on those values
Type rules for operations
- Only allow operations on symbols of the same type
- Why do this?
What about operations between types?
- Adding float and int?
10 + 3.5
- Why does this work in C, python, etc.?
- Adding int and function?
- Is this ever useful?
- What about booleans?
- What are booleans at CPU level?
x86, for instance, has different opcodes for integer and float operations:
https://www.felixcloutier.com/x86/add
https://www.felixcloutier.com/x86/addps
With C and x86, for instance, boolean operations are actually done with conditional jumps (rather than bitwise arithmetic):
https://www.felixcloutier.com/x86/jcc
https://stackoverflow.com/questions/24542/using-bitwise-operators-for-booleans-in-c
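One way to see why 10 + 3.5 works in many languages is to ask the language what type the result has. The sketch below uses Java (the class and method names are hypothetical): the int operand is implicitly widened to double before the addition, so the type rule that is actually applied is still "same type on both sides".

```java
class MixedArithmeticSketch {
    // Reports the run-time class of a boxed value, e.g. "Integer" or "Double".
    static String typeOf(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object mixed = 10 + 3.5; // the int 10 is widened to double before adding
        Object ints = 10 + 3;    // both operands are int, so no conversion is needed
        System.out.println(mixed + " : " + typeOf(mixed)); // 13.5 : Double
        System.out.println(ints + " : " + typeOf(ints));   // 13 : Integer
    }
}
```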
Formalizing the type specification
Type checker preparation
- Setting up our visitors
- What are the return types and why?
Our design uses separate statement and expression visitors, reflecting the different types of semantic rules in our language.
Implementing type collection
Symbol table
Scoping
Collecting declarations
Getting types of variable usages
Implementing type checking
Leaf nodes
- Which ones are they?
- What are their types?
Inner nodes
- What are they?
- What are their types?
Putting it all together
Example
    f() { return 10; }
    main() {
      int x;
      input x;
      return x * 2 + f();
    }
Type-checking is a special kind of evaluator
- Just like the Calc example
- Tree traversal (with visitor in our case)
- Starts from leaf nodes' values
- Recursively accumulates values
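The evaluator view can be sketched on a tiny hand-built AST. Everything here is an assumption made for illustration: the node classes (MiniExpr, NumLit, BoolLit, VarRef, Arith) are hypothetical, function calls are left out, and the actual project walks the ANTLR parse tree with visitors instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-AST: each node "evaluates" to a type name instead of a number.
interface MiniExpr {
    String typeOf(Map<String, String> symbols);
}

class NumLit implements MiniExpr {
    public String typeOf(Map<String, String> symbols) { return "int"; } // leaf: type known up front
}

class BoolLit implements MiniExpr {
    public String typeOf(Map<String, String> symbols) { return "bool"; } // leaf: type known up front
}

class VarRef implements MiniExpr {
    final String name;
    VarRef(String name) { this.name = name; }
    public String typeOf(Map<String, String> symbols) {
        String type = symbols.get(name); // leaf: type comes from the symbol table
        if (type == null) {
            throw new IllegalStateException("undeclared symbol: " + name);
        }
        return type;
    }
}

// Stands in for an arithmetic operator such as + or *.
class Arith implements MiniExpr {
    final MiniExpr left;
    final MiniExpr right;
    Arith(MiniExpr left, MiniExpr right) { this.left = left; this.right = right; }
    public String typeOf(Map<String, String> symbols) {
        String l = left.typeOf(symbols);  // evaluate the operands first
        String r = right.typeOf(symbols);
        if (!"int".equals(l) || !"int".equals(r)) {
            throw new IllegalStateException("type error: arithmetic needs int operands");
        }
        return "int";
    }
}

class TypeEvalSketch {
    public static void main(String[] args) {
        Map<String, String> symbols = new HashMap<>();
        symbols.put("x", "int");
        // Shaped like x * 2 + 10
        MiniExpr ok = new Arith(new Arith(new VarRef("x"), new NumLit()), new NumLit());
        System.out.println(ok.typeOf(symbols)); // int
        // Shaped like x + true
        MiniExpr bad = new Arith(new VarRef("x"), new BoolLit());
        try {
            bad.typeOf(symbols);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // type error: arithmetic needs int operands
        }
    }
}
```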
Type-checking is a proof of type correctness
- Leaves are axioms
- Defined ahead of time, not concluded
- Inner nodes (functions or operations) are implications
- Given the child nodes' truth, we can conclude the current node's truth
Functions act like implication
- Example: integer multiplication, (int, int) -> int
- If operands are int, we can conclude the result is an int
- Otherwise, we can't conclude result is int (and there could be a run-time issue)
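The implication reading can be made concrete with a small rule table. This is a sketch under assumptions: the class SignatureSketch and the table layout are hypothetical, and the "<" rule is added only as a second illustration (it is not taken from the SimpleC specification).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class SignatureSketch {
    // Hypothetical rule table: operator -> {left operand type, right operand type, result type}.
    private static final Map<String, String[]> RULES = new HashMap<>();
    static {
        RULES.put("*", new String[] {"int", "int", "int"});  // (int, int) -> int
        RULES.put("<", new String[] {"int", "int", "bool"}); // illustrative assumption
    }

    // If the premises (operand types) hold, conclude the result type; otherwise conclude nothing.
    static Optional<String> conclude(String op, String left, String right) {
        String[] rule = RULES.get(op);
        if (rule != null && rule[0].equals(left) && rule[1].equals(right)) {
            return Optional.of(rule[2]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(conclude("*", "int", "int").orElse("no conclusion"));  // int
        System.out.println(conclude("*", "int", "bool").orElse("no conclusion")); // no conclusion
    }
}
```

An empty result corresponds to the "otherwise" case above: the checker cannot conclude that the result is an int, so it reports a type error.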
Does type correctness always imply a correct program?
Does a lack of type correctness always imply a bug or incorrect program?
Compiler project
Implement the type-checker for SimpleC according to the type specification.
You may use the skeleton visitor code as a starting point. Here are some test cases for num and return and a couple for function calls.
To start working on the type-checker, be sure to uncomment the type-checking phase from Compiler.java
    // Phase 2: Type checking.
    TypeChecker typechecker = new TypeChecker();
    typechecker.visit(tree);
and from the Makefile
TypeChecker.java \
(The trailing backslash denotes a line continuation that is meant to be there for the variable assignment.)
Be sure your CLASSPATH is configured correctly, i.e., it has both the ANTLR runtime JAR and the directory of your compiler project (using ./ is fine as long as you are running from the repository root, as I will when grading):
    CLASSPATH=/usr/share/java/antlr4-runtime.jar:./:${CLASSPATH}
Tips
- Look at the grammar
The grammar is the guide to what symbols are available in each visitor: they are the symbols on the right-hand side of the corresponding production.
- Be sure to recursively visit child non-terminals
In binary operations, for instance, be sure to visit both the left and right operands at some point, i.e., visit(ctx.expr(0)) and visit(ctx.expr(1)) (in ANTLR, if there are multiple expr non-terminals, pass the 0-based index to get the corresponding one). What is the return value and what does it mean?
In statements, be sure to visit any expressions used and any nested statements. For instance, while loops need to visit both the condition expression, visit(ctx.expr()), and the body, visit(ctx.stmt()). The if-then-else production has a condition, visit(ctx.expr()), and two statements, visit(ctx.stmt(0)) and visit(ctx.stmt(1)), for the if- and else-branches respectively.
- Expression visitors return the type
The type-checker behaves like an interpreter (recall the Calc example), but the "value" produced is a type name rather than an integer. The visitor is set up to use a string to hold the type name. SimpleC only has two possible types that expressions can produce, "int" and "bool". Like an interpreter, evaluating an expression means first evaluating any operands in the expression. Statements have no return value in our visitor, since the state is instead held in the symbol tables functions and variables. Statements only need return null, since the visitor's return type is Void (an object type whose only value is null) instead of void.
- Reporting type errors
Use the provided type_error method to report an error. This takes ctx and automatically prints the file, line, and column of the error, while also exiting with error code 2 (which will be used to enable automated grading). While it is not checked or required, you may also report an error message to stderr, e.g.,
    if (! "int".equals(temp)) {
      System.err.println("return type should be int");
      typeError(ctx);
    }
- Use .equals() not == for string equality in Java
Checking whether temp is the same string as "int" is done with "int".equals(temp). The double-equals == checks reference equality, which can be false for two string objects even when their characters are the same.
- Work bottom-up and write tests as you implement
Instead of trying to implement everything in one go, break the type-checker down into pieces, e.g., one visitor function at a time. Working bottom-up is likely easiest, since the inner nodes depend on their child nodes. Add calls to visit or remove/comment out other visitors to let the default visitor ensure that the entire tree is traversed (otherwise you may not hit the visitor you are working on). Try this suggested workflow:
- Pick a visitor to work with, starting from the bottom of the grammar.
- Look at the type-specification and grammar to start planning the implementation
- Create a test case that should exercise the type specification
- Use print statements to ensure that the visitor is actually reached by the test case
- Collect the required ANTLR calls and symbol table operations needed to implement the specification. What children are needed, e.g., ctx.expr(), ctx.ID().getText()? What, if any, symbol table operations are needed, i.e., get and put? How will the type comparison be done, e.g., string equality check, visiting a child, looking up in a type, etc.? Have these operations handy so you don't have to think about them while implementing the visitor.
- Carefully implement the logic of the specification. If you are unsure, use examples to help reason through the behavior. Ask on the class discussion if the specification appears ambiguous. Ensure the implementation compiles.
- Double-check that all possible cases for the visitor are handled by your implementation.
- Write extra test cases to ensure that, at a minimum, each code path is exercised by your tests.
- Test your implementation, fixing any minor bugs or logic issues.
- Once you have a piece of the implementation working, commit with the test cases and a meaningful commit message.
- Admire your work and take pride in writing solid code!
- Repeat
- An incomplete type-checker implementation may not crash
The lack of a crash is not enough information to say that a program is written correctly. Use print statements (or debugging) to confirm that the piece you have implemented is producing the expected values. Use counter-examples, such as known type-unsafe programs, to make sure the type-checker isn't letting through unsafe programs unexpectedly.
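The string-equality tip above can be checked in isolation with a few lines of Java. The class and helper names below are hypothetical and exist only for this demonstration.

```java
class StringEqualitySketch {
    // Reference comparison: true only if both variables point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true if both strings hold the same characters.
    // Calling equals on the literal also tolerates a null argument.
    static boolean isInt(String type) {
        return "int".equals(type);
    }

    public static void main(String[] args) {
        String built = new String("int"); // a distinct object with the same characters
        System.out.println(sameReference("int", built)); // false
        System.out.println(isInt(built));                // true
        System.out.println(isInt(null));                 // false, with no NullPointerException
    }
}
```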
Submission
Push the type checker to the main branch of your GitHub repository. Be sure that it builds with make from the root directory and can be run with java Compiler program.simplec.
Grading
Exit code 2 signals a type error. Grading will use several new programs to ensure that the type-checker is catching (and not inadvertently triggering) type errors on a variety of programs. As given in the template with the type_error() function, a type error causes exit code 2 and the file, line, and column numbers to be printed. A correctly typed program should return exit code 0.