
Type checking
Lecture 11

Informal type specification

Declarations

  • Restrict symbols to a specific type for whole program
    • Static typing
    • No change to type at run-time
    • Can determine at compile-time
f() {
}

main() {
  int y;
  y = 2;  // okay, because y and 2 are both int
  y = f;  // not okay, because y is an int and f is a function
}

What is a type?

How can we remember the types of symbols?

Scoping

  • Symbols valid within a user-defined range
    • Corresponds to functions (or compound statements) in our grammar
    • Called lexical or static scoping
  • No change to scoping at run-time
    • Can determine scopes at compile-time
f() {
  bool y;
  y = true;  // okay, because y is a bool in f's scope
}

main() {
  int y;
  y = true;  // not okay, because this is a different "y"
}

What other scoping choices could we make?

Can we have the same symbol with different types somewhere in the program?

What happens with nested scopes?

How can we remember what symbol belongs to what scope at compile-time?

The symbol table

  • Maintains mapping from symbols to types
  • Maintains scoping information
  • Namespacing
    • Can we have functions, variables, structs, etc. of same name?

Example

  • (In-class diagram)

Look at ASTs for the above examples

Maintain symbol table and scope stack

Cases

  1. One declaration and one usage (also show type error)
  2. One function in global scope and use in nested scope (also show type error)
  3. Variable shadowing in nested scopes (not used in SimpleC) (even if same type, shadowed variable retains value after nested scope exits)
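
The three cases can be traced with a small scoped symbol table. This is a minimal sketch, not the course skeleton: ScopedSymbolTable and its method names are hypothetical, and types are stored as plain strings for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the course skeleton: a stack of scopes,
// each mapping a symbol name to its type name.
class ScopedSymbolTable {
  private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

  ScopedSymbolTable() {
    enterScope();  // the global scope
  }

  void enterScope() { scopes.push(new HashMap<>()); }

  void exitScope() { scopes.pop(); }

  // Declares a symbol in the innermost scope; returns false on redeclaration.
  boolean declare(String name, String type) {
    return scopes.peek().putIfAbsent(name, type) == null;
  }

  // Searches from the innermost scope outward; returns null if undeclared.
  String lookup(String name) {
    for (Map<String, String> scope : scopes) {  // iterates from the top of the stack
      String type = scope.get(name);
      if (type != null) return type;
    }
    return null;
  }
}

class SymbolTableDemo {
  public static void main(String[] args) {
    ScopedSymbolTable table = new ScopedSymbolTable();
    table.declare("f", "function");         // case 2: function in global scope
    table.enterScope();                     // entering a function body
    table.declare("y", "int");              // case 1: one declaration
    System.out.println(table.lookup("y"));  // int
    System.out.println(table.lookup("f"));  // function, found in the enclosing scope
    table.enterScope();                     // case 3: nested scope shadows y
    table.declare("y", "bool");
    System.out.println(table.lookup("y"));  // bool
    table.exitScope();
    System.out.println(table.lookup("y"));  // int again once the nested scope exits
    table.exitScope();
    System.out.println(table.lookup("y"));  // null, y is not visible globally
  }
}
```

A type error in cases 1 and 2 would show up as a mismatch between the looked-up type and the type the usage requires, e.g., assigning to y when lookup("y") is "int" but the right-hand side is "function".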

Compile-time vs. run-time behavior

  • Notice that these design choices are convenient for checking at compile-time
    • Symbol never changes type
    • Scope is based on grammatical structure (not control-flow)
  • Can we have run-time-style typing in a compiled language?
    • No type declarations?
    • Change type in different executions of program?
    • Access static scope after leaving function?

No type declarations: type inference

Change type in different program executions: subtyping is one way, e.g., subclasses, generics

Accessing static scopes after control-flow exits scope: closures (functions bundled with static scope's run-time variable bindings)
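
All three answers can be seen in a compiled, statically typed language. The sketch below uses Java and assumes Java 10 or later for var; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.function.IntSupplier;

class RuntimeStyleTyping {
  // Closure: the returned lambda keeps using `count` after makeCounter has returned.
  static IntSupplier makeCounter() {
    // A one-element array, because Java lambdas may only capture
    // effectively final local variables.
    int[] count = {0};
    return () -> ++count[0];
  }

  // Subtyping: the static type is Object, the run-time type depends on the input.
  static Object pick(boolean wantString) {
    if (wantString) return "ten";
    return 10;  // boxed to Integer, which is also an Object
  }

  public static void main(String[] args) {
    // Type inference: no written type, yet `names` is statically an ArrayList<String>.
    var names = new ArrayList<String>();
    names.add("y");
    // names.add(2);  // still rejected at compile time

    System.out.println(pick(true) instanceof String);    // true
    System.out.println(pick(false) instanceof Integer);  // true

    IntSupplier counter = makeCounter();
    System.out.println(counter.getAsInt());  // 1
    System.out.println(counter.getAsInt());  // 2
  }
}
```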

Choose your own adventure

Typing of expressions

What is a type?

Set of values and operations on those values

Type rules for operations

  • Only allow operations on symbols of the same type
    • Why do this?

What about operations between types?

  • Adding float and int? 10 + 3.5
    • Why does this work in C, Python, etc.?
  • Adding int and function?
    • Is this ever useful?
  • What about booleans?
    • What are booleans at CPU level?
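
One answer to the mixed-type question is implicit conversion: the language inserts a widening step so that both operands have the same type before the operation runs. A minimal sketch in Java, with a made-up class name:

```java
class MixedArithmetic {
  static double addIntAndDouble(int a, double b) {
    return a + b;  // a is implicitly widened to double before the add
  }

  public static void main(String[] args) {
    System.out.println(addIntAndDouble(10, 3.5));  // 13.5
    // A rule that only allowed same-type operands would require the
    // conversion to be written out, as in the cast below.
    System.out.println((double) 10 + 3.5);         // 13.5
    // int x = 10 + 3.5;   // rejected by javac: lossy conversion from double to int
    // int z = true + 1;   // rejected by javac: Java does not treat booleans as numbers
  }
}
```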

x86, for instance, has different opcodes for integer and float operations https://www.felixcloutier.com/x86/add

https://www.felixcloutier.com/x86/addps

With C on x86, for instance, boolean operations such as && and || are typically compiled to conditional jumps (rather than bitwise arithmetic) https://www.felixcloutier.com/x86/jcc

https://stackoverflow.com/questions/24542/using-bitwise-operators-for-booleans-in-c

Formalizing the type specification

Type checker preparation

  • Setting up our visitors
    • What are the return types and why?

Our design uses separate statement and expression visitors, reflecting the different types of semantic rules in our language.

Implementing type collection

Symbol table

Scoping

Collecting declarations

Getting types of variable usages

Implementing type checking

Leaf nodes

  • Which ones are they?
  • What are their types?

Inner nodes

  • What are they?
  • What are their types?

Putting it all together

Example

f() {
  return 10;
}

main() {
  int x;
  input x;
  return x * 2 + f();
}

Type-checking is a special kind of evaluator

  • Just like the Calc example
    • Tree traversal (with visitor in our case)
    • Starts from leaf nodes' values
    • Recursively accumulates values
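
To make the comparison concrete, here is a sketch with an evaluator and a type checker side by side on the same tree. The mini-AST is hypothetical (it is neither the SimpleC grammar nor the ANTLR-generated classes), and it uses methods on the nodes rather than a visitor to stay short.

```java
// Hypothetical mini-AST for illustration only.
abstract class Expr {
  abstract int eval();       // Calc-style evaluator: the result is a number
  abstract String typeOf();  // type checker: the result is a type name, or null on a type error
}

class Num extends Expr {
  final int value;
  Num(int value) { this.value = value; }
  int eval() { return value; }
  String typeOf() { return "int"; }  // leaf: the type is known up front
}

class BoolLit extends Expr {
  final boolean value;
  BoolLit(boolean value) { this.value = value; }
  int eval() { return value ? 1 : 0; }  // the evaluator alone cannot tell this is not a number
  String typeOf() { return "bool"; }
}

class Mul extends Expr {
  final Expr left, right;
  Mul(Expr left, Expr right) { this.left = left; this.right = right; }
  int eval() { return left.eval() * right.eval(); }
  String typeOf() {
    // Same traversal shape as eval: compute the children first, then combine.
    String l = left.typeOf();
    String r = right.typeOf();
    return "int".equals(l) && "int".equals(r) ? "int" : null;
  }
}

class EvaluatorDemo {
  public static void main(String[] args) {
    Expr ok = new Mul(new Num(3), new Num(2));
    System.out.println(ok.eval());     // 6
    System.out.println(ok.typeOf());   // int
    Expr bad = new Mul(new Num(3), new BoolLit(true));
    System.out.println(bad.typeOf());  // null, i.e., a type error
  }
}
```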

Type-checking is a proof of type correctness

  • Leaves are axioms
    • Defined ahead of time, not concluded
  • Inner nodes (functions or operations) are implications
    • Given the child nodes' truth, can conclude the current node's truth

Functions act like implication

  • Example: integer multiplication, (int, int) -> int
    • If operands are int, we can conclude the result is an int
    • Otherwise, we can't conclude result is int (and there could be a run-time issue)
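
The axiom and implication readings are often written as inference rules. The notation below is a common textbook convention and an assumption here, not the course's type specification: \Gamma stands for the symbol table, n for an integer literal, and e_1, e_2 for arbitrary expressions. Premises go above the line and the conclusion below it.

```latex
\[
\frac{}{\Gamma \vdash n : \mathtt{int}}
\qquad\qquad
\frac{\Gamma \vdash e_1 : \mathtt{int} \qquad \Gamma \vdash e_2 : \mathtt{int}}
     {\Gamma \vdash e_1 * e_2 : \mathtt{int}}
\]
```

The left rule has no premises, so it is an axiom for a leaf. The right rule is the implication for integer multiplication, (int, int) -> int.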

Does type correctness always imply a correct program?

Does a lack of type correctness always imply a bug or incorrect program?

Compiler project

Implement the type-checker for SimpleC according to the type specification.

You may use the skeleton visitor code as a starting point. Here are some test cases for num and return and a couple for function calls.

To start working on the type-checker, be sure to uncomment the type-checking phase from Compiler.java

// Phase 2: Type checking.
TypeChecker typechecker = new TypeChecker();
typechecker.visit(tree);

and from the Makefile

TypeChecker.java \

(the trailing backslash denotes a line-continuation that is meant to be there for the variable assignment.)

Be sure your CLASSPATH is configured correctly, i.e., it has both the ANTLR runtime JAR and the directory of your compiler project (using ./ is fine as long as you are running from the repository root, as I will when grading):

CLASSPATH=/usr/share/java/antlr4-runtime.jar:./:${CLASSPATH}

Tips

  • Look at the grammar

    The grammar is the guide to what symbols are available in each visitor from the right-hand side.

  • Be sure to recursively visit child non-terminals

    In binary operations, for instance, be sure to visit both the left and right operands at some point, i.e., visit(ctx.expr(0)) and visit(ctx.expr(1)) (in ANTLR, if there are multiple expr non-terminals, pass the 0-based index to get the corresponding one). What is the return value and what does it mean?

    In statements, be sure to visit any expression used and the statements. For instance, while loops need to visit both the condition expression visit(ctx.expr()) and the body visit(ctx.stmt()). The if-then-else production has a condition visit(ctx.expr()) and two statements, visit(ctx.stmt(0)) and visit(ctx.stmt(1)) for the if- and else-branches respectively.

  • Expression visitors return the type

    The type-checker behaves like an interpreter (recall the Calc example). But the "value" produced is a type name, rather than an integer. The visitor is set up to just use a string to hold the type name. SimpleC only has two possible types that expressions can produce, "int" and "bool". Like an interpreter, evaluating expressions means first evaluating any operands in the expression. Statements have no return value in our visitor, since the state is instead held in the symbol tables for functions and variables. Statements only need to return null, since the visitor has type Void instead of void.

  • Reporting type errors

    Use the provided type_error method to report an error. This takes ctx and automatically prints the file, line, and column of the error, while also exiting with error code 2 (which will be used to enable automated grading). While it is not checked or required, you may also report an error message to stderr, e.g.,

    if (! "int".equals(temp)) {
      System.err.println("return type should be int");
      typeError(ctx);
    }
    
  • Use .equals() not == for string equality in Java

    Checking whether temp is the same string as "int" is done with "int".equals(temp). The double-equals == checks reference equality, which may be false for two string objects even when their characters are the same.

  • Work bottom-up and write tests as you implement

    Instead of trying to implement everything in one go, break the type-checker down into pieces, e.g., one visitor function at a time. Working bottom-up is likely easiest, since the inner nodes depend on their child nodes. Add calls to visit or remove/comment out other visitors to let the default visitor ensure that the entire tree is traversed (otherwise you may not hit the visitor you are working on). Try this suggested workflow:

    1. Pick a visitor to work with, starting from the bottom of the grammar.
    2. Look at the type-specification and grammar to start planning the implementation
    3. Create a test case that should exercise the type specification
    4. Use print statements to ensure that the visitor is actually reached by the test case
    5. Collect the required ANTLR calls and symbol table operations needed to implement the specification. What children are needed, e.g., ctx.expr(), ctx.ID().getText()? What, if any, symbol table operations are needed, i.e., get and put? How will the type comparison be done, e.g., string equality check, visiting a child, looking up a type, etc.? Have these operations on hand so you don't have to think about them while implementing the visitor.
    6. Carefully implement the logic of the specification. If you are unsure, use examples to help reason through the behavior. Ask on the class discussion if the specification appears ambiguous. Ensure the implementation compiles.
    7. Double-check that all possible cases for the visitor are handled by your implementation.
    8. Write extra test cases that ensure that at a minimum each code path is exercised by your tests.
    9. Test your implementation, fixing any minor bugs or logic issues.
    10. Once you have a piece of the implementation working, commit with the test cases and a meaningful commit message.
    11. Admire your work and take pride in writing solid code!
    12. Repeat
  • An incomplete type-checker implementation may not crash

    The lack of a crash is not enough information to say that a program is written correctly. Use print statements (or debugging) to confirm that the piece you have implemented is producing the expected values. Use counter-examples, such as known type-unsafe programs, to make sure the type-checker isn't letting through unsafe programs unexpectedly.
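
Two of the tips above, returning null from a Void visitor and comparing type names with .equals(), can be seen together in a small sketch. MiniVisitor, ExprTypes, and StmtChecks are hypothetical stand-ins so that the example runs without ANTLR; they are not the skeleton's classes, and the "digits are int" rule is a toy.

```java
// Hypothetical stand-in for a generated visitor base class; T is the result of each visit.
abstract class MiniVisitor<T> {
  abstract T visitNode(String text);
}

// Expression visitor: every visit produces a type name held in a String.
class ExprTypes extends MiniVisitor<String> {
  @Override
  String visitNode(String text) {
    // Toy rule for this sketch only: digits are "int", anything else is "bool".
    return text.matches("[0-9]+") ? "int" : "bool";
  }
}

// Statement visitor: nothing to produce, so T is Void.
class StmtChecks extends MiniVisitor<Void> {
  final ExprTypes exprTypes = new ExprTypes();
  int errors = 0;

  @Override
  Void visitNode(String returnedExpr) {
    String temp = exprTypes.visitNode(returnedExpr);
    if (!"int".equals(temp)) {  // compares characters; also safe if temp is null
      errors++;
    }
    return null;  // Void cannot be instantiated, so null is the only thing to return
  }
}

class VisitorTipsDemo {
  public static void main(String[] args) {
    StmtChecks checks = new StmtChecks();
    checks.visitNode("10");    // an int expression, no error
    checks.visitNode("true");  // a bool expression, counted as an error
    System.out.println(checks.errors);  // 1

    String temp = new String("int");         // a distinct object with the same characters
    System.out.println(temp == "int");       // false: different references
    System.out.println("int".equals(temp));  // true: same characters
  }
}
```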

Submission

Push the type checker to the main branch of your GitHub repository. Be sure that it builds with make from the root directory and can be run with java Compiler program.simplec.

Grading

Exit code 2 signals a type error. Grading will use several new programs to ensure that the type-checker is catching (and not inadvertently triggering) type errors on a variety of programs. As given in the template with the type_error() function, a type error causes exit code 2 and the file, line, and column numbers to be printed. A correctly typed program should return exit code 0.

Author: Paul Gazzillo

Created: 2022-03-14 Mon 14:28