I. The Nature of Program Analysis (1.1)
A. goals
What are the goals of program analysis?
B. ideas
What are the main ideas of program analysis?
What's the difference between a semantics-based approach and
a semantics-directed approach?
C. applications
What other areas, besides optimizing compilers, could the ideas
and goals of program analysis usefully be applied?
II. Setting the Scene (1.2)
A. The While Language
1. abstract syntax
------------------------------------------
ABSTRACT SYNTAX OF THE WHILE LANGUAGE
a \in AExp "arithmetic expressions"
b \in BExp "Boolean expressions"
S \in Stmt "statements"
x,y \in Var "variables"
n \in Num "numeric literals"
l \in Lab "labels"
opa \in Op_a "arithmetic operators"
opb \in Op_b "Boolean operators"
opr \in Op_r "relational operators"
S ::= [x:= a]^l
| [skip]^l
| S1 ; S2
| if [b]^l then S1 else S2
| while [b]^l do S
a ::= x
| n
| a1 opa a2
b ::= true
| false
| not b
| b1 opb b2
| a1 opr a2
------------------------------------------
What's the difference between abstract and concrete syntax?
------------------------------------------
EXAMPLE
[y := x]^1;
[z := 1]^2;
while [y>1]^3
do ([z := z*y]^4;
[y := y-1]^5);
[y := 0]^6
------------------------------------------
Why are the labels only attached to certain places in the
abstract syntax trees?
Which blocks are elementary?
What is used to disambiguate different parse trees?
2. semantics
What does the example above do?
What does skip do?
What kind of numeric literals are in this language?
What would be some reasonable arithmetic operators?
One would be some reasonable relational operators?
What would be some reasonable Boolean operators?
What's the type of the relational operators?
Can the labels be repeated?
What's missing?
3. variations
------------------------------------------
VARIATION 1: CONTROL FEATURES
S ::= ...
| break
| for x in a1 .. a2 do S
| throw
| try S1 catch S2
VARIATION 2: SPECIFICATION FEATURES
S ::= ...
| assert b
| assume b
| choose S1 or S2
VARIATION 3: PARALLELISM
S ::= ...
| S1 `||' S2
VARIATION 4: FUNCTIONS
a ::= ...
| fn x => a
| a1 a2
| let d in a
d ::= x = a
| d1; d2
------------------------------------------
Which of these need labels, and where do the labels go?
B. reaching definitions analysis
1. definition
------------------------------------------
REACHING DEFINITIONS (ASSIGNMENTS)
def: Let P be a program.
An assignment [x := a]^l
*may reach* a program point in P
if in some execution of P,
when execution reaches that point,
the last assignment to x was done at l.
------------------------------------------
What can we tell for certain from this analysis's results?
2. examples
------------------------------------------
EXAMPLE
[y := x]^1;
[z := 1]^2;
while [y>1]^3
do ([z := z*y]^4;
[y := y-1]^5);
[y := 0]^6
(y,1) reaches entry to 2
------------------------------------------
What definitions reach the entry for label 3?
What definitions can't reach the entry for label 3?
How do we talk about uninitialized variables?
In table 1.1, what if we add (y,1) to RDentry(5)?
In table 1.1, what if we remove (y,5) from RDentry(6)?
------------------------------------------
FOR YOU TO DO
Compute a table like table 1.1 for:
[t := x]^1;
[x := y]^2;
[y := t]^3
if [y>x]^4
then [r := y]^5
else [r := x]^6;
assert [r >= x and r >= y]^7
------------------------------------------
3. variations
What would the analysis look like if we only wanted to keep
track of which variables were assigned?
What would the analysis look like if we want to keep track of
what variables could influence the values of other variables?
III. Dataflow Analysis (1.3)
A. goals
B. idea
What's the basic idea?
What's a data flow graph?
How is that used to model the semantics?
C. example
------------------------------------------
EXAMPLE
[q := 0]^1;
[r := x]^2;
while [r >= y]^3
do ([r := r-y]^4;
[q := q+1]^5);
assert [0<=r and r Powerset(Var*)
where Lab* = set of labels in program
Var* = set of variables in prog
block Equation
=======================================
[x:=a]^l AVexit(l) =
[skip]^l AVexit(l) =
[b]^l AVexit(l) =
How are edges connected?
------------------------------------------
Why are these the right equations?
How does this work out for our example?
a. algorithm for solving the equations
What can we do to solve a set of simultaneous equations?
------------------------------------------
LEAST SOLUTION
F: (Powerset(Var*))^12
-> (Powerset(Var*))^12
F is defined by:
F(AV_1, ..., AV_{12})
= (F_1(AV_1, ..., AV_{12}),
F_2(AV_1, ..., AV_{12}),
...,
F_{12}(AV_1, ..., AV_{12}))
where
F_1(AV_1, ..., AV_{12})
= {(q,?),(r,?),(x,?),(y,?)}
F_2(AV_1, ..., AV_{12})
= AV_1 union {x}
Solution
(AVentry(1), AVexit(1), ..., AVexit(6))
is a solution if
(AVentry(1), AVexit(1), ..., AVexit(6))
= F(AVentry(1), AVexit(1), ..., AVexit(6))
------------------------------------------
Would this still be a 12-tuple if there were seven
elementary blocks?
How to compare such tuples?
What does the lattice structure of (Powerset(Var* x Lab*))^12
look like?
How can one find the fixed point using this information?
2. the constraint based approach
What are the constraints?
How do the elementary blocks work?
How do the flows work?
What's the connection to the equations?
IV. constraint based analysis (1.4)
A. goals
What's the difference between a data flow analysis and a control
flow analysis?
Why isn't control flow obvious in all languages?
B. setting
------------------------------------------
SETTING
1. Convert all control structures to
functions.
2. Analysis finds what functions
can be called from each point
------------------------------------------
Why switch to a functional language for this section?
------------------------------------------
CONTINUATION PASSING STYLE
An intermediate language
with one control structure
Idea: every expression takes a
"continuation"
to which it sends its result
Examples:
x < 0
==>
[fn k => [[[%< x] 0] k]]
if [x < 0]
then [y := 22]
else [z := 33]
==>
[fn k =>
[[[%< x] 0]
[%if [[y := 22] k]
[[z := 33] k]]]]
------------------------------------------
------------------------------------------
LANGUAGE (p. 140)
Work in a functional language:
e \in Exp
t \in Term
f,x \in Var
c \in Const
op \in Op
l \in Lab
e ::= t^l
t ::= c
| x
| fn x => e_0 "non-recursive fun"
| fun f x => e_0 "recursive fun def"
| e_1 e_2
| if e_0 then e_1 else e_2
| let x = e_1 in e_2
| e_1 op e_2
------------------------------------------
What are the atomic blocks being labeled here?
C. idea
What are the main ideas in this approach?
------------------------------------------
IDEAS OF CONSTRAINT BASED ANALYSIS
- assume no side effects
==> associate information with labels
- use a pair of functions, (C,p):
C: Lab* -> Powerset(Value)
C(l) contains possible values for
subexpression at label l
p: Var* -> Powerset(Value)
p(x) constains possible values for
variable x
------------------------------------------
What's the alternative to associating information directly
with labels?
How could this information be useful in an object-oriented
program?
How is this different than a type system?
------------------------------------------
APPROACH
- collect constraints
for function abstractions:
e.g., given
[fn x => [x]^1]^2
get
{[fn x => [x]^1]} \subseteq C(2)
------------------------------------------
What's the value of a function definition?
Why do they just use the term instead of a closure?
What's the general pattern here?
------------------------------------------
for variables:
e.g., given
[x]^1
get
p(x) \subseteq C(1)
------------------------------------------
What's the general pattern here?
------------------------------------------
for applications:
e.g., given
[[f]^1 [e]^2]^3
get
{v | g \in C(1), a \in C(2), and
v = (g a)}
\subseteq C(3)
------------------------------------------
What's the general rule?
What happens in [[fn x => [x]^1]^2 [fn y => [y]^3]^4]^5 ?
What would be the constraints for an expression of the form
[[e1]^1 op [e2]^2]^3 ?
What would be the constraints for an if-then-else expression?
What would be the constraints for a let expression,
like let x = e1 in e2 ?
V. Abstract Interpretation (1.5)
A. goals
B. idea
What's the basic idea?
What's a collecting semantics?
How is that used to extract the analysis?
C. example
------------------------------------------
EXAMPLE
[q := 0]^1;
[r := x]^2;
while [r >= y]^3
do ([r := r-y]^4;
[q := q+1]^5);
assert [0<=r and r (Powerset(Trace))^{12}
G is defined by:
G(CS_1, ..., CS_{12})
= (G_1(CS_1, ..., CS_{12}),
G_2(CS_1, ..., CS_{12}),
...,
G_{12}(CS_1, ..., CS_{12}))
where
G_1(CS_1, ..., CS_{12})
= {((q,?),(r,?),(x,?),(y,?))}
G_2(CS_1, ..., CS_{12})
= {tr : (q,1) | tr \in CS_1}
G_3(CS_1, ..., CS_{12})
= CS_2
...
Solution
(CSentry(1), CSexit(1), CSentry(2),
..., CSentry(6), CSexit(6))
is a solution if
G(CSentry(1), CSexit(1), CSentry(2),
..., CSentry(6), CSexit(6))
= (CSentry(1), CSexit(1), CSentry(2),
..., CSentry(6), CSexit(6))
------------------------------------------
What is G_1?
Why does it have 12 parameters?
What is G_2?
Why is G_3 = CS_2 in this example?
What's the ordering on the solution space?
What does it mean for G to be monotone?
2. Galois connections
------------------------------------------
ABSTRACTION AND CONCRETIZATION
abstraction function for AV analysis:
a: Powerset(Trace) -> Powerset(Var)
a(X) = { x | (x,l) \in tr, l != ?,
tr \in X }
concretization function for AV analysis:
g: Powerset(Var) -> Powerset(Trace)
g(Y) = { tr | (\forall x \in Y ::
(\exists l \in Lab* ::
l != ?, (x,l) \in tr))}
Adjunction, or Galois connection:
a(X) \subseteq Y <==> X \subseteq g(Y)
------------------------------------------
Would a and g be different for the RD analysis?
------------------------------------------
set of traces set of vars
|---------------| |---------------|
| | g | |
| g(Y) <---------------- Y |
| U| | | U| |
| X -----------------> a(X) |
| | a | |
|_______________| |_______________|
------------------------------------------
3. calculating the analysis
Why do we care about the abstraction and concretization
functions?
------------------------------------------
CALCULATING THE ANALYSIS
Extend a and g pointwise to tuples:
a(X_1, ..., X_12)
= (a(X_1), ..., a(X_12))
g(Y_1, ..., Y_12)
= (g(Y_1), ..., g(Y_12))
Define the AV analysis by the function
a o G o g: Powerset(Var)^12
-> Powerset(Var)^12
so for each i in {1..12}
(a o G_i o g): Powerset(Var)^12
-> Powerset(Var)
by
a(G_1(g(AV_1, ..., AV_12)))
= a({(q,?),(r,?),(x,?),(y,?)})
= {}
a(G_2(g(AV_1, ..., AV_12)))
= a({tr : (q,1) | tr \in g(AV_1)})
So a solution
(AVentry(1), ..., AVexit(6))
has the property that
(AVentry(1), ..., AVexit(6))
= (a o G o g)(AVentry(1), ..., AVexit(6))
------------------------------------------
What's AV_3?
How does this compare to the analysis we created by hand?
What's the benefit of doing things this way?
VI. Type and Effect Systems (1.6)
A. goals
B. idea
What's the basic idea?
C. annotated type system example
1. annotated base types
------------------------------------------
ANNOTATED TYPE SYSTEM EXAMPLE
[ass] [x := a]^l : AV1 -> AV2
if AV2 = AV1 \cup {x}
[skip] [skip]^l : AV -> AV
S1 : AV1 -> AV2, S2: AV2 -> AV3
[seq] --------------------------------
S1; S2 : AV1 -> AV3
S1 : AV1 -> AV2, S2: AV1 -> AV2
[if] --------------------------------
if [b]^l then S1 else S2 : AV1 -> AV2
S : AV1 -> AV1
[wh] --------------------------------
while [b]^l then S : AV1 -> AV1
S : AV2 -> AV3
[sub] -------------- if AV1 \subseteq AV2,
S : AV1 -> AV4 AV3 \subseteq AV4
------------------------------------------
What's the meaning of the basic types?
What's the meaning of the arrow types, like AV1 -> AV4?
How would you explain the assignment rule?
How would you explain the if rule?
Why don't we have to check the type of the condition in an if?
How would you explain the while rule?
How would you explain the sub rule?
a. type checking
------------------------------------------
EXAMPLE
[q := 0]^1;
[r := x]^2;
while [r >= y]^3
do ([r := r-y]^4;
[q := q+1]^5);
assert [0<=r and r AV2, [ass]
[r := x]^2: AV2 -> AV3 [ass]
___________________________________ [seq]
([q := 0]^1;[r := x]^2)
: AV1 -> AV3
if AV3 = AV2 \cup {r},
AV2 = AV1 \cup {q}
[r := r-y]^4: AV4 -> AV4 \cup {r}, [ass]
[q := q+1]^5: AV5 -> AV5 \cup {q} [ass]
___________________________________ [seq]
([r := r-y]^4;[q := q+1]^5)
: AV4 -> AV6
if AV6 = AV5 \cup {q}
AV5 = AV4 \cup {r}
___________________________________[sub]
([r := r-y]^4;[q := q+1]^5)
: AV6 -> AV6
if AV6 \subseteq AV4
___________________________________[wh]
while [r >= y]^3
do ([r := r-y]^4;[q := q+1]^5)
: AV6 -> AV6
so by the seq rule,
------------------------------------------
So what are all the constraints?
------------------------------------------
CONSTRAINTS
AV2 = AV1 \cup {q}
AV3 = AV2 \cup {r},
AV5 = AV4 \cup {r}
AV6 = AV5 \cup {q}
AV6 \subseteq AV4
AV3 = AV6
AV6 = AV7
------------------------------------------
So, what's a solution?
2. annotated type constructors
------------------------------------------
EXAMPLE
Judgments:
XMust
S : Sigma -------------> Sigma
YMay
Where XMust and YMay are sets of variables.
{x}
[ass] [x := a]^l : Sigma ----> Sigma
{x}
{}
[skip] [skip]^l : Sigma -----> Sigma
{}
X1
S1 : Sigma -----> Sigma,
Y1
X2
S2 : Sigma -----> Sigma
Y2
[seq] --------------------------------
X3
S1; S2 : Sigma ----> Sigma
Y3
if X3 = X1 \cup X2,
Y3 = Y1 \cup Y2
X1
S1 : Sigma -----> Sigma,
Y1
X2
S2 : Sigma -----> Sigma
Y2
[if] --------------------------------
if [b]^l then S1 else S2
X3
: Sigma ----> Sigma
Y3
if X3 = X1 \cap X2,
Y3 = Y1 \cup Y2
X
S : Sigma -----> Sigma
Y
[wh] --------------------------------
while [b]^l then S
{}
: Sigma ----> Sigma
Y
X
S : Sigma ----> Sigma
Y
[sub] ----------------------
X'
S : Sigma -----> Sigma
Y'
if X' \subseteq X,
Y \subseteq Y'
------------------------------------------
Why does the assignment rule make sense?
How do you explain the if rule?
How do you explain the while rule?
How would you deal with assert statements?
How would you deal with a try statement?
------------------------------------------
TYPE CHECKING EXAMPLE
TYPE CHECKING
Idea: accumulate constraints.
{q}
[q := 0]^1: Sigma ---> Sigma , [ass]
{q}
{r}
[r := x]^2: Sigma ---> Sigma [ass]
{r}
_________________________________ [seq]
([q := 0]^1;[r := x]^2)
{q,r}
: Sigma ----> Sigma
{q,r}
{r}
[r := r-y]^4: Sigma ---> Sigma, [ass]
{r}
{q}
[q := q+1]^5: Sigma ---> Sigma [ass]
{q}
_________________________________ [seq]
([r := r-y]^4;[q := q+1]^5)
{q,r}
: Sigma ----> Sigma
{q,r}
___________________________________[wh]
while [r >= y]^3
do ([r := r-y]^4;[q := q+1]^5)
{}
: Sigma ----> Sigma
{q,r}
so by the seq rule,
------------------------------------------
How is this different than with annotated base types?
D. effect systems
1. example: call tracking analysis
------------------------------------------
EXAMPLE
Judgments:
Gamma |- e : t & phi
where Gamma : Var -> Type
e : Expression
t : Type
phi : Effect
phi
Type = int | bool | t1 ---> t2
phi : Powerset(FunName)
[var] Gamma |- x : t & {}, if Gamma(x) = t
Gamma[x |-> tx] |- e : t & phi
[fn] --------------------------------
Gamma |- fn_pi x => e
phi2
: tx ------> t & {}
if phi2 = phi \cup {pi}
phi
Gamma |- e1 : t2 ---> t & phi1,
Gamma |- e2 : t2 & phi2
[app] --------------------------------
Gamma |- e1 e2 : t & phi3
if phi3 = phi1 \cup phi2 \cup phi
------------------------------------------
What do the judgments mean?
What do the arrow types mean?
How would you explain the fn rule?
How would you explain the app rule?