I. Intraprocedural Analysis (2.1) A. definitions and notation 1. initial and final labels ------------------------------------------ INITIAL LABEL init: Stmt -> Lab init([x := a]^l) = l init([skip]^l) = l init(S1; S2) = init(if [b]^l then S1 else S2) = init (while [b]^l do S) = ------------------------------------------ What would the initial label of a nondeterministic choice statement be? Of a parallel composition statement? How would we generalize the formalism to handle such statements? ------------------------------------------ FINAL LABELS final: Stmt -> Powerset(Lab) final([x := a]^l) = {l} final([skip]^l) = {l} final(S1; S2) = final(if [b]^l then S1 else S2) = final (while [b]^l do S) = ------------------------------------------ What would the final label set of a nondeterministic choice statement be? Of a parallel composition statement? 2. blocks and labels ------------------------------------------ ELEMENTARY BLOCKS blocks: Stmt -> Powerset(Block) ------------------------------------------ How would you define the set of elementary blocks in a statement? What would the set of blocks in an assert statement be? What would be set of blocks in a nondeterministic choice statement be? How would you define the set of labels in a statement? 3. flows and reverse flows ------------------------------------------ FLOWS flow: Stmt -> Powerset(Lab x Lab) flow([x := a]^l) = {} flow([skip]^l) = {} flow(S1; S2) = flow(if [b]^l then S1 else S2) = flow(while [b]^l do S) = ------------------------------------------ How would we use these functions to represent the nodes and edges of a dataflow graph? What are labels and edges of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? How you formulate a set of reverse flows? What is flow^R of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? What are the initial nodes of flow^R? 4. program of interest ------------------------------------------ PROGRAM OF INTEREST S* = the top level statement Lab* = labels(S*) Var* = FV(S*) Blocks* = blocks(S*) AExp* = nontrivial arithmetic expressions in S* ------------------------------------------ What's a trivial expression? ------------------------------------------ SUBEXPRESSIONS AExp(a) = non-trivial arithmetic subexpressions of a AExp(b) = non-trivial arithmetic subexpressions of b ------------------------------------------ ------------------------------------------ ISOLATED ENTRIES AND EXITS def: S* has isolated entries iff def: S* has isolated exits iff ------------------------------------------ What kind of programs would not have isolated entries? What kind of programs would not have isolated exits? ------------------------------------------ LABEL CONSISTENT def: S is label consistent if and only if no two blocks in S have the same label. ------------------------------------------ How would you formalize that? Is there any reason not to have label consistent programs? B. available expressions analysis (2.1.1) 1. idea, goal ------------------------------------------ AVAILABLE EXPRESSIONS ANALYSIS "For each program point, which [non-trivial] expressions must have already been computed, and not later modified, on all paths to that program point." Example: [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5;) ------------------------------------------ What non-trivial expressions are available at entry to block 2? 2. formalization ------------------------------------------ FORMAL DEFINITION AEentry(l) = if l = init(S*) then {} else \bigcap {AEexit(l') | (l',l) \in flow(S*)} AEexit(l) = (AEentry(l) \ killAE(B^l)) \cup genAE(B^l) where B^l \in blocks(S*) killAE: Blocks* -> Powerset(Aexp*) killAE([x:= a]^l) = {a' \in Aexp* | x \in FV(a')} killAE([skip]^l) = {} killAE([b]^l) = {} genAE: Blocks* -> Powerset(Aexp*) genAE([x:= a]^l) = {a' \in Aexp(a) | x \not\in FV(a')} genAE([skip]^l) = {} genAE([b]^l) = AExp(b) ------------------------------------------ What's the role of the dataflow graph here? What does the kill function mean? What does the gen function mean? Why don't we have to define the analysis for while loops and if statements? What are we assuming with this formalism? How would we adjust this if we didn't have isolated entries? 3. observations 4. example revisited ------------------------------------------ EXAMPLE [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5;) What's the data flow graph? ------------------------------------------ ------------------------------------------ KILL AND GEN What are killAE and genAE for this? l killAE(l) genAE(l) ============================ 1 2 3 4 5 ------------------------------------------ ------------------------------------------ EXAMPLE EQUATIONS AEentry(1) = AEentry(2) = AEentry(3) = AEentry(4) = AEentry(5) = AEexit(1) = AEexit(2) = AEexit(3) = AEexit(4) = AEexit(5) = ------------------------------------------ So what sets do we start with to find a solution? So what would the solution be? C. Reaching Definitions Analysis (2.1.2) Is this a forward or backward analysis? What makes the analysis imprecise? So what solution do we want? Note the use of the word "may" in the analysis statement, what impact does that have on the analysis? D. Very Busy Expressions Analysis (2.1.3) 1. idea and goals ------------------------------------------ VERY BUSY EXPRESSIONS def: A non-trivial expression e is *very busy* at exit from block l if, e must always be used before some x \in FV(e) is assigned. At what points is a+b very busy in: [x := a+b]^1; [y := a+b]^2 if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5 [q := r]^7; [z := a+b]^8; if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11 if [a>b]^12 then [x := a+b]^13 else [y := 641]^14 ------------------------------------------ ------------------------------------------ VERY BUSY EXPRESSIONS ANALYSIS "For each program point, which [non-trivial] expressions must be very busy at the exit from the point." ------------------------------------------ What could we use this for? 2. formal definition ------------------------------------------ FORMAL DEFINITION VBexit(l) = if l \in final(S*) then {} else \bigcap { VBentry(l') | (l', l) \in flow^R(S*) } VBentry(l) = (VBexit(l) \ killVB(B^l)) \cup genVB(B^l) where B^l \in blocks(S*) kill: Blocks* -> Powerset(Aexp*) killVB([x:= a]^l) = {a' \in Aexp* | x \in FV(a')} killVB([skip]^l) = {} killVB([b]^l) = {} genVB: Blocks* -> Powerset(Aexp*) genVB([x:= a]^l) = Aexp(a) genVB([skip]^l) = {} genVB([b]^l) = AExp(b) ------------------------------------------ Is this a forward or backward analysis? Does this analysis need isolated exits? Why is there an intersection for VBexit? Do we want the largest or the smallest solution? 3. example ------------------------------------------ EXAMPLE if [a-b > a+b]^1 then [x := a+b]^2 else [y := a+b]^3; [z := a]^4; l killVB(l) genVB(l) ======================= 1 2 3 4 VBentry(1) = VBexit(1) = VBentry(2) = VBexit(2) = VBentry(3) = VBexit(3) = VBentry(4) = VBexit(4) = ------------------------------------------ E. Live Variables Analysis (2.1.4) 1. idea and goals ------------------------------------------ LIVE VARIABLES def: A variable x is *live* at exit from label l if there is a path from l to a use of x that does not redefine x. Which variables are live at exit from 1? [x := 3]^1; if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4 [x := 3]^1; [y := x+2]^2; [y := y+1]^3 [x := 3]^1; [z := 4]^2; [x := z+2]^3 while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6) ------------------------------------------ ------------------------------------------ LIVE VARIABLES ANALYSIS "For each program point, which variables may be live at the exit from the point." Example: [x := 3]^1; [z := 4]^2; [x := z+2]^3 ------------------------------------------ What can we use this for? 2. definitions and formalization ------------------------------------------ FORMAL DEFINITION LVexit(l) = if l \in final(S*) then {} else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) } LVentry(l) = (LVexit(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) kill: Blocks* -> Powerset(Var*) killLV([x:= a]^l) = {x} killLV([skip]^l) = {} killLV([b]^l) = {} genLV: Blocks* -> Powerset(Var*) genLV([x:= a]^l) = FV(a) genLV([skip]^l) = {} genLV([b]^l) = FV(b) ------------------------------------------ Is this a forward or backward analysis? Does this analysis need isolated exits? Why is there a union for LVexit? Do we want the largest or the smallest solution? ------------------------------------------ EXAMPLE [x := 3]^1; [z := 4]^2; [x := z+2]^3 l killLV(l) genLV(1) ========================== 1 2 3 LVentry(1) = LVexit(1) = LVentry(2) = LVexit(2) = LVentry(3) = LVexit(3) = ------------------------------------------ F. Derived Data Flow Information (2.1.5) ------------------------------------------ LINKING DEFINITIONS AND USES Use-definition (ud) chain: links use of var to its last assignment Definition-use (du) chain: links last assignment of var to its use ------------------------------------------ What might this be useful for? 1. formal definitions ------------------------------------------ DEFINITIONS AND USES definition clear path for x clear(x, l, l') = (\exists l1, ..., ln :: l = l1 & ln = l' & n > 0 & (\forall i : 1 <= i < n : (li, li+1) \in flow(S*)) & (\forall i : 1 <= i < n : not(def(x, li))) & use(x, ln)) def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l)) use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l)) ------------------------------------------ Why are the def and use functions correct? How do you interpret the notion of a clear path? Does clear(y, 3, 7) tell you anything about the use of y? ------------------------------------------ UD AND DU ud: Var* x Lab* -> Powerset(Lab*) ud(x, l') = {l | def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} \cup {? | clear(x, init(S*), l')} du: Var* x Lab* -> Powerset(Lab*) du(x, l) = if l != ? then {l'| def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} else {l'| clear(x, init(S*), l')} ------------------------------------------ Do these require isolated entries? Are these must or may analyses? Can we define du in terms of ud? 2. example ------------------------------------------ EXAMPLE [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4 ud(x, l) l \ x | y z ===================== 1 2 3 4 du(x, l) l \ x | y z ===================== 1 2 3 4 ------------------------------------------ 3. computation How could we use RD and LV to compute ud chains? II. Theoretical Properties (2.2) Why are formal correctness proofs useful for program analysis? A. kinds of semantics ------------------------------------------ KINDS OF SEMANTICS Axiomatic Semantics as declarative specifications predicates + predicate transformers Denotational Semantics as functional programming domains + meaning functions Operational Semantics as logic programming configurations + rewrite rules ------------------------------------------ 1. denotational example ------------------------------------------ DENOTATIONAL SEMANTICS OF EXPRESSIONS FOR WHILE LANGUAGE Syntax (1.2): x,y \in Var n \in NumericLiteral a \in AExp b \in BExp op_a \in Op_a op_b \in Op_b op_r \in Op_r a ::= x | n | a1 op_a a2 b ::= true | false | not b | b1 opb b2 | a1 op_r a2 Domains: Z = { ..., -1, 0, 1, ... } T = { true, false } s \in State = Var -> Z Meaning Functions: A : Aexp -> State -> Z A[[x]]s = s(x) A[[n]]s = N(n) A[[a1 op_a a2]]s = (O_a[[op_a]]) (A[[a1]]s) (A[[a2]]s) where N : NumericLiteral -> Z N[[n]] = the value of literal n O_a : Op_a -> (Z x Z -> Z) O_a[[+]] = \n1 n2 . n1 + n2 ... ------------------------------------------ Why is the state a parameter to the meaning of arithmetic expressions? How would you define B : Bexp -> (State -> T)? B. structural operational semantics (2.2.1) 1. varieties ------------------------------------------ VARIATIONS ON STRUCTURAL OPERATIONAL SEMANTICS Big step = map from initial to final state Small step = map from a state to next ------------------------------------------ 2. terminal transition system, little step, computation semantics a. idea ------------------------------------------ COMPUTATION (LITTLE STEP) SEMANTICS Meaning Programs <-------> Answers | ^ input | | output | | v reducesto* | Gamma <-----------> T def: a in Meaning[[P]] iff ------------------------------------------ b. terminal transition systems ------------------------------------------ TERMINAL TRANSITION SYSTEM (TTS) (Gamma, -->, Terminal) Gamma : --> : Terminal : reducesto* reflexive, transitive closure ------------------------------------------ 3. semantics of the while language (2.2.1) ------------------------------------------ CONFIGURATIONS AND TRANSITIONS Configurations (Gamma): Gamma = (Stmt x State) + State s in State = Var -> Z Terminal Configurations: Terminal = State ------------------------------------------ What does a configuration of the form (S, s) mean? What does a configuration of the form s mean? Must every execution reach a final state? What part of the state does an expression depend on? ------------------------------------------ TRANSITIONS (Table 2.6) [ass] ([x := a]^l, s) --> s[x |-> A[[a]]s] [skip] ([skip]^l, s) --> s (S1, s) --> (S1', s') [seq1] -------------------------- (S1;S2, s) --> (S1';S2, s') (S1, s) --> s' [seq2] -------------------------- (S1;S2, s) --> (S2, s') [if1] (if [b]^l then S1 else S2, s) --> (S1, s) if B[[b]]s = true [if2] (if [b]^l then S1 else S2, s) --> (S2, s) if B[[b]]s = false [wh1] (while [b]^l do S, s) --> (S; while [b]^l do S, s) if B[[b]]s = true [wh2] (while [b]^l do S, s) --> s if B[[b]]s = false ------------------------------------------ What do the sequence rules do? Do the rules allow evaluation of the true or false part of an if-statement before evaluating the condition? ------------------------------------------ EXAMPLE Program [q := 0]^1; [r := x]^2; while [r >= y]^3 do ([r := r-y]^4; [q := q+1]^5) Name each statement S_i if it has label i Let S_123 be the whole program Let S_3 be the while loop Let S_23 be the sequential composition of S_2 and S_3 Let S_45 be the body of the while loop Let sqrxy be the state in which q has value q, ..., y has value y E.g, s7062(q) = 7, s0062(x) = 6. --> {by [seq2]} * ([q := 0]^1, s7062> --> {by [ass]} s0062 . --> {by [seq2]} * <[r := x]^2, s0062> --> {by [ass]} s0662 . --> {by [wh1], since A[[r >= y]](s0662) = true} --> {by [seq1]} * --> {by [seq2]} * <[r := r-y]^4, s0662> --> {by [ass]} s0462 . . --> ------------------------------------------ How are the labels used in the semantics? What would be the rule(s) for a one-armed if-statement? How would you add rule(s) for a for loop? What would be the rule(s) for a exitblock statement, with block syntax begin S end? 4. properties of the semantics As the configurations evolve, does the flow graph for the statement in the configuration change? If so, how? ------------------------------------------ PROPERTIES OF THE SEMANTICS How does the flow graph change as the configurations change? Case 1: --> Compare vs. final(S) final(S') flow(S) flow(S') blocks(S) blocks(S') Case 2: --> s' what can we say about the graph of S? ------------------------------------------ C. correctness of the live variables analysis (2.2.2) 1. failed attempt ------------------------------------------ EQUATION SYSTEM LV^=(S*) defined by: LVexit(l) = if l \in final(S*) then {} else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) } LVentry(l) = (LVexit(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) Functional form, for a given S*: F^S*_LV: ({entry,exit} -> Lab* -> P(Var*)) -> ({entry,exit} -> Lab* -> P(Var*)) F^S*_LV(F)(exit)(l) = if l \in final(S*) then {} else \bigcup { F(entry)(l') | (l', l) \in flow^R(S*) } F^S*_LV(F)(entry)(l) = (F(exit)(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) Solutions: live: {entry,exit} -> Lab* -> P(Var*) def: live solves LV^=(S*), written live |= LV^=(S*), iff live is a fixpoint of F^S*_LV. ------------------------------------------ Why use functions instead of tuples for solutions? ------------------------------------------ EXAMPLE Q: What is F^S*_LV for: [x := 3]^1; [y := x+2]^2; [y := y+1]^3 ------------------------------------------ What do we want to prove for correctness? ------------------------------------------ DEFINING LIVE VARIABLES SEMANTICALLY What should a solution to LV mean? At a given point, only the live variables matter. def: States s1 and s2 are *similar with respect to a set of variables V*, written s1 ~_V s2, iff (\forall x \in V :: s1(x) = s2(x)). Conjecture: Let S be label consistent. Suppose live |= LV^=(S) and s1 ~_{live(entry)(init(S))} s2. Then: (1) if (S, s1) --> (S', s1'), then there is some s2' such that (S, s2) --> (S', s2') and s1' ~_{live(entry)(init(S'))} s2'. (2) if (S, s1) --> s1' then there is some s2' such that (S, s2) --> s2' and s1' ~_{live(exit)(init(S))} s2'. ------------------------------------------ What happens to ~_V as V shrinks? How would we prove the theorem? 2. fix using constraint system ------------------------------------------ CONSTRAINT SYSTEM LV^{\subseteq}(S*) defined by: LVexit(l) \supseteq if l \in final(S*) then {} else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) } LVentry(l) \supseteq (LVexit(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) Functional form, for a given S*, as above F^S*_LV: ({entry,exit} -> Lab* -> P(Var*)) -> ({entry,exit} -> Lab* -> P(Var*)) F^S*_LV(F)(exit)(l) = if l \in final(S*) then {} else \bigcup { F(entry)(l') | (l', l) \in flow^R(S*) } F^S*_LV(F)(entry)(l) = (F(exit)(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) Solutions: live: {entry,exit} -> Lab* -> P(Var*) def: live solves LV^{\subseteq}(S*), written live |= LV^{\subseteq}(S*), iff live \sqsupseteq F^S*_LV(live). def: lv1 \sqsupseteq lv2 iff (\forall e \in {entry,exit} :: (\forall l in Lab* :: lv1(e)(l) \supseteq lv2(e)(l))) Lemma 2.15: If S* is label consistent, then live |= LV^=(S*) implies live |= LV^{\subseteq}(S*). Furthermore, the least solutions coincide. ------------------------------------------ What lemma would give us the property where we got stuck? ------------------------------------------ SOLUTIONS WORK FOR SUBSTATEMENTS Lemma 2.16: Suppose S1 is label consistent, and live |= LV^{\subseteq}(S1). If flow(S1) \supseteq flow(S2), blocks(S1) \supseteq blocks(S2), then S2 is label consistent and live |= LV^{\subseteq}(S2). ------------------------------------------ Why is this true? ------------------------------------------ SOLUTIONS PRESERVED Corollary 2.17: Suppose S is label consistent, and live |= LV^{\subseteq}(S). If (S,s) --> (S',s'), then live |= LV^{\subseteq}(S'). ------------------------------------------ Why does that follow? ------------------------------------------ SOLUTION CAN ONLY SHRINK FORWARD Lemma 2.18: Suppose S is label consistent, and live |= LV^{\subseteq}(S). Then for all (l,l') \in flow(S), live(exit)(l) \supseteq live(entry)(l'). Proof: construction of LV^{\subseteq}(S). ------------------------------------------ ------------------------------------------ GETTING CORRECTNESS RIGHT Theorem 2.21: Let S be label consistent. Suppose live |= LV^{\subseteq}(S) and s1 ~_{live(entry)(init(S))} s2. Then: (1) if (S, s1) --> (S', s1'), then there is some s2' such that (S, s2) --> (S', s2') and s1' ~_{live(entry)(init(S'))} s2'. (2) if (S, s1) --> s1' then there is some s2' such that (S, s2) --> s2' and s1' ~_{live(exit)(init(S))} s2'. ------------------------------------------ What does this theorem tell us about execution sequences? III. Monotone Frameworks (2.3) Can we identify some commonalities between different analyses? Would doing that help implement them? A. general pattern ------------------------------------------ GENERAL PATTERN A_o(l) = if l \in E then i else \bigsqcup {A_.(l') | (l',l) \in F} A_.(l) = f(l)(A_o(l)) where \bigsqcup is either \bigcup or \bigcap F is either flow(S*) or flow^R(S*) E is {init(S*)} or final(S*) i is initial/final information f(l) is the transfer function for blocks B^l \in blocks(S*) For a forward analysis: F is flow(S*) A_o gives the entry conditions A_. gives the exit conditions For a backward analysis: F is flow^R(S*) A_o gives the exit conditions A_. gives the entry conditions ------------------------------------------ B. basic definitions (2.3.1) 1. property space ------------------------------------------ PROPERTY SPACES def: a *property space*, L = (L, \bigsqcup), is a set with \bigsqcup: Powerset(L) -> L a join operation that makes it a complete lattice. Thus: l1 \sqcup l2 = \bigsqcup { l1, l2 } \bot = \bigsqcup {} l1 \sqsubseteq l2 = (l1 \sqcup l2 = l2) Examples: For reaching definitions: L = Powerset(Var* x Lab^?_*) \sqcup = \cup \sqsubseteq = \subseteq For available expressions: L = Powerset(AExp*) \sqcup = \cap \sqsubseteq = \supseteq ------------------------------------------ 2. transfer functions ------------------------------------------ TRANSFER FUNCTION SPACE def: Let L be a partially-ordered set. Then Funs is a *transfer function space for L* iff f \in Funs ==> f : L -> L and f is monotone, f,g \in Funs ==> f o g \in Funs, and id_L \in Funs. ------------------------------------------ 3. monotone framework ------------------------------------------ MONOTONE FRAMEWORK def: (L, Funs) is a monotone framework iff L is a property space and Funs is a transfer function space for L. def: (L, Funs, F, E, i, f_.) is an *instance of a monotone framework* if and only if: - (L, Funs) is a monotone framework, - F is a finite set of pairs (of flows), - E is a finite set of extremal labels, - i \in L is an extremal value, - f : (dom(F) \cup E) -> (L -> L) s.t. for l in (dom(F) \cup E) f_l \in Funs ------------------------------------------ C. examples (2.3.2) D. predicate abstraction (new topic) ------------------------------------------ PREDICATE ABSTRACTION Goal: verify program properties Idea: Use property space of the form L = Powerset(Preds) Preds = {P1, ..., Pn} where each Pi is a nullary predicate Interpretation: {P3,P5} means P3 and P5 may/must be true (depending on kind of analysis \sqcup is \cup Funs = monotonic (in \subseteq) functions on L ------------------------------------------ What's the bottom element? The top? How can you represent states? ------------------------------------------ PREDICATE ABSTRACTION EXAMPLE IsZero Analysis: At a given program point, which variables may be 0. "Abstract States" s \in L = Powerset(Preds) where Preds = {IsZero_x | x in Vars*} IsZero_y means y may be 0 F is flow(S*) E is {init(S*)} i is Preds fIZ(l) : L -> L, for l in Lab* fIZ(l)(s) = (s \ kill_IZ(B^l)(s)) \cup gen_IZ(B^l)(s) where B^l in blocks(S*) kill_IZ([x := a]^l)(s) = {IsZero_x} kill_IZ([skip]^l)(s)(Pi) = {} kill_IZ([b]^l)(s)(Pi) = {} gen_IZ([x := a]^l)(s) = {IsZero_x | (\exists cs \in \gamma(s) :: A[[a]]cs == 0)} gen_IZ([skip]^l)(s) = {} gen_IZ([b]^l)(s) = {} \gamma: L -> Store \gamma(s) = {cs | cs: Var* -> Int, IsZero_x \in s ==> cs(x) == 0} ------------------------------------------ What kind of analysis is this? Why this initial value i? What do the gen and kill functions do? What are the equations for IZ_entry(l) and IZ_exit(l)? ------------------------------------------ EXAMPLE [y := 3]^1; while [y>0]^2 do ([q := y-2]^3; [y := y-1]^4); [q := q div y]^5 Var* = {y, q} Preds = {IsZero_y, IsZero_q} IZ_entry(1) = IZ_exit(1) = IZ_entry(2) = IZ_exit(2) = IZ_entry(3) = IZ_exit(3) = IZ_entry(4) = IZ_exit(4) = IZ_entry(5) = IZ_exit(5) = ------------------------------------------ E. equation solving (2.4) 1. MFP (Maximal Fixed Point) solution (2.4.1) 2. MOP solution (2.4.2) (skip) IV. Interprocedural Analysis (2.5) A. syntax ------------------------------------------ SYNTAX Procedures with 1 value parameter, and 1 result parameter. P \in Program D \in Declaration P ::= begin D S end D ::= proc p(val x, res y) is^ln S end^lx | D D S ::= ... | [call p(a,z)]^lc_lr Example: begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end ------------------------------------------ What else do we need to do analysis and correctness proofs? B. operational semantics (2.5.1) How is a procedure different than a macro? ------------------------------------------ OPERATIONAL SEMANTICS xi in Loc locations rho in Env = Var* -> Loc environments s in Store = Loc ->_fin Z stores Assume s o r is total: ran(rho) \subseteq dom(s) ------------------------------------------ How do these states relate to the states we had previously? How should we deal with global variables? ------------------------------------------ OPERATIONAL SEMANTICS [skip] rho |-* ([skip]^l, s) --> s [ass] rho |-* ([x:=a]^l,s) --> s[rho(x) |-> A[[A]](s o rho)] if s o rho is total ------------------------------------------ What would the sequence rules look like? while? if? What would the calls look like in the operational semantics? ------------------------------------------ CALL AND BIND RULES [call] rho |-* ([call p(a,z)]^lc_lr, s) --> (bind rho*[x |-> xi1, y |-> xi2] in S then z := y, s[xi1 |-> A[[a]](s o r), xi2 |-> v]) if xi1, xi2 not in dom(s), v in Z, proc p(val x, res y) is^ln S end^lx is in D* rho' |-* (S, s) --> (S', s') [bind1]________________________________ rho |-* (bind rho' in S then z := y, s) --> (bind rho' in S' then z := y, s') rho' |-* (S, s) --> s' [bind2]________________________________ rho |-* (bind rho' in S then z := y, s) --> s'[rho(z) |-> s'(rho'(y))] ------------------------------------------ How do you parse the first rule? Where is bind-in-then in the surface syntax? What is [bind2] doing? Do we have to deal with bind-in-then for proofs? ------------------------------------------ EXAMPLE Let P be begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end Name each statement S_i if it has label i (for calls, use the top label). Let S_2 be if [n == 0]^2 then ... Let S_79 be [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 Let rho* = {v |-> 0, w |-> 1} s00 = {0 |-> 0, 1 |-> 0} (and for later use) Let rho1 = rho*[n |-> 2, v |-> 3] = {v |-> 3, w |-> 1, v |-> 2} s0039 = {0 |-> 0, 1 |-> 0, 2 |-> 3, 3 |-> 9} Calcuate in the context of rho*: rho* |-* (S_79, s00) --> * ([call fact(3,v)]^7_8, s00) --> (bind rho*[n |-> 2, v |-> 3] in S_2 then v := v, s00[2 |-> 3, 3 |-> 9]) = (bind rho1 in S_2 then v := v, s0039) . (bind rho1 in S_2 then v := v; S_9, s0039) --> * (bind rho1 in (if [n == 0]^2 then ...) then v := v, s0039) --> rho1 |- (if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11), s0039) --> ([call fact(n-1, v)]^4_5; [v:=v*n]^11, s0039) . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11) then v := v, s0039) . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11) then v := v; S_9, s0039) ------------------------------------------ C. flow graphs How should we make flow graphs for calls? ------------------------------------------ FLOW GRAPHS FOR CALLS init([call p(a, z)]^lc_lr) = final([call p(a, z)]^lc_lr) = blocks([call p(a, z)]^lc_lr) = labels([call p(a, z)]^lc_lr) = flow([call p(a, z)]^lc_lr) = {(lc;ln), (lx;lr)} if proc p(val x, res y) is^ln S end lx is in D* ------------------------------------------ What should these be? Why use semicolons for the flows? What would happen if p was a program variable? ------------------------------------------ FLOW GRAPHS FOR PROCEDURES For each procedure declaration proc p(val x, res y) is^ln S end lx init(p) = final(p) = blocks(p) = {is^ln, end^lx} \cup blocks(S) labels(p) = flow(p) = ------------------------------------------ What should these be? ------------------------------------------ FLOW GRAPHS FOR PROGRAMS For program P* = begin D* S* end init* = init(S*) final* = final(S*) blocks* = blocks(S*) \cup \bigcup{blocks(p) | proc p... in D*} labels* = labels(S*) \cup \bigcup{labels(p) | proc p... in D*} flow* = labels(S*) \cup \bigcup{labels(p) | proc p... in D*} ------------------------------------------ What is Lab* for such a program? ------------------------------------------ INTERPROCEDURAL FLOWS inter-flow* = { (lc, ln, lx, lr) | P* contains [call p(a,z)]^lc_lr and proc p(val x, res y) is^ln S end^lx } Notation: IF is an abstraction of inter-flow* for forward analysis: IF = inter-flow* for backward analysis: IF = inter-flow^R* ------------------------------------------ How could we use inter-flow*? ------------------------------------------ EXAMPLE begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end What is flow*? What is inter-flow*? ------------------------------------------ What else do we need to do analysis? D. problems with adapting earlier techniques (2.5.2) If we don't use inter-flow* in an analysis, what happens? ------------------------------------------ PROBLEMS WITH NAIVE APPROACH RDentry(l) = if l = init(S*) then {(x,?)|x in Var*} else \bigcup {RDexit(l') | (l',l) \in flow* or (l';l) \in flow* } RDexit(l) = f_l(RDentry(l)) where f_l(RD) = if ([x:=a]^l in blocks*) then (RD \ {(x,l')|l' in Lab^?_*}) \cup {(x,l)} else if ([call p(a,z)]^l_lr in blocks* and proc p(val x, res y) is^ln S end^lx in D*) then (RD \ {(x,l')|l' in Lab^?_*}) \cup {(x,l)} else if ([call p(a,z)]^lc_l in blocks* and proc p(val x, res y) is^ln S end^lx in D*) then (RD \ {(z,l')|l' in Lab^?_*}) \cup {(z,l)} else RD ------------------------------------------ What are the transfer functions for ln and lx for a procedure definition of the form proc p (val x, res y) is^ln S end^lx ? So, what flows are permitted for our example? What's wrong with that? E. making context explicit (2.5.3) 1. embellished monotone framework ------------------------------------------ MAKING CONTEXT EXPLICIT context information: \delta \in \Delta def: Suppose (L, Funs, F, E, i, f_.) is an instance of a monotone framework. Then (\hat{L},\hat{Funs},F,E,\hat{i},\hat{f_.}) is an *embellished monotone framework* iff \hat{L} = \Delta -> L each \hat{g} in \hat{Funs} is monotone and \hat{g} : (\Delta -> L) -> (\Delta -> L) forall all Y in \hat{L}, \hat{f_l}(Y)(\delta) = f_l(Y (\delta)) ------------------------------------------ 2. example How does the analysis use the context, delta ? How is the context manipulated by these transfer functions? How do these work with the formal parameters? Does this work if a formal has the same name as an actual parameter? What happens if the context is a constant in the program? 3. general form of transfer functions ------------------------------------------ GENERAL FORM OF DATAFLOW EQUATIONS FOR EMBELLISHED MONOTONE FRAMEWORKS A_o(l)(delta) = if l \in E then \hat{i} else \bigsqcup {A_.(l')(delta) | (l',l) \in F or (l';l) \in F } A_.(l)(delta) = if call(l) then \hat{f^1_l}(A_o(l))(pop(delta)) if (\exists lc :: return(lc,l)) then \hat{f^2_{lc,l}}(A_o(lc), A_o(l))(delta) else // not call or return then \hat{f_l}(A_o(l)) ------------------------------------------ What kinds of things can \hat{f^2_l) do? 4. using the context F. modeling context (2.5.4) 1. call strings of unbounded length ------------------------------------------ MODELING THE CONTEXT def: a *call string*, in Lab^*, is a sequence of labels [l_1, ... , l_n], n >= 0, such that for each 0 <= i <= n, call(l_i) holds. Example: begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end has call strings: [], [7], [7, 4], [7, 4, 4], ... [9], [9, 4], [9, 4, 4], ... ------------------------------------------ 2. call strings of bounded length ------------------------------------------ CALL STRINGS OF BOUNDED LENGTH Only record the last k calls \Delta = Lab^{<= k} Truncation from left truncate([l_1, ..., l_n], k) = if n <= k then [l_1, ..., l_n] else [l_{n-k+1}, ..., l_n] e.g, truncate([1, 2, 3, 4], 3) = [2, 3, 4] ------------------------------------------ What is the extremal value now? What happens if k = 0? How would the transfer functions look? How would this look in the reaching definitions analysis? 3. assumption sets (2.5.5) (briefly, if short on time) a. large assumption sets ------------------------------------------ ASSUMPTION SETS Simplifying assumption: for some set D, L = Powerset(D) Idea: let context be analysis information at call site E.g., (like k = 1): \Delta = L = Powerset(D) so \hat{L} is isomorphic to: Powerset(Powerset(D) x D) and the extremal value is \hat{i} = {({i},i)} ------------------------------------------ ------------------------------------------ REACHING DEFINITIONS WITH ASSUMPTION SETS \hat{f^1_lc}(Z) = \bigcup { {delta'} x phi^1_lc(X) | (delta, X) \in Z, delta' = {Y | (delta,Y) \in Z} } where phi^1_lc(X) = (X \ {(x,l') | l' in Lab^?_*, x \in val-formals(lc) or x in res-formals(lc)}) \cup {(x,lc) | x \in val-formals(lc)} \cup {(y,?) | y \in res-formals(lc)} \hat{f^2_{lc,lr}}(Z, Z') = \bigcup { {delta} x phi^2_{lc,lr}(X,X') | (delta, X) \in Z, (delta', X') \in Z', delta' = {Y | (delta,Y) \in Z} } where phi^2_{lc,lr}(X,X') = (({(x,l)| (x, l) \in X, x \in val-formals(lr) or x \in res-formals(lr)} \cup (X' \ {(x,l')| l' in Lab^?_*, x in val-formals(lr) or y in res-formals(lr)})) \ {(z,l') | z in res-actuals(lr)}) \cup {(z,lr) | z in res-actuals(lr)} ------------------------------------------ How are the contexts manipulated? How do these compare with the call strings case? b. smaller assumption sets ------------------------------------------ SMALLER ASSUMPTION SETS like k = 0: \Delta = D so \hat{L} is isomorphic to: Powerset(D x D) and the extremal value is \hat{i} = {(i,i)} Results: - loss of precision + size of data flow properties reduced ------------------------------------------ G. flow insensitive analyses (2.5.6) ------------------------------------------ FLOW INSENSITIVE ANALYSIS (2.5.6) Flow sensitive: order of execution matters A_o(S1;S2) <> A_o(S2;S1) Flow insensitive: order of execution ignored A_o(S1;S2) = A_o(S2;S1) Results: - much less precise analysis + much cheaper ------------------------------------------ Don't we always want precise results? Is type checking flow sensitive? 1. may-loop analysis ------------------------------------------ EXAMPLE: MAY-LOOP ANALYSIS Goal: Determine whether a procedure may loop when called. IML(p) built on ML(S) and CP(S) ML analysis L = Boolean ML([x := a]^l) = false ML([skip]^l) = false ML(while [b]^l do S) = true ML(S1; S2) = ML(S1) or ML(S2) ML(if [b]^l then S1 else S2) = ML(S1) or ML(S2) ML([call p(a,z)]^lc_lr) = false CP analysis L = Powerset(ProcName) CP([x := a]^l) = {} CP([skip]^l) = {} CP(while [b]^l do S) = CP(S) CP(S1; S2) = CP(S1) \cup CP(S2) CP(if [b]^l then S1 else S2) = CP(S1) \cup CP(S2) CP([call p(a,z)]^lc_lr) = {p} IML analysis L = Boolean IML(p) = ------------------------------------------ If this is a may analysis, why aren't we using union? How to write IML(p) with this? How to stop recursive calls? How does this compare to a type system? ------------------------------------------ EXAMPLE begin proc inc(val x, res y) is y := x+1 end proc add5(val z, res w) is w := z while w < z + 5 do call inc(w, w); end; call add5(7, v) end IML(inc) = false or (\exists p' \in CP(inc) :: IML(p') or p' = p or p' appears after p in D*) = false IML(add5) = true or (\exists p' \in CP(add5) :: IML(p') or p' = p or p' appears after p in D*) = true ------------------------------------------ V. Shape Analysis (2.6) A. syntax ------------------------------------------ SYNTAX sel \in Sel "selector names" p \in PExp "pointer expressions" op_p \in Op_p "pointer operators" p ::= "pointer expression" x "variable dereference" | x.sel "field dereference" a ::= p "dreference expression" | n | a1 opa a2 | nil b ::= op_p p "pointer test" | true | false | not b | b1 opb b2 | a1 opr a2 S ::= [p:= a]^l "assignment" | [malloc p]^l "allocation" | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S | assume [b]^l Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel} ------------------------------------------ Can we test for equality of pointers? ------------------------------------------ EXAMPLE (COPY-INTO) assume [not (f = t)]^0; while [(not is-nil(f)) and (not is-nil(t))]^1 do ([t.val := f.val]^2; [t := t.next]^3; [f := f.next]^4); ------------------------------------------ B. structural operational semantics (2.6.1) 1. domains ------------------------------------------ OPERATIONAL SEMANTICS (2.6.1) Domains: xi \in Loc "locations" s \in State = Var* -> Storable Storable = Z + Loc + {<>} h \in Heap = (Loc x Sel) ->fin Storable ------------------------------------------ How would you explain the model of the heap? 2. denotational semantics of expressions ------------------------------------------ SEMANTICS OF POINTER EXPRESSIONS P: PExp* -> (State x Heap) ->fin Storable P[[x]](s, h) = s(x) P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h) then h(s(x), sel) else undef ------------------------------------------ Do we have to modify the semantics of arithmetic and Boolean expressions now? ------------------------------------------ SEMANTICS OF ARITHMETIC EXPRESSIONS A: AExp -> (State x Heap) ->fin Storable A[[p]](s, h) = P[[p]](s, h) A[[nil]](s, h) = <> A[[n]](s, h) = N[[n]] A[[a_1 op_a a_2]](s, h) = (OP_a[[op_a]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) ------------------------------------------ What happens if P[[p]](s, h) is undefined? Are there any other cases where A is undefined? How do we prevent pointer arithmetic? ------------------------------------------ SEMANTICS OF BOOLEAN EXPRESSIONS B: BExp -> (State x Heap) ->fin T B[[a_1 op_r a_2]](s, h) = (OP_r[[op_r]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h)) OP_p: Op_p -> Storable ->fin T OP_p[[is-nil]](v) = if v = <> then tt else ff ------------------------------------------ What kinds of changes would be needed to OP_r, if any? 3. operational semantics of statements ------------------------------------------ OPERATIONAL SEMANTICS OF STATEMENTS Configurations: (Stmt x State x Heap) + (State x Heap) Terminal configurations: (State x Heap) ------------------------------------------ How does that change of configurations affect the description of the transitions? ------------------------------------------ TRANSITIONS [ass] ([x := a]^l, s, h) --> (s[x |-> A[[a]](s,h)], h) if A[[a]](s,h) is defined [fass] ([x.sel := a]^l, s, h) --> (s, h[(s(x),sel) |-> A[[a]](s,h)]) if s(x) \in Loc and A[[a]](s,h) is defined [mal] ([malloc x]^l, s, h) --> (s[x |-> xi], h) if xi \in Loc does not occur in s or h [fmal] ([malloc x.sel]^l, s, h) --> (s, h[(s(x), sel) |-> xi], h) if xi \in Loc does not occur in s or h and s(x) \in Loc ------------------------------------------ What happens if the side conditions are not met in [ass] or [fass]? Can that happen in [fmal]? In [mal]? What would the skip rule look like? seq1? How does the partiality of the Boolean semantics affect the if rules? ------------------------------------------ EXAMPLE What transitions happen for [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; We write L0 and L1 for locations. Let S_i be the statement with label i Let S_i..k be S_i; ...; S_k Let s01 be the state where x has value L0 and y value L1 Let h0 = {} h1 = {(L0,next) |-> L1} h2 = {(L0,next) |-> L1, (L1,next) |-> L0} (S_1234, s??, h0) --> {by [seq2]} * ([malloc x]^1, s??, h0) --> {by [mal], using L0 as the new location} (s0?, h0) . (S_234, s0?, h0) --> {by [seq2]} * ([malloc x.next]^2, s0?, h0) --> {by [fmal], using L1 as the new location} (s0?, h1) . (S_34, s0?, h1) --> {by [seq2]} * ([y := x.next]^3, s0?, h1) --> {by [ass], as A[[x.next]](s0?,h1) = L1} (s01, h1) . ([y.next := x]^4, s01, h1) --> {by [fass], A[[x]](s01,h1) = L0} (s01, h2) ------------------------------------------ C. a shape analysis 1. property space (2.6.2) ------------------------------------------ SHAPE GRAPHS (2.6.2) Goals: - faithfully model real state and heap - smaller (finite) representation def: (S,H,is) is a *shape graph* for S* iff: S \in AState = Powerset(Var* x ALoc) H \in AHeap = Powerset(ALoc x Sel x ALoc) is \in IsShared = Powerset(ALoc) where ALoc = {n_Z | Z \subseteq Var*}. ------------------------------------------ a. abstract locations ------------------------------------------ ABSTRACT LOCATIONS n_{x,y} represents the location that x and y both point to in a given state (s(x) and s(y)) n_{} represents locations not pointed to by variables ------------------------------------------ ------------------------------------------ EXAMPLE What are the abstract locations that exist at the exit of each label in: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; ------------------------------------------ b. abstract states ------------------------------------------ ABSTRACT STATES S \in AState = Powerset(Var* x ALoc) E.g., {(x, n_{x,z}), (z, n_{x,z}), (y, n_{y})} represents a state s such that s(x) = xi1 s(z) = xi1 s(y) = xi2 ALoc(S) = {n_X | (x, n_X) \in S} EXAMPLE What is the abstract state at the exit of each label in: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; ------------------------------------------ What about variables that map to integers? c. abstract heaps ------------------------------------------ ABSTRACT HEAPS H \in AHeap = Powerset(ALoc x Sel x ALoc) E.g., {(n_{x,z}, next, n_{y}), (n_{y}, next, n_{}), (n_{}, next, n_{})} represents a heap h and state s such that: h(s(x), next) = s(y) h(s(z), next) = s(y) h(s(y), next) = xi h(xi, next) = xi or xi2 ALoc(H) = {n_V | (n_V, sel, n_W) \in H} \cup {n_W | (n_V, sel, n_W) \in H} EXAMPLE What is the abstract state and heap at the exit of each label in: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; ------------------------------------------ What do we know about the abstract summary location, in general? d. sharing information ------------------------------------------ SHARING INFORMATION is \in IsShared = Powerset(ALoc) {n_{x,z}} means that the heap has more than one pointer to s(x) or s(y) EXAMPLE What is the sharing information at the exit of each label in: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; [z := y.next]^5; [z.next := x]^6; [z.back := y]^7; ------------------------------------------ e. invariants ------------------------------------------ INVARIANTS ON SHAPE GRAPHS def: A shape graph (S,H,is) is *compatible* iff 1. (\forall n_V, n_W \in ALoc(S) \cup ALoc(H) \cup is :: V = W or (V \cap W = {})) 2. (\forall (x, n_X) \in S :: x \in X) 3. (\forall (n_V, sel, n_W) in H :: (\forall (n_V, sel, n_W') in H :: V = {} or W = W')) 4. (\forall n_X \in is :: (\exists sel :: (n_{}, sel, n_X) \in H) or (\exists (n_V, sel1, n_X) \in H :: (\exists (n_W, sel2, n_X) \in H :: sel1 != sel2 or V != W))) 5. (\forall (n_V, sel1, n_X) \in H :: (\forall (n_W, sel2, n_X) \in H :: ((sel1 != sel2 or V != W) and X != {}) ==> n_X \in is)) ------------------------------------------ Why are these sensible? f. actual property space ------------------------------------------ PROPERTY SPACE IS SETS OF COMPATIBLE SHAPE GRAPHS Assume a fixed/given program S*. Compatible shape graphs: SG = {(S,H,is) | (S,H,is) is compatible} Property Space L = Powerset(SG) \bigsqcup = \bigcup So \sqsubseteq = \subseteq ------------------------------------------ Is this property space finite? 2. instance of a monotone framework (2.6.3) ------------------------------------------ MONOTONE FRAMEWORK INSTANCE Assume a fixed/given program S*. Montone framework instance: L = Powerset(SG) Funs = {f : L -> L | f is monotone} F = flow(S*) E = {init(S*)} i = some set {(S_0, H_0, is_0)} f^SA_. described below ------------------------------------------ Is this a forward analysis? Is this a may or a must analysis? a. equations ------------------------------------------ RESULTING EQUATIONS Shape_o(l) = if l = init(S*) then i else \bigcup {Shape_.(l') | (l',l) \in flow(S*)} Shape_.(l) = f^SA_l(Shape_o(l)) ------------------------------------------ Which of these is the entry information? b. transfer functions ------------------------------------------ TRANSFER FUNCTIONS General form: f^SA_l(SG) = \bigcup {phi^SA_l((S,H,is)) | (S,H,is) \in SG} ------------------------------------------ What is this doing? ------------------------------------------ TESTS AND SKIP If l labels [b]^l, then phi^SA_l((S,H,is)) = {(S,H,is)} If l labels [skip]^l, then phi^SA_l((S,H,is)) = {(S,H,is)} ------------------------------------------ ------------------------------------------ MANIPULATIONS USEFUL FOR ASSIGNMENTS Taking a variable x out of an ALoc: k_x(n_Z) = n_Y, where Y = Z\{x} Taking variable x out of a shape graph: kill_x((S,H,is)) = (S',H',is') where S' = {(z, k_x(n_Z)) | (z, n_Z) \in S, z != x} H' = {(k_x(n_V), sel, k_x(n_W)) | (n_V, sel, n_W) \in H} is' = {k_x(n_X) | n_X \in is} Taking a field x.sel out of a shape graph: kill_x.sel((S,H,is)) = (S',H',is') where S' = S H' = {(n_V, sel', n_W)) | (n_V, sel', n_W) \in H, n_X \in ALoc(S), (X != V or sel' != sel)} is' = restore(is, H') restore(is, H') = { n_V | n_V \in is, #into(n_V, H') > 1 or (\exists sel':: (n_{}, sel', n_V) \in H') } #into(n_V, H) = number of pointers to n_V in H Adding a binding from x to y's binding: g^y_x(n_Z) = if y \in Z then n_W where W = Z \cup {x} else n_Z Adding a binding from x to y's binding in a Shape graph: glom^y_x((S,H,is)) = {(S'',H'',is'')} where (S',H',is') = kill_x((S,H,is)) S'' = {(z,g^y_x(n_Z)) | (z, n_Z) \in S'} \cup {(x,g^y_x(n_Y)) | (y, n_Y) \in S'} H'' = {(g^y_x(n_V), sel, g^y_x(n_W)) | (n_V, sel, n_W) \in H'} is'' = {g^y_x(n_Z) | n_Z \in is'} Adding a binding from x to n_U: h^U_x(n_Z) = if Z = U then n_Y where Y = U \cup {x} else n_Z Adding a binding from x to n_U in a Shape graph: hbind^U_x((S,H,is)) = {(S'',H'',is'')} where (S',H',is') = kill_x((S,H,is)) S'' = {(z,h^U_x(n_Z)) | (z, n_Z) \in S'} \cup {(x,h^U_x(n_U))} H'' = {(h^U_x(n_V), sel, h^U_x(n_W)) | (n_V, sel, n_W) \in H'} is'' = {h^U_x(n_Z) | n_Z \in is'} Adding a binding from x.sel to n_Y in a Shape graph: hsbind(x,n_X,n_Y)((S,H,is)) = {(S'',H'',is'')} where (S',H',is') = kill_x.sel((S,H,is)) S'' = S' H'' = H' \cup {(n_X, sel, n_Y)} is'' = is \cup {n_Y | #into(n_Y,H') >= 1} ------------------------------------------ What is the type of each of these? What is kill_y((S,H,is)), where S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})} H = {(n_{y,z}, next, n_{}), (n_{x}, next, n_{y,z}), (n_{}, next, n_{y,z}), (n_{}, next, n_{})} is = {n_{y,z}} ? What is glom^y_x((S,H,is)), where S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})} H = {(n_{y,z}, next, n_{}), (n_{x}, next, n_{y,z}), (n_{}, next, n_{y,z}), (n_{}, next, n_{})} is = {n_{y,z}} ? ------------------------------------------ ASSIGNMENTS If l labels [x := a]^l, then if a is of form n, a_1 op_a a_2, or nil, then phi^SA_l((S,H,is)) = {kill_x((S,H,is))} elseif a is of the form x (i.e., x := x), then phi^SA_l((S,H,is)) = {(S,H,is)} elseif a is of the form y, where y != x, then phi^SA_l((S,H,is)) = glom^y_x((S,H,is)) elseif a is of the form x.sel (i.e., x := x.sel), then phi^SA_l((S,H,is)) = (see text/homework) else (a is of the form y.sel, where y != x) let (S',H',is') = kill_x((S,H,is)) in if there is no n_Y such that (y,n_Y) \in S' then phi^SA_l((S,H,is)) = {kill_x((S,H,is))} elseif there is some n_Y such that (y,n_Y) \in S' and there is a n_U != n_{} such that (n_Y, sel, n_U) \in H' then phi^SA_l((S,H,is)) = hbind^U_x((S,H,is)) else (there is some n_Y such that (y,n_Y) \in S' and (n_Y, sel, n_{}) \in H') phi^SA_l((S,H,is)) = {(S'',H'',is'') | (S'',H'',is'') is compatible, kill_x((S'',H'',is'') = kill_x(S,H,is), (x,n_{x}) \in S'', (n_Y, sel, n_{x}) \in H''} end ------------------------------------------ When can all these cases occur? ------------------------------------------ ASSIGNMENTS TO FIELDS If l labels [x.sel := a]^l, then if a is of form n, a_1 op_a a_2, or nil, then if there is no n_X such that (x,n_X) \in S then phi^SA_l((S,H,is)) = {(S,H,is)} elseif there is some n_X such that (x,n_X) \in S and there is no n_U such that (n_X, sel, n_U) \in H then phi^SA_l((S,H,is)) = {(S,H,is)} else (there is some n_X such that (x,n_X) \in S and there is some n_U such that (n_X, sel, n_U) \in H) phi^SA_l((S,H,is)) = {kill_x.sel(S,H,is)} elseif a is of the form x (i.e., x.sel := x), then phi^SA_l((S,H,is)) =(see text/homework) elseif a is of the form y, where y != x, then if there is no n_X such that (x, n_X) \in S then phi^SA_l((S,H,is)) = {(S,H,is)} elseif there is some n_X such that (x, n_X) \in S and there is no n_Y such that (y, n_Y) \in S then phi^SA_l((S,H,is)) = {kill_x.sel(S,H,is)} else (there is some n_X such that (x, n_X) \in S and there is some n_Y such that (y, n_Y) \in S) phi^SA_l((S,H,is)) = hsbind(x,n_X,n_Y)((S,H,is)) else (a is of the form y.sel) phi^SA_l((S,H,is)) = (see text/homework) ------------------------------------------ When can all these cases occur? ------------------------------------------ MALLOC If l labels [malloc x]^l, then phi^SA_l((S,H,is)) = P(S \cup {(x, n_{x})}, H', is') where (S',H',is') = kill_x(S,H,is) Elseif l labels [malloc x.sel]^l, then phi^SA_l((S,H,is)) = (see text/homework) ------------------------------------------ VI. Shape Analysis (2.6) A. syntax ------------------------------------------ SYNTAX sel \in Sel "selector names" p \in PExp "pointer expressions" op_p \in Op_p "pointer operators" p ::= "pointer expression" x "variable dereference" | x.sel "field dereference" a ::= p "dereference expression" | n | a1 opa a2 | nil b ::= op_p p "pointer test" | true | false | not b | b1 opb b2 | a1 opr a2 S ::= [p:= a]^l "assignment" | [malloc p]^l "allocation" | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S | assume [b]^l Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel} ------------------------------------------ Can we test for equality of pointers? ------------------------------------------ EXAMPLE (COPY-INTO) assume [not (f = t)]^0; while [(not is-nil(f)) and (not is-nil(t))]^1 do ([t.val := f.val]^2; [t := t.next]^3; [f := f.next]^4); ------------------------------------------ B. structural operational semantics (2.6.1) 1. domains ------------------------------------------ OPERATIONAL SEMANTICS (2.6.1) Domains: xi \in Loc "locations" s \in State = Var* -> Storable Storable = Z + Loc + {<>} h \in Heap = (Loc x Sel) ->fin Storable ------------------------------------------ How would you explain the model of the heap? 2. denotational semantics of expressions ------------------------------------------ SEMANTICS OF POINTER EXPRESSIONS P: PExp* -> (State x Heap) ->fin Storable P[[x]](s, h) = s(x) P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h) then h(s(x), sel) else undef ------------------------------------------ Do we have to modify the semantics of arithmetic and Boolean expressions now? ------------------------------------------ SEMANTICS OF ARITHMETIC EXPRESSIONS A: AExp -> (State x Heap) ->fin Storable A[[p]](s, h) = P[[p]](s, h) A[[nil]](s, h) = <> A[[n]](s, h) = N[[n]] A[[a_1 op_a a_2]](s, h) = (OP_a[[op_a]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) ------------------------------------------ What happens if P[[p]](s, h) is undefined? Are there any other cases where A is undefined? How do we prevent pointer arithmetic? ------------------------------------------ SEMANTICS OF BOOLEAN EXPRESSIONS B: BExp -> (State x Heap) ->fin T B[[a_1 op_r a_2]](s, h) = (OP_r[[op_r]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h)) OP_p: Op_p -> Storable ->fin T OP_p[[is-nil]](v) = if v = <> then tt else ff ------------------------------------------ What kinds of changes would be needed to OP_r, if any? 3. operational semantics of statements ------------------------------------------ OPERATIONAL SEMANTICS OF STATEMENTS Configurations: (Stmt x State x Heap) + (State x Heap) Terminal configurations: (State x Heap) ------------------------------------------ How does that change of configurations affect the description of the transitions? ------------------------------------------ TRANSITIONS [ass] ([x := a]^l, s, h) --> (s[x |-> A[[a]](s,h)], h) if A[[a]](s,h) is defined [fass] ([x.sel := a]^l, s, h) --> (s, h[(s(x),sel) |-> A[[a]](s,h)]) if s(x) \in Loc and A[[a]](s,h) is defined [mal] ([malloc x]^l, s, h) --> (s[x |-> xi], h) if xi \in Loc does not occur in s or h [fmal] ([malloc x.sel]^l, s, h) --> (s, h[(s(x), sel) |-> xi], h) if xi \in Loc does not occur in s or h and s(x) \in Loc ------------------------------------------ What happens if the side conditions are not met in [ass] or [fass]? Can that happen in [fmal]? In [mal]? What would the skip rule look like? seq1? How does the partiality of the Boolean semantics affect the if rules? ------------------------------------------ EXAMPLE What transitions happen for [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; We write L0 and L1 for locations. Let S_i be the statement with label i Let S_i..k be S_i; ...; S_k Let s01 be the state where x has value L0 and y value L1 Let h0 = {} h1 = {(L0,next) |-> L1} h2 = {(L0,next) |-> L1, (L1,next) |-> L0} (S_1234, s??, h0) --> {by [seq2]} * ([malloc x]^1, s??, h0) --> {by [mal], using L0 as the new location} (s0?, h0) . (S_234, s0?, h0) --> {by [seq2]} * ([malloc x.next]^2, s0?, h0) --> {by [fmal], using L1 as the new location} (s0?, h1) . (S_34, s0?, h1) --> {by [seq2]} * ([y := x.next]^3, s0?, h1) --> {by [ass], as A[[x.next]](s0?,h1) = L1} (s01, h1) . ([y.next := x]^4, s01, h1) --> {by [fass], A[[x]](s01,h1) = L0} (s01, h2) ------------------------------------------ C. a shape analysis ------------------------------------------ SHAPE ANAYSIS WITH PREDICATE ABSTRACTION Based on (largely quoted from): Roman Manevich, E. Yahav, G. Ramalingam, and Mooly Sagiv. "Predicate abstraction and canonical abstraction for singly-linked lists." In Verification, Model Checking, and Abstract Interpretation, volume 3385 of Lecture Notes in Computer Science, pages 181--198, Berlin, 2005. Springer-Verlag. ------------------------------------------ 1. representing states with predicates (section 2.1) ------------------------------------------ REPRESENTING PROGRAM STATES AS A FIRST-ORDER LOGICAL STRUCTURE Objects represented by "individuals" State represented by a formula over a fixed set of predicates ------------------------------------------ ------------------------------------------ EXAMPLE FOR SINGLY-LINKED LISTS Objects (v) with a field "n" The set P^C = {eq, x, n} predicates meaning ======================================= eq(v1, v2) v1 equals v2 x(v) var x points to object v, for x \in Var* n(v1, v2) the n field of v1 points to v2 Consider the program: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; We could encode the state and heap at the end of this code as: Individuals: U = {L0, L1} eq(L0, L0) = true eq(L1, L1) = true eq(L0, L1) = false eq(L1, L0) = false x(L0) = true y(L1) = true x(L1) = false y(L0) = false n(L0, L1) = true n(L1, L0) = true n(L0, L0) = false n(L1, L1) = false ------------------------------------------ How can we tell if a variable y points to <> or an integer? ------------------------------------------ 2-STRUCTS OVER P def: A *2-valued logical structure* over a set of predicates P, 2-STRUCT_P is a pair S = (U,i) where U is the universe (set of individuals) and i is the interpretation function, such that for all p in P of arity k: i(p): U^k -> {true, false} ------------------------------------------ What's the arity of eq? x? n? ------------------------------------------ ABSTRACTION FROM FORMER CONCRETE STATES toPC(s, h) = (U,i) U = {xi \in Loc | xi \in ran(s) or xi \in ran(h)} i(eq(xi1, xi2)) = (xi1 == xi2) i(x(xi)) = (s(x) == xi) i(n(xi1, xi2)) = (h(xi1,n) == xi2) ------------------------------------------ What would you take as a program's initial 2-STRUCT over P^C? Is i invertible? Could we write the language's semantics using 2-STRUCTS over P^C? ------------------------------------------ TRANSITIONS USING 2-STRUCTS OVER P^C [assnil] ([x := nil]^l, (U, i)) --> (U, i[x(xi) |-> false]) if xi \in U and i(x(xi)) [assv] ([x := y]^l, (U, i)) --> (U, i[x(xi) |-> true]) if xi \in U and i(y(xi)) [assf] ([x := y.n]^l, (U, i)) --> (U, i[x(xi) |-> true]) if xi, xi2 \in U, i(y(xi2)), and i(n(xi2, xi)) [fassnil] ([x.n := nil]^l, (U, i)) --> (U, i[n(xi,xi2) |-> false]) if xi, xi2 \in U, i(x(xi)), and i(n(xi, xi2)) [fassv] ([x.n := y]^l, (U, i)) --> (U, i[n(xi,xi2) |-> true]) if xi, xi2 \in U, i(x(xi)), and i(y(xi2)) and there is no xi3 such that i(n(xi, xi3)) [fassv] ([x.n := y.n]^l, (U, i)) --> (U, i[n(xi,xi4) |-> true]) if xi, xi2 \in U, i(x(xi)), i(y(xi2)), and i(n(xi2, xi4)) and there is no xi3 such that i(n(xi, xi3)) [mal] ([malloc x]^l, (U, i)) --> (U \cup {xi}, i[x(xi)]) if not(xi \in U) [fmal] ([malloc x.n]^l, (U, i)) --> (U \cup {xi}, i[n(xi2, xi)]) if not(xi \in U) and i(x(xi2)) ------------------------------------------ How would you generalize to arbitrary selectors? Is there a limit to the size of the 2-STRUCTS in these concrete rules? So are they suitable for doing an analysis? 2. predicate abstraction (section 2.3) ------------------------------------------ PREDICATE ABSTRACTION Let P^A be a set of nullary predicates of interest. P^A = {P_1, ..., P_m} where each P_j is defined by some phi_j: 2-STRUCT[P^C] -> Boolean An *abstract state* is a truth assignment A: {1, ..., m} -> Boolean An *abstraction mapping* beta: 2-STRUCT[P^C] -> 2-STRUCT[P^A] where beta(U,i) = (A, i') and i'(P_j) = phi_j(U,i) ------------------------------------------ Given a concrete state, can we decide on P_i's value? ------------------------------------------ EXAMPLE Suppose we want to prove that the result of a program is a cyclic singly linked list. Idea: Design predicates useful in stating this property. Example set of nullary predicates: Let K > 0 be a fixed number. Let P^A = {NotNull[x] | x \in Var*} \cup {EqualsNext[k][x,y] | 0 <= k <= K and x, y \in Var*} Defining formulas: NotNull[x] = (exists v :: x(v)) EqualsNext[k][x,y] = (exists v_0, ..., v_k :: x(v_0) and y(v_k) and (forall j : 0 <= j <= k : n(v_j, v_{j+1}))) ------------------------------------------ How do these predicates help describe this property? Are these nullary predicates? Don't they take arguments? So what does EqualsNext[3][q,r] mean? How would you represent the set of predicates? What does it mean if a predicate P_i is true? What does it mean if NotNull[x] is false? What does it mean if EqualsNext[3][q,r] is false? ------------------------------------------ IMPLEMENTING THE ANALYSIS Idea: Use the defining formulas to induce the transfer functions In our example A is fixed (for a program), so we have configurations of the form (S,i') and i'. [assnil] ([x := nil]^l, i') --> i'[NotNull[x] |-> false, (forall k, y :: EqualsNext[k][x,y] |-> false)] [assv] ([x := y]^l, i') --> i'[NotNull[x] |-> b, EqualsNext[0][x,y] |-> true] if b = i'(NotNull[y]) [assf] ([x := y.n]^l, i') --> i'[NotNull[x] |-> (exists k, z :: EqualsNext[k][y,z]), (forall k, z : k < K : EqualsNext[k][x,z] |-> EqualsNext[k+1][y,z])] if i'(NotNull[y]) [fassnil] ([x.n := nil]^l, i') --> i'[(forall k, z : k <= K : EqualsNext[k][x,z] |-> false)] if i'(NotNull[x]) ------------------------------------------ How would you do the rest of these? How do these relate to the transfer functions?