I. Intraprocedural Analysis (2.1) A. definitions and notation 1. initial and final labels ------------------------------------------ INITIAL LABEL init: Stmt -> Lab init([x := a]^l) = l init([skip]^l) = l init(S1; S2) = init(if [b]^l then S1 else S2) = init (while [b]^l do S) = ------------------------------------------ What would the initial label of a nondeterministic choice statement be? Of a parallel composition statement? How would we generalize the formalism to handle such statements? ------------------------------------------ FINAL LABELS final: Stmt -> Powerset(Lab) final([x := a]^l) = {l} final([skip]^l) = {l} final(S1; S2) = final(if [b]^l then S1 else S2) = final (while [b]^l do S) = ------------------------------------------ What would the final label set of a nondeterministic choice statement be? Of a parallel composition statement? 2. blocks and labels ------------------------------------------ ELEMENTARY BLOCKS blocks: Stmt -> Powerset(Block) ------------------------------------------ How would you define the set of elementary blocks in a statement? What would the set of elementary blocks in an assert statement be? What would be set of elementary blocks in a nondeterministic choice statement be? How would you define the set of labels in a statement? 3. flows and reverse flows ------------------------------------------ FLOWS flow: Stmt -> Powerset(Lab x Lab) flow([x := a]^l) = {} flow([skip]^l) = {} flow(S1; S2) = flow(if [b]^l then S1 else S2) = flow(while [b]^l do S) = ------------------------------------------ How would we use these functions to represent the nodes and edges of a control flow graph? What are the nodes of the control flow graph? What are labels and edges of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? How you formulate a set of reverse flows? What is flow^R of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? What is flow^R of (while [x <= 4]^1 do [x := x+1]^2); [assert x == 4]^3 ? What are the initial nodes of flow^R? 4. program of interest ------------------------------------------ PROGRAM OF INTEREST S* = the top level statement Lab* = labels(S*) Var* = FV(S*) Blocks* = blocks(S*) ------------------------------------------ ------------------------------------------ ISOLATED ENTRIES AND EXITS def: S* has isolated entries iff def: S* has isolated exits iff ------------------------------------------ What kind of programs would not have isolated entries? What kind of programs would not have isolated exits? Could we convert programs to have both isolated entries and exits? ------------------------------------------ LABEL CONSISTENT def: S is label consistent if and only if no two blocks in S Formally: ------------------------------------------ How would you formalize that? Is there any reason not to have label consistent programs? B. Example analyses 1. Available Expressions (AE) analysis (2.1.1) a. definitions: trivial and non-trivial expressions What's a trivial expression? ------------------------------------------ SUBEXPRESSIONS def: An expression is *trivial* iff Aexp(a) = non-trivial arithmetic subexpressions of expression, a E.g., Aexp(-x + -y) = Aexp* = nontrivial arithmetic expressions in S* ------------------------------------------ b. idea, goal ------------------------------------------ AVAILABLE EXPRESSIONS (AE) ANALYSIS Analysis question (p. 37): "For each program point, which [non-trivial] expressions must have already been computed, and not later modified, on all paths to that program point." Example: [k := i*j-1]^1; while [i*j-1 < n]^2 do { [t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5 } ------------------------------------------ Why would a compiler writer care about non-trivial expressions? Why would trivial expressions not matter? What non-trivial expressions are available at entry to block 2? c. observations What would be the bad outcome, which should be avoided? What makes a "solution" unsafe? What makes it imprecise? This is a forward analysis, Why? Note the use of the word "must", what impact does that have? d. formalization ------------------------------------------ FORMAL DEFINITION AEentry(l) = if l = init(S*) then {} else \bigcap {AEexit(l') | (l',l) \in flow(S*)} AEexit(l) = (AEentry(l) - killAE(B^l)) \cup genAE(B^l) where B^l \in blocks(S*) killAE: Blocks* -> Powerset(Aexp*) killAE([x:= a]^l) = killAE([skip]^l) = {} killAE([b]^l) = {} genAE: Blocks* -> Powerset(Aexp*) genAE([x:= a]^l) = genAE([skip]^l) = {} genAE([b]^l) = ------------------------------------------ What does the kill function mean? What does the gen function mean? Why don't we have to define the analysis for while loops and if statements? e. example revisited ------------------------------------------ EXAMPLE [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5) What's the control flow graph? ------------------------------------------ What is Aexp* in this example? What's the role of the control flow graph here? ------------------------------------------ KILL AND GEN FOR THE EXAMPLE What are killAE and genAE in the example? l killAE(l) genAE(l) ============================ 1 2 3 4 5 ------------------------------------------ Do the kill and gen functions depend on what happens at other nodes? ------------------------------------------ EXAMPLE'S AE EQUATIONS AEentry(1) = AEentry(2) = AEentry(3) = AEentry(4) = AEentry(5) = AEexit(1) = AEexit(2) = AEexit(3) = AEexit(4) = AEexit(5) = ------------------------------------------ What are we assuming with this formalism? How would we adjust this if we didn't have isolated entries? What's the bad outcome we want to avoid? What makes the analysis information more imprecise? What makes the analysis unsafe? So what do we want for a solution the largest or smallest? i. Finding a Solution using Chaotic Iteration (1.7) How can we get rid of the recursion in the dataflow equations? What sets do use to start iterating from to find a solution? ------------------------------------------ FORMALIZING THE ANALYSIS AS AN ITERATION Let \vec{AE} be (AE1, AE2, ..., AE10) where: AE1 is the entry info for label 1 from the previous iteration AE2 is the exit info for label 1 from the previous iteration AE3 is entry info. for label 2 from the previous iteration AE4 is exit info for label 2 from the previous iteration AE5 is entry info for label 3 AE6 is exit info for label 3 ... AE10 is exit info for label 5 define F(v) = (F1(v), F2(v), ..., F10(v)) where: F1(AE1, AE2, ..., AE10) // entry(1) = {} F2(AE1, AE2, ..., AE10) // exit(1) = (AE1 - killAE([k:=i*j-1]^1)) \cup genAE([k:=i*j-1]^1) = (AE1 - {a+k}) \cup {i*j,i*j-1} F3(AE1, AE2, ..., AE10) // entry(2) = AE2 \cap AE10 F4(AE1, AE2, ..., AE10) // exit(2) = (AE3 - killAE([i*j-1 < n]^2)) \cup genAE([i*j-1 < n]^2) = (AE3 - {}) \cup {i*j,i*j-1} F5(AE1, AE2, ..., AE10) // entry(3) = AE4 F6(AE1, AE2, ..., AE10) // exit(3) = (AE5 - killAE([t := a+k]^3)) \cup genAE([t := a+k]^3) = (AE5 - {}) \cup {a+k} F7(AE1, AE2, ..., AE10) // entry(4) = AE6 F8(AE1, AE2, ..., AE10) // exit(4) = (AE7 - killAE([j := j+1]^4)) \cup genAE([j := j+1]^4) = (AE7 - {i*j,i*j-1,j+1}) \cup {} F9(AE1, AE2, ..., AE10) // entry(5) = AE8 F10(AE1, AE2, ..., AE10) // exit(5) = (AE9 - killAE([k := i*j-1]^5)) \cup genAE([k := i*j-1]^5) = (AE9 - {a+k}) \cup {i*j, i*j-1} Then a fixed-point of F is a solution. ------------------------------------------ ------------------------------------------ CORRECTNESS OF ITERATION Def 0 (\sqsubseteq): tuples of available expressions are ordered pointwise by superset inclusion: i.e., (v1,...,v10) \sqsubseteq (v1',...,v10') iff v1 \supseteq v1' and ... and v10 \supseteq v10' Exercise 1: F is monotonic (in \sqsubseteq): i.e., if v \sqsubseteq v' then F(v) \sqsubseteq F(v'). Corollary 2: Since (Aexp*,...,Aexp*) \sqsubseteq (Aexp*,...,Aexp*), F(Aexp*,...,Aexp*) \sqsubseteq F(Aexp*,...,Aexp*) Corollary 3: (Aexp*,...,Aexp*) \sqsubseteq F(Aexp*,...,Aexp*) Exercise 4: For all natural numbers n: (Aexp*,...,Aexp*) \sqsubseteq F^n(Aexp*,...,Aexp*) Exercise 5: there is some natural number n such that F^n(Aexp*,...,Aexp*) = F(F^n(Aexp*,...,Aexp*)) (so that F^n(Aexp*,...,Aexp*) is a fixed point of F). ------------------------------------------ Why do these hold? ------------------------------------------ USING THE ITERATION DIRECTLY Iterate F, starting with (Aexp*, Aexp*, ..., Aexp*) at each iteration get a better approximation (smaller set) until reach a fixed point First iteration: F(Aexp*,Aexp*,...,Aexp*) = (F1(Aexp*), F2(Aexp*), ..., F10(Aexp*)) F1(Aexp*,Aexp*,...,Aexp*) = {} F2(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1, j+1} F3(Aexp*,Aexp*,...,Aexp*) = Aexp* \cap Aexp* = Aexp* F4(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {}) \cup {i*j,i*j-1} = Aexp* F5(Aexp*,Aexp*,...,Aexp*) = Aexp* F6(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {}) \cup {a+k} = Aexp* F7(Aexp*,Aexp*,...,Aexp*) = Aexp* F8(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k} F9(Aexp*,Aexp*,...,Aexp*) = Aexp* F10(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {a+k}) \cup {i*j, i*j-1} = ({i*j, i*j-1, j+1}) \cup {i*j, i*j-1} = {i*j, i*j-1, j+1} ------------------------------------------ ------------------------------------------ SECOND ITERATION Let \vec{v_1} // output of first iteration = ({}, {i*j, i*j-1, j+1}, Aexp*, Aexp*, Aexp*, Aexp*, Aexp*, {a+k}, Aexp*, {i*j, i*j-1, j+1}) The second iteration computes: F1(\vec{v_1}) = {} F2(\vec{v_1}) = ({} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1} F3(\vec{v_1}) = {i*j, i*j-1, j+1} \cap {i*j, i*j-1, j+1} = {i*j, i*j-1, j+1} F4(\vec{v_1}) = (Aexp* - {}) \cup {i*j,i*j-1} = Aexp* F5(\vec{v_1}) = Aexp* F6(\vec{v_1}) = (Aexp* - {}) \cup {a+k} = Aexp* F7(\vec{v_1}) = Aexp* F8(\vec{v_1}) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k} F9(\vec{v_1}) = {a+k} F10(\vec{v_1}) = (Aexp* - {a+k}) \cup {i*j, i*j-1} = ({i*j, i*j-1, j+1}) \cup {i*j, i*j-1} = {i*j, i*j-1, j+1} ------------------------------------------ Compared to the inputs to the second iteration, did any outputs change? ------------------------------------------ THIRD ITERATION Let \vec{v_2} // output of 2nd iteration = ({}, {i*j,i*j-1}, {i*j,i*j-1,j+1}, Aexp*, Aexp*, Aexp*, Aexp*, {a+k}, {a+k}, {i*j, i*j-1, j+1}) The third iteration computes: F1(\vec{v_2}) = {} F2(\vec{v_2}) = ({} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1} F3(\vec{v_2}) = {i*j, i*j-1} \cap {i*j, i*j-1, j+1} = {i*j, i*j-1} F4(\vec{v_2}) = ({i*j, i*j-1, j+1} - {}) \cup {i*j,i*j-1} = {i*j, i*j-1, j+1} F5(\vec{v_2}) = Aexp* F6(\vec{v_2}) = (Aexp* - {}) \cup {a+k} = Aexp* F7(\vec{v_2}) = Aexp* F8(\vec{v_2}) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k} F9(\vec{v_2}) = {a+k} F10(\vec{v_2}) = ({a+k} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1} ------------------------------------------ What changed in the third iteration? Is this a lot of bookkeeping? Is there a more efficient way to do this? ------------------------------------------ CHAOTIC ITERATION ALGORITHM (1.7) Goal: compute the desired fixed point of dataflow equations (quickly) Idea: - use a functional (F =(Fi), as before) to make the current approximation more and more precise - initialize the argument vector to be either: - a vector of empty sets, or - a vector of complete sets depending on if want to find the - least fixed point, or - greatest fixed point For our example in the AE analysis, we want greatest fixed point, so: Algorithm (Table 1.5): Step 1: initialize AEi := Aexp* for i in {1,...,10} Step 2: while \exists j such that AEj \neq Fj(AE1, ..., AE10) do let j be such that AEj \neq Fj(AE1, ..., AE10) in AEj := Fj(AE1, ..., AE10) Step 3: return (AE1, ..., AE10) ------------------------------------------ Why is this algorithm correct? ------------------------------------------ RECALL THE FUNCTIONAL FOR THE EXAMPLE Recall that F(v) = (F1(v),...,F10(v)) where F1(AE1,AE2,...,AE10) = {} F2(AE1,AE2,...,AE10) = (AE1 - killAE([k:= i*j-1]^1)) \cup genAE([k:= i*j-1]^1) = (AE1 - {a+k}) \cup {i*j, i*j-1} F3(AE1,AE2,...,AE10) = AE2 \cap AE10 F4(AE1,AE2,...,AE10) = (AE3 - killAE([i*j-1 Powerset(Var* x Lab^?*} killRD([x:= a]^l) = killRD([skip]^l) = {} killRD([b]^l) = {} genRD: Blocks* -> Powerset(Var* x Lab^?*} genRD([x:= a]^l) = genRD([skip]^l) = {} genRD([b]^l) = {} ------------------------------------------ b. example revisited ------------------------------------------ EXAMPLE [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5) ------------------------------------------ What is Var* for this example? What is Lab*? for this example? ------------------------------------------ KILL AND GEN FOR THE EXAMPLE What are killRD and genRD in the example? l killRD(l) genRD(l) ==================================== 1 2 3 4 5 ------------------------------------------ ------------------------------------------ EXAMPLE'S RD EQUATIONS Recall that Var* is {a,i,j,k,n,t} RDentry(1) = RDexit(1) = RDentry(2) = RDexit(2) = RDentry(3) = RDexit(3) = RDentry(4) = RDexit(4) = RDentry(5) = RDexit(5) = ------------------------------------------ Are larger or smaller sets more precise? What is the bad outcome to avoid? So what sets do we start with to find a solution? So what would the solution be? ------------------------------------------ SOLUTION USING CHAOTIC ITERATION let RD1 = {}, ..., RD10 = {}; RD1 \neq F1(RD1,...,RD10) // entry(1) = {(a,?),(i,?),(j,?),(k,?),(n,?),(t,?)} so RD1 := {(a,?),(i,?),(j,?),(k,?), (n,?),(t,?)} RD2 \neq F2(RD1,...,RD10) // exit(1) = (RD1 - killRD([k := i*j-1]^1)) \cup genRD([k := i*j-1]^1) = ({(a,?),(i,?),(j,?),(k,?), (n,?),(t,?)}) - ({k} x Lab*?)) \cup {(k,1)} = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} so RD2 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} RD3 \neq F3(RD1,...,RD10) // entry(2) = RD2 \cup RD10 = {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} \cup {} so RD3 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} RD4 \neq F4(RD1,...,RD10) // exit(2) = RD3 = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} so RD4 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} RD5 \neq F5(RD1,...,RD10) // entry(3) = RD4 = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} so RD5 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} RD6 \neq F6(RD1,...,RD10) // exit(3) = (RD5 - killRD([t := a+k]^3)) \cup genRD([t := a+k]^3) = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} - ({t} x Lab*?)) \cup {(t,3)} = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)} so RD6 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,3)} RD7 \neq F7(RD1,...,RD10) // entry(4) = RD5 = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)} so RD7 := {(a,?),(i,?),(j,?),(k,1), (n,?),(t,3)} RD8 \neq F8(RD1,...,RD10) // exit(4) = (RD7 - killRD([j := j+1]^4)) \cup genRD([j := j+1]^4) = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)} - ({j} x Lab*?)) \cup {(j,4)} = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)} so RD8 := {(a,?),(i,?),(j,4),(k,1), (n,?),(t,3)} RD9 \neq F9(RD1,...,RD10) // entry(5) = RD8 = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)} so RD9 := {(a,?),(i,?),(j,4),(k,1), (n,?),(t,3)} R10 \neq F10(RD1,...,RD10) // exit(5) = (RD9 - killRD([k := i*j-1]^5)) \cup gen([k := i*j-1]^5) = ({(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)} - ({k} x Lab*?)) \cup {(k,5)} = {(a,?),(i,?),(j,4),(k,5),(n,?),(t,3)} ------------------------------------------ Are there more steps to take? ------------------------------------------ CHAOTIC ITERATION (CONTINUED) RD3 \neq F3(RD1,...,RD10) // entry(2) = RD2 \cup RD10 = {(a,?),(i,?),(j,?),(k,1), (n,?),(t,?)} \cup {(a,?),(i,?),(j,4),(k,5), (n,?),(t,3)} so RD3 := {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,?),(t,3)} RD4 \neq F4(RD1,...,RD10) // exit(2) = RD3 = {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5)(n,?),(t,?),(t,3)} so RD4 := {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,?),(t,3)} RD5 \neq F5(RD1,...,RD10) // entry(3) = RD4 = {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,?),(t,3)} so RD5 := {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,?),(t,3)} RD6 \neq F6(RD1,...,RD10) // exit(3) = (RD5 - killRD([t := a+k]^3)) \cup genRD([t := a+k]^3) = ({(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,?),(t,3)} - ({t} x Lab*?)) \cup {(t,3)} = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5), (n,?),(t,3)} so RD6 := {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,3)} RD7 \neq F7(RD1,...,RD10) // entry(4) = RD6 = {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,3)} so RD7 := {(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,3)} RD8 \neq F8(RD1,...,RD10) // exit(4) = (RD7 - killRD([j := j+1]^4)) \cup genRD([j := j+1]^4) = ({(a,?),(i,?),(j,?),(j,4),(k,1), (k,5),(n,?),(t,3)} - ({j} x Lab*?)) \cup {(j,4)} = {(a,?),(i,?),(j,4),(k,1),(k,5), (n,?),(t,3)} so RD8 := {(a,?),(i,?),(j,4),(k,1),(k,5), (n,?),(t,3)} RD9 \neq F9(RD1,...,RD10) // entry(5) = RD8 = {(a,?),(i,?),(j,4),(k,1),(k,5), (n,?),(t,3)} so RD9 := {(a,?),(i,?),(j,4),(k,1),(k,5), (n,?),(t,3)} ------------------------------------------ Now is it finished? So what is the fixed point? 3. Very Busy Expressions Analysis (2.1.3) a. idea and goals ------------------------------------------ VERY BUSY (VB) EXPRESSIONS ANALYSIS def (p. 44): A non-trivial expression e is *very busy* at exit from a block labeled l if, e must always be used before any x \in FV(e) is assigned. Examples: At what points is a+b very busy in: [x := a+b]^1; [y := a+b]^2 if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5 [q := r]^7; [z := a+b]^8; if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11 if [a>b]^12 then [x := a+b]^13 else [y := 5021]^14 ------------------------------------------ ------------------------------------------ VERY BUSY EXPRESSIONS ANALYSIS Analysis question (p. 44): "For each program point, which [non-trivial] expressions must be very busy at the exit from that point." ------------------------------------------ When should expressions be very busy at the entry to a block? Would it make sense to have the analysis go forwards or backwards? Is this a "may" or a "must" analysis? What does that mean? Where do control flows join in this analysis? b. formal definition ------------------------------------------ FORMAL DEFINITION VBexit(l) = if l \in final(S*) then {} else VBentry(l) = (VBexit(l) \ killVB(B^l)) \cup genVB(B^l) where B^l \in blocks(S*) killVB: Blocks* -> Powerset(Aexp*) killVB([x:= a]^l) = killVB([skip]^l) = {} killVB([b]^l) = {} genVB: Blocks* -> Powerset(Aexp*) genVB([x:= a]^l) = Aexp(a) genVB([skip]^l) = {} genVB([b]^l) = Aexp(b) ------------------------------------------ Have we seen these kill and gen functions before? Does this analysis need isolated exits? If so, why? Do we want the largest or the smallest solution? c. example ------------------------------------------ EXAMPLE if [a-b > a+b]^1 then [x := a+b]^2 else [y := a+b]^3; [z := a]^4 Aexp* is { l killVB(l) genVB(l) ======================= 1 2 3 4 VBentry(1) = VBexit(1) = VBentry(2) = VBexit(2) = VBentry(3) = VBexit(3) = VBentry(4) = VBexit(4) = ------------------------------------------ What would change if the condition at label 1 did not use a+b? ------------------------------------------ FUNCTIONAL REPRESENTING THE EQUATIONS Recall Aexp* = {a+b} F(v) = (F1(v), F2(v), ..., F8(v)) F1(VB1,...,VB8) // entry(1) = (VB2 - killVB([a-b > a+b]^1) \cup gen([a-b > a+b]^1) = (VB2 - {}) \cup {a-b,a+b} = VB2 \cup {a-b,a+b} F2(VB1,...,VB8) // exit(1) = VB3 \cap VB5 F3(VB1,...,VB8) // entry(2) = (VB4 - killVB([x:=a+b]^2)) \cup genVB[x:=a+b]^2)) = (VB4 - {}) \cup {a+b} = VB4 \cup {a+b} F4(VB1,...,VB8) // exit(2) = VB7 F5(VB1,...,VB8) // entry(3) = (VB6 - killVB([y:=a+b]^3)) \cup genVB[y:=a+b]^3)) = (VB6 - {}) \cup {a+b} = VB6 \cup {a+b} F6(VB1,...,VB8) // exit(3) = VB7 F7(VB1,...,VB8) // entry(4) = (VB8 - killVB([z:=a]^4)) \cup genVB([z:=a]^4) = (VB8 - {}) \cup {} = VB8 F8(VB1,...,VB8) = {} // exit(4) ------------------------------------------ Are larger or smaller sets more precise? What is the bad outcome to avoid? So what sets do we start with to find a solution? ------------------------------------------ SOLVING USING CHAOTIC ITERATION Recall Aexp* = {a+b} Let VB1 = Aexp*, ..., VB8 = Aexp* VB8 \neq F8(VB1,...,VB8) // exit(4) = {} so VB8 := {} VB7 \neq F7(VB1,...,VB8) // entry(4) = VB8 = {} so VB7 := {} VB6 \neq F6(VB1,...,VB8) // exit(3) = VB7 = {} so VB6 := {} VB5 \neq F5(VB1,...,VB8) // entry(3) = VB6 \cup {a+b} = {} \cup {a+b} = {a+b} so VB5 := {a+b} VB4 \neq F4(VB1,...,VB8) // exit(2) = VB7 = {} so VB4 := {} VB3 \neq F3(VB1,...,VB8) // entry(2) = VB4 \cup {a+b} = {} \cup {a+b} = {a+b} so VB3 := {a+b} VB2 \neq F2(VB1,...,VB8) // exit(1) = VB3 \cap VB5 = {a+b} \cap {a+b} = {a+b} so VB2 := {a+b} ------------------------------------------ Now, does VB1 change? Can anything else change? 4. Live Variables Analysis (2.1.4) a. idea and goals ------------------------------------------ LIVE VARIABLES def (p. 47): A variable x is *live* at exit from label l iff there is a path from l to a possible use of x (in an expression). This assumes that no variables are live at the end of the program. Which variables are live at exit from 1? (A) [x := 3]^1; if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4 (B) [x := 3]^1; [y := x+2]^2; [y := y+1]^3 (C) [x := 3]^1; [z := 4]^2; [x := z+2]^3 while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6) ------------------------------------------ ------------------------------------------ LIVE VARIABLES (LV) ANALYSIS Analysis question (p. 47): "For each program point, which variables may be live at the exit from the point." Example: [x := 3]^1; [z := 4]^2; [x := z+2]^3 ------------------------------------------ Which variables are live at exit of label 1? Label 2? What can we use this for? What is the bad outcome to avoid? What kinds of analysis information should we track in the analysis? Should this analysis go backwards or forwards? b. definitions and formalization ------------------------------------------ FORMAL DEFINITION LVexit(l) = if l = final(S*) then {} else LVentry(l) = kill: Blocks* -> Powerset(Var*) killLV([x:= a]^l) = killLV([skip]^l) = {} killLV([b]^l) = {} genLV: Blocks* -> Powerset(Var*) genLV([x:= a]^l) = genLV([skip]^l) = {} genLV([b]^l) = ------------------------------------------ Does this analysis need isolated exits? If so, why? Why is there a union for LVexit? ------------------------------------------ EXAMPLE [x := 3]^1; [z := 4]^2; [x := z+2]^3 l killLV(l) genLV(1) ========================== 1 2 3 LVentry(1) = LVexit(1) = LVentry(2) = LVexit(2) = LVentry(3) = LVexit(3) = ------------------------------------------ ------------------------------------------ THE FUNCTIONAL F1(LV1,...,LV6) // entry(1) = (LV2 - killLV([x := 3]^1) \cup genLV([x := 3]^1) = (LV2 - {x}) \cup {} = LV2 - {x} F2(LV1,...,LV6) // exit(1) = LV3 F3(LV1,...,LV6) // entry(2) = (LV4 - killLV([z := 4]^2) \cup genLV([z := 4]^3) = (LV4 - {z}) \cup {} = LV4 - {z} F4(LV1,...,LV6) // exit(2) = LV5 F5(LV1,...,LV6) // entry(3) = (LV6 - killLV([x := z+2]^3) \cup genLV([x := z+2]^3) = (LV6 - {x}) \cup {z} F6(LV1,...,LV6) = {} // exit(3) ------------------------------------------ What makes the analysis information unsafe? What makes the analysis imprecise? Do we want the largest or the smallest solution? So, what would be the initial values for LV1,...,LV6? 5. Derived Data Flow Information (2.1.5) ------------------------------------------ LINKING DEFINITIONS AND USES Use-definition (ud) chain: links use of var (in an expression) to its last assignment Definition-use (du) chain: links last assignment of var to a use (in an expression) ------------------------------------------ What might this be useful for? a. formal definitions ------------------------------------------ DEFINITIONS AND USES def (p., 50): (l1, ..., ln) is a *definition clear path for x* iff 1. no block labeled {l1, ..., l(n-1)} assigns a value to x, and 2. the block labeled ln uses x (as an expression) clear(x, l, l') = (\exists l1, ..., ln :: l = l1 & l' = ln & n > 0 & (\forall i : 1 <= i < n : (li, l(i+1)) \in flow(S*)) & (\forall i : 1 <= i < n : not(def(x, li))) & use(x, ln)) def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l)) use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l)) ------------------------------------------ How do you interpret the notion of a definition clear path for x? Why are the def and use functions correct? Does clear(y, 3, 7) tell you anything about the use of y? ------------------------------------------ UD AND DU ANALYSIS ud: Var* x Lab* -> Powerset(Lab*?) ud(x, l') = {l | def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} \cup {? | clear(x, init(S*), l')} du: Var* x Lab*? -> Powerset(Lab*) du(x, l) = if l != ? then {l'| def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} else {l'| clear(x, init(S*), l')} ------------------------------------------ What does ud(x, l') = {l1, l2, l3} mean? What does du(x, l) = {l1, l2, l3} mean? Can ud(x,l') be empty? What would it mean if du(x,l) is empty? What is the analysis domain? Do these require isolated entries? Are these must or may analyses? So what would a bad outcome be for a UD or DU analysis? Would we want the largest or smallest solution for a UD or DU analysis? Can we define du in terms of ud? b. example ------------------------------------------ EXAMPLE [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4 ud(x, l) l \ x | y z ======================== 1 2 3 4 du(x, l) l \ x | y z ===================== 1 2 3 4 ------------------------------------------ c. computation How could we use RD and LV to compute ud chains? II. Monotone Frameworks (2.3) Can we identify some commonalities between different analyses? Would doing that help implement them? A. general pattern ------------------------------------------ GENERAL PATTERN A_o(l) = if l \in E then i else A_.(l) = where \bigsqcup is either \bigcup or \bigcap F is either flow(S*) or flow^R(S*) E is {init(S*)} or final(S*) i is initial/final information f(l) is the transfer function for blocks B^l \in blocks(S*) For a may analysis: \bigsqcup is For a must analysis: \bigsqcup is For a forward analysis: F is flow(S*) E is A_o gives the entry conditions A_. gives the exit conditions For a backward analysis: F is flow^R(S*) E is A_o gives the exit conditions A_. gives the entry conditions ------------------------------------------ B. basic definitions (2.3.1) 1. property space ------------------------------------------ PROPERTY SPACES def (p. 65): a *property space*, L = (L, \bigsqcup), is a set with \bigsqcup: Powerset(L) -> L a join operation (\sqcup) that makes it a complete lattice. Thus: l1 \sqcup l2 = \bigsqcup { l1, l2 } \bot = \bigsqcup {} l1 \sqsubseteq l2 = (l1 \sqcup l2 = l2) Examples: For reaching definitions: L = Powerset(Var* x Lab^?_*) \sqcup = \cup \sqsubseteq = \subseteq For available expressions: L = Powerset(AExp*) \sqcup = \cap \sqsubseteq = \supseteq ------------------------------------------ a. lattice theory i. partial orders, lub, joins, lattices ------------------------------------------ POSETS, LUBS, AND LATTICES (APPENDIX A) def: A *partially ordered set (poset)* is a pair (L, <=) of a set, L, together and a binary relation, <=, on L such that for all x,y,z in L: (reflexive) x <= x, (antisymmetric) x <= y and y <= x implies x = y, (transitive) x <= y and y <= z implies x <= z. def: for all x,y,u in L, u is an *upper bound of x and y* iff x <= u and y <= u. def: for all x,y,u in L, u is the *least upper bound of x and y* written lub(x,y) = u iff u is an upper bound of x and y and z such that z is an upper bound of x and y, u <= z. def: For all X, Y in L, the *join of X and Y*, is X \sqcup Y = lub(X,Y). This is also written as sup(X,Y). ------------------------------------------ does the lub of two elements x,y in L need to be in the set L? ------------------------------------------ LATTICES, COMPLETE LATTICES (2.3.1, A.1-2) def: A *lattice* is a poset in which each pair of elements has a least upper bound. def: A *complete lattice* is a lattice L such that every nonempty subset of L has a least upper bound (in L). ------------------------------------------ So, what is a property space? ii. meets, glbs ------------------------------------------ GLBS and COMPLETE LATTICES (APPENDIX A) Assume (L, <=) is a poset. def: for all x,y,n in L, n is a *lower bound of x and y* iff n <= x and n <= y. def: for all x,y,n in L, n is the *greatest lower bound of x and y* written glb(x,y) = n iff n is a lower bound of x and y and for all z such that z is a lower bound of x and y, z <= n. def: For all X, Y in L, the *meet of X and Y*, is X \sqcap Y = glb(X,Y). This is also written as inf(X,Y). Thm: If (L, \sqsubseteq) is a lattice, then the following hold for all x, y, z in L: (commutative) x \sqcup y = y \sqcup x, (associative) x \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z, (absoption) x \sqcup (y \sqcap x) = (x \sqcup y) \sqcap x = x ------------------------------------------ b. chains and the ascending chain condition ------------------------------------------ CHAINS, ASCENDING CHAIN CONDITION (A.3) def: A subset Y of a poset (L,\sqsubseteq) is a *chain* iff for all l_1, l_2 in Y, l_1 \sqsubseteq l_2 or l_2 \sqsubseteq l_1 def: An *ascending chain* in L is a sequence (l_n)_{n \in N} such that each l_i in L and for all n,m <= N, n <= m ==> l_n \sqsubseteq l_m. def: a sequence (l_n)_{n \in N} *stabilizes* iff there is some number n0 such that for all n >= n0, l_n = l_n0. def: L *satisfies the ascending chain condition* iff every ascending chain in L eventually stabilizes. def: a *domain* is a lattice that satisfies the ascending chain condition. ------------------------------------------ If a chain is finite, does it stabilize? Is it necessary for lattice to be finite in order to satisfy the ascending chain condition? 2. transfer functions ------------------------------------------ TRANSFER FUNCTION SPACE def: Let L be a partially-ordered set. Then Funs is a *transfer function space for L* iff f \in Funs ==> f : L -> L and f is monotone, f,g \in Funs ==> f o g \in Funs, and id_L \in Funs. ------------------------------------------ 3. monotone framework ------------------------------------------ MONOTONE FRAMEWORK def: (L, Funs) is a monotone framework iff L is a property space and Funs is a transfer function space for L. def: (L, Funs, F, E, i, f_.) is an *instance of a monotone framework* if and only if: - (L, Funs) is a monotone framework, - F is a finite set of pairs (of flows), - E is a finite set of extremal labels, - i \in L is an extremal value, - f : (dom(F) \cup E) -> (L -> L) s.t. for l in (dom(F) \cup E) f_l \in Funs ------------------------------------------ C. examples (2.3.2) D. predicate abstraction (new topic) ------------------------------------------ PREDICATE ABSTRACTION Goal: verify program properties Idea: Use property space of the form L = Powerset(Preds) Preds = {P1, ..., Pn} where each Pi is a nullary predicate Interpretation: {P3,P5} means P3 and P5 may (must) be true (depending on kind of analysis) \sqcup is \cup (or \cap) \sqsubseteq is \subseteq (or \supseteq) Funs = monotonic (in \sqsubseteq) functions on L ------------------------------------------ For a may analysis, what's the bottom element? The top? How can you represent states? ------------------------------------------ PREDICATE ABSTRACTION EXAMPLE IsZero Analysis: At a given program point, which variables may be 0. "Abstract States" s \in L = Powerset(Preds) where Preds = {IsZero_x | x in Vars*} IsZero_y means y may be 0 F is flow(S*) E is {init(S*)} i is Preds fIZ(l) : L -> L, for l in Lab* fIZ(l)(s) = (s \ kill_IZ(B^l)(s)) \cup gen_IZ(B^l)(s) where B^l in blocks(S*) kill_IZ([x := a]^l)(s) = {IsZero_x} kill_IZ([skip]^l)(s) = {} kill_IZ([b]^l)(s) = {} gen_IZ([x := a]^l)(s) = {IsZero_x | (\exists cs \in \gamma(s) :: A[[a]]cs == 0)} gen_IZ([skip]^l)(s) = {} gen_IZ([b]^l)(s) = {} \gamma: L -> Store \gamma(s) = {cs | cs: Var* -> Int, IsZero_x \in s ==> cs(x) == 0} ------------------------------------------ What kind of analysis is this? Why this initial value i? What do the gen and kill functions do? What are the equations for IZ_entry(l) and IZ_exit(l)? ------------------------------------------ EXAMPLE [y := 3]^1; while [y>0]^2 do ([q := y-2]^3; [y := y-1]^4); [q := q div y]^5 Var* = {y, q} Preds = {IsZero_y, IsZero_q} IZ_entry(1) = IZ_exit(1) = IZ_entry(2) = IZ_exit(2) = IZ_entry(3) = IZ_exit(3) = IZ_entry(4) = IZ_exit(4) = IZ_entry(5) = IZ_exit(5) = ------------------------------------------ E. equation solving (2.4) 1. MFP (Maximal Fixed Point) solution (2.4.1) 2. MOP solution (2.4.2) (skip)