COP 5021 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

Several examples of data flow analysis for the WHILE language.

Goals: - learn how to precisely define them
       = learn how to implement them

Analyses are defined by pairs of entry/exit functions
that map labels to analysis information (sets).

The analysis operates on the control flow graph of the program...

** definitions and notation

The point of these is to represent the control flow graph.

These are all defined by structural induction.

*** initial and final labels

------------------------------------------
          INITIAL LABEL

init: Stmt -> Lab

init([x := a]^l) = l
init([skip]^l) = l
init(S1; S2) =
init(if [b]^l then S1 else S2) =
init(while [b]^l do S) =
------------------------------------------
        ... init(S1)
            l
            l

Q: What would the initial label of a nondeterministic choice statement be?
   Of a parallel composition statement?
Q: How would we generalize the formalism to handle such statements?
        perhaps return a set of labels.

------------------------------------------
          FINAL LABELS

final: Stmt -> Powerset(Lab)

final([x := a]^l) = {l}
final([skip]^l) = {l}
final(S1; S2) =
final(if [b]^l then S1 else S2) =
final(while [b]^l do S) =
------------------------------------------
        ... final(S2)
            final(S1) \cup final(S2)
            {l}

Q: What would the final label set of a nondeterministic choice statement be?
   Of a parallel composition statement?
        the union of the final sets of the branches

*** blocks and labels

------------------------------------------
        ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)
------------------------------------------

Q: How would you define the set of elementary blocks in a statement?

        blocks([x := a]^l) = {[x := a]^l}
        blocks([skip]^l) = {[skip]^l}
        blocks(S1; S2) = blocks(S1) \cup blocks(S2)
        blocks(if [b]^l then S1 else S2)
            = {[b]^l} \cup blocks(S1) \cup blocks(S2)
        blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

Q: What would the set of elementary blocks in an assert statement be?
Q: What would the set of elementary blocks in a nondeterministic choice statement be?
Q: How would you define the set of labels in a statement?

        labels: Stmt -> Powerset(Lab)
        labels(S) = {l | [B]^l \in blocks(S)}

*** flows and reverse flows

------------------------------------------
               FLOWS

flow: Stmt -> Powerset(Lab x Lab)

flow([x := a]^l) = {}
flow([skip]^l) = {}
flow(S1; S2) =
flow(if [b]^l then S1 else S2) =
flow(while [b]^l do S) =
------------------------------------------
        ...
        // flow(S1; S2) = flow(S1) \cup flow(S2)
        //                \cup {(l, init(S2)) | l \in final(S1)}
        // flow(if [b]^l then S1 else S2) = flow(S1) \cup flow(S2)
        //                \cup {(l, init(S1))} \cup {(l, init(S2))}
        // flow(while [b]^l do S) = flow(S) \cup {(l, init(S))}
        //                \cup {(l', l) | l' \in final(S)}

Q: How would we use these functions to represent the nodes and edges
   of a control flow graph?
Q: What are the nodes of the control flow graph?
        The elementary blocks
Q: What are the labels and edges of
        if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?
        The set of labels is {1, 2, 3}
        The set of flows is {(1,2), (1,3)}
Q: How would you formulate a set of reverse flows?
        flow^R: Stmt -> Powerset(Lab x Lab)
        flow^R(S) = {(l', l) | (l, l') \in flow(S)}
Q: What is flow^R of
        if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?
        If this statement is S, then flow(S) = {(1,2),(1,3)}
        so flow^R(S) = {(2,1),(3,1)}
Q: What is flow^R of
        (while [x <= 4]^1 do [x := x+1]^2); [assert x == 4]^3 ?
        If this statement is S, then flow(S) = {(1,2),(1,3),(2,1)}
        so flow^R(S) = {(2,1),(3,1),(1,2)}
Q: What are the initial nodes of flow^R?
        the final nodes of flow

*** program of interest

In the book, the stars are subscripts, but the ascii version here uses them in line S* vs.
S *

------------------------------------------
         PROGRAM OF INTEREST

S* = the top level statement
Lab* = labels(S*)
Var* = FV(S*)
Blocks* = blocks(S*)
------------------------------------------
These are the labels, variables, and elementary blocks in the program S*

------------------------------------------
     ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff

def: S* has isolated exits iff

------------------------------------------
        ... (\forall l \in Lab :: (l, init(S*)) \not\in flow(S*))
        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab :: (l1, l2) \not\in flow(S*)))

Q: What kind of programs would not have isolated entries?
        Those that start with a while-loop
        (the end of the loop body flows back to the initial label).
Q: What kind of programs would not have isolated exits?
        Those that end with a while-loop
        (the final label, the loop test, flows into the loop body).
Q: Could we convert programs to have both isolated entries and exits?
        Yes, add skip statements (to the beginning and end) as necessary
        So there is no loss of generality in assuming these

------------------------------------------
          LABEL CONSISTENT

def: S is label consistent if and only if
     no two blocks in S

Formally:

------------------------------------------
        ... have the same label.

Q: How would you formalize that?
        [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2
Q: Is there any reason not to have label consistent programs?
        No, but it might be easier not to require it when joining
        2 programs together

** Example analyses

Now we're going to look at 4 different analyses,
to illustrate the formalism and to see how to precisely define
dataflow analyses.

*** Available Expressions (AE) analysis (2.1.1)

**** definitions: trivial and non-trivial expressions

Q: What's a trivial expression?

------------------------------------------
           SUBEXPRESSIONS

def: An expression is *trivial* iff

Aexp(a) = non-trivial arithmetic subexpressions
          of expression, a

E.g., Aexp(-x + -y) =

Aexp* = nontrivial arithmetic expressions in S*
------------------------------------------
        ...
            is a single variable or a constant, in other words, a base case
        So, a nontrivial expression is an expression that is not trivial
        Note that Aexp(a) includes the entire expression a
        (all subexpressions, not just proper subexpressions)
        ... Aexp(-x + -y) = {-x, -y, -x+-y}

**** idea, goal

------------------------------------------
   AVAILABLE EXPRESSIONS (AE) ANALYSIS

Analysis question (p. 37):
 "For each program point,
  which [non-trivial] expressions
  must have already been computed,
  and not later modified,
  on all paths to that program point."

Example:

  [k := i*j-1]^1;
  while [i*j-1 < n]^2 do {
      [t := a+k]^3;
      [j := j+1]^4;
      [k := i*j-1]^5
  }
------------------------------------------

Q: Why would a compiler writer care about non-trivial expressions?
        The compiler can optimize by not recomputing them,
        so reusing them can save computation time
Q: Why would trivial expressions not matter?
        Because they can be loaded into a register quickly,
        without any substantial computation
Q: What non-trivial expressions are available at entry to block 2?
        i*j and (i*j)-1
        Note that these are available on all paths,
        even going around the loop

**** observations

Q: What would be the bad outcome, which should be avoided?
        saying that some non-trivial expression is available
        when it might not be,
        i.e., taking its value out of a register when it is out of date
Q: What makes a "solution" unsafe?
        too much in the set
Q: What makes it imprecise?
        too little in the set
Q: This is a forward analysis. Why?
        because the computation happens forward, along paths
        so use flow, not flow^R
Q: Note the use of the word "must", what impact does that have?
        makes us use:
        - intersections of entry information,
        - the largest set that is a solution (fixed point),
          because smaller sets are imprecise

**** formalization

The basic idea is to define functions on each kind of elementary block,
using two sub-functions: gen and kill...
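
The definition of Aexp given earlier can be checked mechanically.
The following is a minimal Python sketch, not something taken from the
book or these notes: the tuple encoding of expressions and the helper
names show and aexp are assumptions made only for illustration.
Expressions are rendered as strings without parentheses, which is
enough for the small examples here but would be ambiguous in general.

```python
# Hypothetical encoding of WHILE expressions as nested tuples:
#   ("var", name) | ("num", n) | ("neg", a)
#   ("op", sym, a1, a2)   arithmetic operator
#   ("rel", sym, a1, a2)  relational operator (a boolean expression)

def show(e):
    """Render an expression as a string (no parentheses are added)."""
    tag = e[0]
    if tag in ("var", "num"):
        return str(e[1])
    if tag == "neg":
        return "-" + show(e[1])
    return show(e[2]) + e[1] + show(e[3])

def aexp(e):
    """Non-trivial arithmetic subexpressions of e, as a set of strings."""
    tag = e[0]
    if tag in ("var", "num"):       # trivial: a base case
        return set()
    if tag == "neg":
        return {show(e)} | aexp(e[1])
    subs = aexp(e[2]) | aexp(e[3])
    if tag == "op":                 # the whole arithmetic expression counts too
        subs.add(show(e))
    return subs                     # a "rel" contributes only its operands

x, y = ("var", "x"), ("var", "y")
neg_example = ("op", "+", ("neg", x), ("neg", y))      # -x + -y

i, j, k, a, n = (("var", v) for v in "ijkan")
ij1 = ("op", "-", ("op", "*", i, j), ("num", 1))       # i*j-1
# Right-hand sides and the loop test of the example program (labels 1..5).
block_exprs = [ij1, ("rel", "<", ij1, n), ("op", "+", a, k),
               ("op", "+", j, ("num", 1)), ij1]
aexp_star = set().union(*(aexp(e) for e in block_exprs))

print(sorted(aexp(neg_example)))
print(sorted(aexp_star))
```

Under these assumptions, aexp(neg_example) reproduces the set
{-x, -y, -x+-y} from the notes, and aexp_star is the set of non-trivial
expressions of the example program.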
------------------------------------------
           FORMAL DEFINITION

AEentry(l) = if l = init(S*)
             then {}
             else \bigcap {AEexit(l') | (l',l) \in flow(S*)}

AEexit(l) = (AEentry(l) - killAE(B^l)) \cup genAE(B^l)
            where B^l \in blocks(S*)

killAE: Blocks* -> Powerset(Aexp*)
killAE([x:= a]^l) =
killAE([skip]^l) = {}
killAE([b]^l) = {}

genAE: Blocks* -> Powerset(Aexp*)
genAE([x:= a]^l) =
genAE([skip]^l) = {}
genAE([b]^l) =
------------------------------------------

Q: What does the kill function mean?
        It says what expressions are no longer available,
        since a variable in them was assigned
        (what to take out of the analysis).
Q: What does the gen function mean?
        It says what expressions become available.

        ... killAE([x:= a]^l) = {a' \in Aexp* | x \in FV(a')}
        (this is the set of nontrivial expressions that contain the
         variable being assigned as a free variable)
        ... genAE([x:= a]^l) = {a' \in Aexp(a) | x \not\in FV(a')}
        (so this is the non-trivial expressions in the right hand side
         that do not contain the assigned variable as a free variable)
        ... genAE([b]^l) = Aexp(b)

Q: Why don't we have to define the analysis for while loops and if statements?
        it's implicit, using the CFG

**** example revisited

------------------------------------------
               EXAMPLE

[k := i*j-1]^1;
while [i*j-1 < n]^2 do
      ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5)

What's the control flow graph?
------------------------------------------
        ...
                [k := i*j-1]^1
                      |
                      v
        /---->[i*j-1 < n]^2 ---------\
        |             |              |
        |             v              |
        |       [t := a+k]^3         |
        |             |              |
        |             v              |
        |       [j := j+1]^4         |
        |             |              |
        |             v              |
        |       [k := i*j-1]^5       |
        |             |              |
        \-------------/              |
                                     v

Q: What is Aexp* in this example?
        {i*j, i*j-1, a+k, j+1}
Q: What's the role of the control flow graph here?
        it's an implicit parameter, so these equations generate
        particular sets for every control flow graph (program).

------------------------------------------
      KILL AND GEN FOR THE EXAMPLE

What are killAE and genAE in the example?

 l   killAE(l)      genAE(l)
 ============================
 1
 2
 3
 4
 5
------------------------------------------
        ...
         l   killAE(l)            genAE(l)
         =====================================
         1   {a+k}                {i*j, i*j-1}
         2   {}                   {i*j, i*j-1}
         3   {}                   {a+k}
         4   {i*j, i*j-1, j+1}    {}
         5   {a+k}                {i*j, i*j-1}

        (note that j+1 is not generated at block 4)

Q: Do the kill and gen functions depend on what happens at other nodes?
        No; this allows us to compute them by looking only at the block itself

------------------------------------------
        EXAMPLE'S AE EQUATIONS

AEentry(1) =
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =
AEexit(2) =
AEexit(3) =
AEexit(4) =
AEexit(5) =
------------------------------------------
        ...
        AEentry(1) = {}
        AEentry(2) = AEexit(1) \cap AEexit(5)
        AEentry(3) = AEexit(2)
        AEentry(4) = AEexit(3)
        AEentry(5) = AEexit(4)

        AEexit(1) = ({} - {a+k}) \cup {i*j, i*j-1}
        AEexit(2) = (AEentry(2) - killAE([i*j-1 < n]^2))
                    \cup genAE([i*j-1 < n]^2)
                  = (AEentry(2) - {}) \cup {i*j, i*j-1}
        AEexit(3) = (AEentry(3) - killAE([t:=a+k]^3))
                    \cup genAE([t:=a+k]^3)
                  = (AEentry(3) - {}) \cup {a+k}
        AEexit(4) = (AEentry(4) - killAE([j:=j+1]^4))
                    \cup genAE([j:=j+1]^4)
                  = (AEentry(4) - {i*j, i*j-1, j+1}) \cup {}
        AEexit(5) = (AEentry(5) - killAE([k := i*j-1]^5))
                    \cup genAE([k := i*j-1]^5)
                  = (AEentry(5) - {a+k}) \cup {i*j, i*j-1}

Q: What are we assuming with this formalism?
        That the program is label consistent. Why?
           because of the use of blocks(S*)
        That the program has isolated entries. Why?
           because of the use of l = init(S*)
Q: How would we adjust this if we didn't have isolated entries?
        (1) put a skip at the beginning of the program.
        (2) simulate the effect of that, by intersecting {} with other exits
Q: What's the bad outcome we want to avoid?
        That there is an expression that is said to be available,
        but is not available on all paths to a program point
        (because the compiler might try to get it from a register incorrectly)
Q: What makes the analysis information more imprecise?
        a smaller set
Q: What makes the analysis unsafe?
        a set that is too large.
Q: So what do we want for a solution: the largest or the smallest?
        the largest solution, as that is the most precise

***** Finding a Solution using Chaotic Iteration (1.7)

Q: How can we get rid of the recursion in the dataflow equations?
        Make it an iteration by using the generating functional (see below)
Q: What sets do we use to start iterating from to find a solution?
        Aexp*, because this is a must (intersection) analysis
        and (thus) we want the largest solution

------------------------------------------
  FORMALIZING THE ANALYSIS AS AN ITERATION

Let \vec{AE} be (AE1, AE2, ..., AE10)
where:
  AE1 is the entry info for label 1 from the previous iteration
  AE2 is the exit info for label 1 from the previous iteration
  AE3 is the entry info for label 2 from the previous iteration
  AE4 is the exit info for label 2 from the previous iteration
  AE5 is the entry info for label 3
  AE6 is the exit info for label 3
  ...
  AE10 is the exit info for label 5

define F(v) = (F1(v), F2(v), ..., F10(v))
where:
  F1(AE1, AE2, ..., AE10)   // entry(1)
     = {}
  F2(AE1, AE2, ..., AE10)   // exit(1)
     = (AE1 - killAE([k:=i*j-1]^1)) \cup genAE([k:=i*j-1]^1)
     = (AE1 - {a+k}) \cup {i*j,i*j-1}
  F3(AE1, AE2, ..., AE10)   // entry(2)
     = AE2 \cap AE10
  F4(AE1, AE2, ..., AE10)   // exit(2)
     = (AE3 - killAE([i*j-1 < n]^2)) \cup genAE([i*j-1 < n]^2)
     = (AE3 - {}) \cup {i*j,i*j-1}
  F5(AE1, AE2, ..., AE10)   // entry(3)
     = AE4
  F6(AE1, AE2, ..., AE10)   // exit(3)
     = (AE5 - killAE([t := a+k]^3)) \cup genAE([t := a+k]^3)
     = (AE5 - {}) \cup {a+k}
  F7(AE1, AE2, ..., AE10)   // entry(4)
     = AE6
  F8(AE1, AE2, ..., AE10)   // exit(4)
     = (AE7 - killAE([j := j+1]^4)) \cup genAE([j := j+1]^4)
     = (AE7 - {i*j,i*j-1,j+1}) \cup {}
  F9(AE1, AE2, ..., AE10)   // entry(5)
     = AE8
  F10(AE1, AE2, ..., AE10)  // exit(5)
     = (AE9 - killAE([k := i*j-1]^5)) \cup genAE([k := i*j-1]^5)
     = (AE9 - {a+k}) \cup {i*j, i*j-1}

Then a fixed-point of F is a solution.
------------------------------------------

------------------------------------------
       CORRECTNESS OF ITERATION

Def 0 (\sqsubseteq): tuples of available expressions are ordered
   pointwise by superset inclusion:
   i.e., (v1,...,v10) \sqsubseteq (v1',...,v10')
         iff v1 \supseteq v1' and ... and v10 \supseteq v10'

Exercise 1: F is monotonic (in \sqsubseteq):
   i.e., if v \sqsubseteq v' then F(v) \sqsubseteq F(v').

Corollary 2: Since (Aexp*,...,Aexp*) \sqsubseteq (Aexp*,...,Aexp*),
   F(Aexp*,...,Aexp*) \sqsubseteq F(Aexp*,...,Aexp*)

Corollary 3: (Aexp*,...,Aexp*) \sqsubseteq F(Aexp*,...,Aexp*)

Exercise 4: For all natural numbers n:
   (Aexp*,...,Aexp*) \sqsubseteq F^n(Aexp*,...,Aexp*)

Exercise 5: there is some natural number n such that
   F^n(Aexp*,...,Aexp*) = F(F^n(Aexp*,...,Aexp*))
   (so that F^n(Aexp*,...,Aexp*) is a fixed point of F).
------------------------------------------

Q: Why do these hold?
        Ex. 1 can be checked directly on the definition of F.
        Cor. 2 and 3 follow immediately.
        Ex. 4 is proven by induction, using monotonicity of F.
        Ex.
        5 holds because the lattice is finite,
        so there cannot be an infinite ascending chain

------------------------------------------
      USING THE ITERATION DIRECTLY

Iterate F, starting with (Aexp*, Aexp*, ..., Aexp*)
   at each iteration get a better approximation (smaller set)
   until a fixed point is reached

First iteration:
 F(Aexp*,Aexp*,...,Aexp*)
   = (F1(Aexp*,...,Aexp*), F2(Aexp*,...,Aexp*), ..., F10(Aexp*,...,Aexp*))

 F1(Aexp*,Aexp*,...,Aexp*) = {}
 F2(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {a+k}) \cup {i*j, i*j-1}
                           = {i*j, i*j-1, j+1}
 F3(Aexp*,Aexp*,...,Aexp*) = Aexp* \cap Aexp* = Aexp*
 F4(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {}) \cup {i*j,i*j-1} = Aexp*
 F5(Aexp*,Aexp*,...,Aexp*) = Aexp*
 F6(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {}) \cup {a+k} = Aexp*
 F7(Aexp*,Aexp*,...,Aexp*) = Aexp*
 F8(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k}
 F9(Aexp*,Aexp*,...,Aexp*) = Aexp*
 F10(Aexp*,Aexp*,...,Aexp*) = (Aexp* - {a+k}) \cup {i*j, i*j-1}
                            = {i*j, i*j-1, j+1} \cup {i*j, i*j-1}
                            = {i*j, i*j-1, j+1}
------------------------------------------

Recall that Aexp* is {i*j, i*j-1, a+k, j+1}

------------------------------------------
          SECOND ITERATION

Let \vec{v_1}  // output of first iteration
  = ({}, {i*j, i*j-1, j+1}, Aexp*, Aexp*, Aexp*,
     Aexp*, Aexp*, {a+k}, Aexp*, {i*j, i*j-1, j+1})

The second iteration computes:

 F1(\vec{v_1}) = {}
 F2(\vec{v_1}) = ({} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1}
 F3(\vec{v_1}) = {i*j, i*j-1, j+1} \cap {i*j, i*j-1, j+1}
               = {i*j, i*j-1, j+1}
 F4(\vec{v_1}) = (Aexp* - {}) \cup {i*j,i*j-1} = Aexp*
 F5(\vec{v_1}) = Aexp*
 F6(\vec{v_1}) = (Aexp* - {}) \cup {a+k} = Aexp*
 F7(\vec{v_1}) = Aexp*
 F8(\vec{v_1}) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k}
 F9(\vec{v_1}) = {a+k}
 F10(\vec{v_1}) = (Aexp* - {a+k}) \cup {i*j, i*j-1}
                = {i*j, i*j-1, j+1} \cup {i*j, i*j-1}
                = {i*j, i*j-1, j+1}
------------------------------------------

Q: Compared to the inputs to the second iteration, did any outputs change?
        Yes, the 2nd, 3rd, and 9th elements changed
        (each became a smaller set)

------------------------------------------
          THIRD ITERATION

Let \vec{v_2}  // output of 2nd iteration
  = ({}, {i*j,i*j-1}, {i*j,i*j-1,j+1}, Aexp*, Aexp*,
     Aexp*, Aexp*, {a+k}, {a+k}, {i*j, i*j-1, j+1})

The third iteration computes:

 F1(\vec{v_2}) = {}
 F2(\vec{v_2}) = ({} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1}
 F3(\vec{v_2}) = {i*j, i*j-1} \cap {i*j, i*j-1, j+1} = {i*j, i*j-1}
 F4(\vec{v_2}) = ({i*j, i*j-1, j+1} - {}) \cup {i*j,i*j-1}
               = {i*j, i*j-1, j+1}
 F5(\vec{v_2}) = Aexp*
 F6(\vec{v_2}) = (Aexp* - {}) \cup {a+k} = Aexp*
 F7(\vec{v_2}) = Aexp*
 F8(\vec{v_2}) = (Aexp* - {i*j,i*j-1,j+1}) \cup {} = {a+k}
 F9(\vec{v_2}) = {a+k}
 F10(\vec{v_2}) = ({a+k} - {a+k}) \cup {i*j, i*j-1} = {i*j, i*j-1}
------------------------------------------

Q: What changed in the third iteration?
        The 3rd, 4th, and 10th places
        (the 9th place was already {a+k} in \vec{v_2})
Q: Is this a lot of bookkeeping?
        yes!
Q: Is there a more efficient way to do this?
        Yes, use the chaotic iteration algorithm...

------------------------------------------
   CHAOTIC ITERATION ALGORITHM (1.7)

Goal: compute the desired fixed point
      of dataflow equations (quickly)

Idea:
  - use a functional (F = (Fi), as before)
    to make the current approximation
    more and more precise
  - initialize the argument vector to be either:
       - a vector of empty sets, or
       - a vector of complete sets
    depending on whether we want to find the
       - least fixed point, or
       - greatest fixed point

For our example in the AE analysis,
we want the greatest fixed point, so:

Algorithm (Table 1.5):

 Step 1: initialize
         AEi := Aexp* for i in {1,...,10}
 Step 2: while \exists j such that
               AEj \neq Fj(AE1, ..., AE10)
         do
            let j be such that
                AEj \neq Fj(AE1, ..., AE10)
            in AEj := Fj(AE1, ..., AE10)
 Step 3: return (AE1, ..., AE10)
------------------------------------------

Q: Why is this algorithm correct?
        Because it simulates the iteration.
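
        The three steps of the algorithm can be sketched in Python for
        the AE example. This is only an illustration under stated
        assumptions: expressions are represented as strings, the
        component functions are written out by hand from the example's
        equations, the names AEXP, IJ, F, and chaotic_iteration are
        hypothetical, and the choice of j in Step 2 (the first stale
        index) is just one of the allowed choices.

```python
# Aexp* for the example, and the set generated by blocks 1, 2, and 5.
AEXP = frozenset({"i*j", "i*j-1", "a+k", "j+1"})
IJ = frozenset({"i*j", "i*j-1"})

# F[j-1] computes component j from the current vector v,
# where v[0] is AE1 (entry of label 1), ..., v[9] is AE10 (exit of label 5).
F = [
    lambda v: frozenset(),                      # F1:  entry(1)
    lambda v: (v[0] - {"a+k"}) | IJ,            # F2:  exit(1)
    lambda v: v[1] & v[9],                      # F3:  entry(2)
    lambda v: v[2] | IJ,                        # F4:  exit(2)
    lambda v: v[3],                             # F5:  entry(3)
    lambda v: v[4] | {"a+k"},                   # F6:  exit(3)
    lambda v: v[5],                             # F7:  entry(4)
    lambda v: v[6] - {"i*j", "i*j-1", "j+1"},   # F8:  exit(4)
    lambda v: v[7],                             # F9:  entry(5)
    lambda v: (v[8] - {"a+k"}) | IJ,            # F10: exit(5)
]

def chaotic_iteration(F, start):
    """Update one out-of-date component at a time until none remains."""
    v = list(start)                             # Step 1
    while True:                                 # Step 2
        stale = [j for j in range(len(F)) if v[j] != F[j](v)]
        if not stale:
            return tuple(v)                     # Step 3
        j = stale[0]        # any stale j may be chosen; we take the first
        v[j] = F[j](v)

solution = chaotic_iteration(F, [AEXP] * 10)
for idx, s in enumerate(solution, start=1):
    print(f"AE{idx} = {sorted(s)}")
```

        Starting from Aexp* in every component, the result is a fixed
        point of F in which the entry information for label 2 (AE3) is
        {i*j, i*j-1}, matching the informal answer given earlier.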
        In detail, we can always construct a step of the iteration of F
        from one or more steps of the Chaotic Iteration algorithm.
        Suppose v \neq F(v) = o.
        Then there is some non-empty set of indexes, D, of o
        such that for all j in D, v[j] \neq o[j] = Fj(v).
        The Chaotic Iteration algorithm can compute each component o[j],
        which can then be used to form o = F(v).

        The algorithm finds a fixed point, because when there is no j
        such that v[j] \neq Fj(v), then by definition v = F(v).

        The algorithm terminates because the lattice
        (Powerset(Aexp*)^10, \sqsubseteq) is finite.

------------------------------------------
   RECALL THE FUNCTIONAL FOR THE EXAMPLE

Recall that F(v) = (F1(v),...,F10(v)) where

  F1(AE1,AE2,...,AE10) = {}
  F2(AE1,AE2,...,AE10)
     = (AE1 - killAE([k:= i*j-1]^1)) \cup genAE([k:= i*j-1]^1)
     = (AE1 - {a+k}) \cup {i*j, i*j-1}
  F3(AE1,AE2,...,AE10) = AE2 \cap AE10
  F4(AE1,AE2,...,AE10)
     = (AE3 - killAE([i*j-1 < n]^2)) \cup genAE([i*j-1 < n]^2)
  ...

killRD: Blocks* -> Powerset(Var* x Lab*?)
killRD([x:= a]^l) =
killRD([skip]^l) = {}
killRD([b]^l) = {}

genRD: Blocks* -> Powerset(Var* x Lab*?)
genRD([x:= a]^l) =
genRD([skip]^l) = {}
genRD([b]^l) = {}
------------------------------------------
        ... RDentry(l) = if l = init(S*)
                         then {(v,?) | v in Var*}
                         else \bigcup {RDexit(l') | (l',l) \in flow(S*)}
        ... killRD([x := a]^l) = {(x,l') | l' in Lab*?}
                               = {x} x Lab*?
        ... genRD([x := a]^l) = {(x,l)}

**** example revisited

------------------------------------------
               EXAMPLE

[k := i*j-1]^1;
while [i*j-1 < n]^2 do
      ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5)
------------------------------------------

Q: What is Var* for this example?
        {a, i, j, k, n, t}
Q: What is Lab*? for this example?
        {?, 1, 2, 3, 4, 5}

------------------------------------------
      KILL AND GEN FOR THE EXAMPLE

What are killRD and genRD in the example?

 l   killRD(l)              genRD(l)
 ====================================
 1
 2
 3
 4
 5
------------------------------------------
        ...
         l   killRD(l)              genRD(l)
         ====================================
         1   {(k,l)|l in Lab*?}     {(k,1)}
         2   {}                     {}
         3   {(t,l)|l in Lab*?}     {(t,3)}
         4   {(j,l)|l in Lab*?}     {(j,4)}
         5   {(k,l)|l in Lab*?}     {(k,5)}

------------------------------------------
        EXAMPLE'S RD EQUATIONS

Recall that Var* is {a,i,j,k,n,t}

RDentry(1) =
RDexit(1) =
RDentry(2) =
RDexit(2) =
RDentry(3) =
RDexit(3) =
RDentry(4) =
RDexit(4) =
RDentry(5) =
RDexit(5) =
------------------------------------------
        ...
        RDentry(1) = Var* x {?}
                   = {(a,?), (i,?), (j,?), (k,?), (n,?), (t,?)}
        RDexit(1) = (RDentry(1) - killRD([k := i*j-1]^1))
                    \cup genRD([k := i*j-1]^1)
                  = (RDentry(1) - ({k} x Lab*?)) \cup {(k,1)}
        RDentry(2) = RDexit(1) \cup RDexit(5)
        RDexit(2) = RDentry(2)
        RDentry(3) = RDexit(2)
        RDexit(3) = (RDentry(3) - killRD([t := a+k]^3))
                    \cup genRD([t := a+k]^3)
        RDentry(4) = RDexit(3)
        RDexit(4) = (RDentry(4) - killRD([j := j+1]^4))
                    \cup genRD([j := j+1]^4)
        RDentry(5) = RDexit(4)
        RDexit(5) = (RDentry(5) - killRD([k := i*j-1]^5))
                    \cup genRD([k := i*j-1]^5)

Q: Are larger or smaller sets more precise?
        smaller sets!
Q: What is the bad outcome to avoid?
        A set that is too small, so that we warn the user wrongly...
Q: So what sets do we start with to find a solution?
        {}, because this is a may (union) analysis
        and we want the least (most precise) solution
Q: So what would the solution be?
------------------------------------------
     SOLUTION USING CHAOTIC ITERATION

let RD1 = {}, ..., RD10 = {};

RD1 \neq F1(RD1,...,RD10)  // entry(1)
     = {(a,?),(i,?),(j,?),(k,?),(n,?),(t,?)}
 so RD1 := {(a,?),(i,?),(j,?),(k,?),(n,?),(t,?)}

RD2 \neq F2(RD1,...,RD10)  // exit(1)
     = (RD1 - killRD([k := i*j-1]^1)) \cup genRD([k := i*j-1]^1)
     = ({(a,?),(i,?),(j,?),(k,?),(n,?),(t,?)} - ({k} x Lab*?))
       \cup {(k,1)}
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
 so RD2 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}

RD3 \neq F3(RD1,...,RD10)  // entry(2)
     = RD2 \cup RD10
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} \cup {}
 so RD3 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}

RD4 \neq F4(RD1,...,RD10)  // exit(2)
     = RD3
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
 so RD4 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}

RD5 \neq F5(RD1,...,RD10)  // entry(3)
     = RD4
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
 so RD5 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}

RD6 \neq F6(RD1,...,RD10)  // exit(3)
     = (RD5 - killRD([t := a+k]^3)) \cup genRD([t := a+k]^3)
     = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)} - ({t} x Lab*?))
       \cup {(t,3)}
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}
 so RD6 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}

RD7 \neq F7(RD1,...,RD10)  // entry(4)
     = RD6
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}
 so RD7 := {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}

RD8 \neq F8(RD1,...,RD10)  // exit(4)
     = (RD7 - killRD([j := j+1]^4)) \cup genRD([j := j+1]^4)
     = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)} - ({j} x Lab*?))
       \cup {(j,4)}
     = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}
 so RD8 := {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}

RD9 \neq F9(RD1,...,RD10)  // entry(5)
     = RD8
     = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}
 so RD9 := {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}

RD10 \neq F10(RD1,...,RD10)  // exit(5)
     = (RD9 - killRD([k := i*j-1]^5)) \cup genRD([k := i*j-1]^5)
     = ({(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)} - ({k} x Lab*?))
       \cup {(k,5)}
     = {(a,?),(i,?),(j,4),(k,5),(n,?),(t,3)}
 so RD10 := {(a,?),(i,?),(j,4),(k,5),(n,?),(t,3)}
------------------------------------------

Q: Are there more steps to take?
        Yes, RD3 will change based on the new value of RD10...

------------------------------------------
     CHAOTIC ITERATION (CONTINUED)

RD3 \neq F3(RD1,...,RD10)  // entry(2)
     = RD2 \cup RD10
     = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
       \cup {(a,?),(i,?),(j,4),(k,5),(n,?),(t,3)}
 so RD3 := {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}

RD4 \neq F4(RD1,...,RD10)  // exit(2)
     = RD3
     = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}
 so RD4 := {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}

RD5 \neq F5(RD1,...,RD10)  // entry(3)
     = RD4
     = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}
 so RD5 := {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}

RD6 \neq F6(RD1,...,RD10)  // exit(3)
     = (RD5 - killRD([t := a+k]^3)) \cup genRD([t := a+k]^3)
     = ({(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,?),(t,3)}
        - ({t} x Lab*?)) \cup {(t,3)}
     = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
 so RD6 := {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,3)}

RD7 \neq F7(RD1,...,RD10)  // entry(4)
     = RD6
     = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
 so RD7 := {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,3)}

RD8 \neq F8(RD1,...,RD10)  // exit(4)
     = (RD7 - killRD([j := j+1]^4)) \cup genRD([j := j+1]^4)
     = ({(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
        - ({j} x Lab*?)) \cup {(j,4)}
     = {(a,?),(i,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
 so RD8 := {(a,?),(i,?),(j,4),(k,1),(k,5),(n,?),(t,3)}

RD9 \neq F9(RD1,...,RD10)  // entry(5)
     = RD8
     = {(a,?),(i,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
 so RD9 := {(a,?),(i,?),(j,4),(k,1),(k,5),(n,?),(t,3)}
------------------------------------------

Q: Now is it finished?
        Yes, RD10 doesn't change (since the k pairs are subtracted first;
        check that!), so RD3 doesn't change... and nothing else does
Q: So what is the fixed point?
        read it out from the last values of each RDj
        in the chaotic iteration...
*** Very Busy Expressions Analysis (2.1.3)

**** idea and goals

This can be useful in hoisting expressions:
i.e., evaluating an expression and storing it for later use
(e.g., in a register)

Doing this can save space in the generated code
(making it use the cache more efficiently)

------------------------------------------
   VERY BUSY (VB) EXPRESSIONS ANALYSIS

def (p. 44): A non-trivial expression e is *very busy*
   at exit from a block labeled l
   if e must always be used before any x \in FV(e) is assigned.

Examples:
At what points is a+b very busy in:

  [x := a+b]^1; [y := a+b]^2

  if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7; [z := a+b]^8;
  if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12 then [x := a+b]^13 else [y := 5021]^14
------------------------------------------
        ... exits of 1, 3, 7, 8, 9 (not 12 or any of the others)

Note: "an expression is very busy at exit from a block if
       it is very busy at the entry to every block that follows."
      "However, no expressions are very busy at the exit from any final block"

------------------------------------------
   VERY BUSY EXPRESSIONS ANALYSIS

Analysis question (p. 44):
 "For each program point,
  which [non-trivial] expressions
  must be very busy
  at the exit from that point."
------------------------------------------

Q: When should expressions be very busy at the entry to a block?
        The (non-trivial) expressions that are used in it.
Q: Would it make sense to have the analysis go forwards or backwards?
        Backwards: want to accumulate information about uses
        and pass that backwards along control flows
        to where expressions are computed.
Q: Is this a "may" or a "must" analysis? What does that mean?
        It's a must analysis, because of the "always"
        in the definition of "very busy,"
        so we intersect information when control flows join
Q: Where do control flows join in this analysis?
        At exits: since we are working backwards, these are the labels
        where, in a forward analysis, the control flow would split

**** formal definition

------------------------------------------
           FORMAL DEFINITION

VBexit(l) = if l \in final(S*)
            then {}
            else

VBentry(l) = (VBexit(l) - killVB(B^l)) \cup genVB(B^l)
             where B^l \in blocks(S*)

killVB: Blocks* -> Powerset(Aexp*)
killVB([x:= a]^l) =
killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(Aexp*)
genVB([x:= a]^l) = Aexp(a)
genVB([skip]^l) = {}
genVB([b]^l) = Aexp(b)
------------------------------------------
        ... VBexit(l) = if l \in final(S*)
                        then {}
                        else \bigcap { VBentry(l') | (l', l) \in flow^R(S*) }
        ... VBentry(l) = (VBexit(l) - killVB(B^l)) \cup genVB(B^l)
                         where B^l \in blocks(S*)
        ... killVB([x:= a]^l) = {a' \in Aexp* | x \in FV(a')}
        ... genVB([x:= a]^l) = Aexp(a)
        ... genVB([b]^l) = Aexp(b)

Q: Have we seen these kill and gen functions before?
        Yes, for the AE analysis!
        (killVB is the same as killAE; genVB differs in that it does not
         exclude expressions containing the assigned variable)
Q: Does this analysis need isolated exits? If so, why?
        Yes, because of the initial condition for VBexit(l)
Q: Do we want the largest or the smallest solution?
        largest, because smaller is imprecise

**** example

------------------------------------------
               EXAMPLE

if [a-b > a+b]^1
then [x := a+b]^2
else [y := a+b]^3;
[z := a]^4

Aexp* is {

 l   killVB(l)   genVB(l)
 =======================
 1
 2
 3
 4

VBentry(1) =
VBexit(1) =
VBentry(2) =
VBexit(2) =
VBentry(3) =
VBexit(3) =
VBentry(4) =
VBexit(4) =
------------------------------------------

Q: What would change if the condition at label 1 did not use a+b?
        The expression a+b would still be very busy at exit from 1,
        since it is very busy at entry to both 2 and 3

        ... Aexp* is {a-b, a+b}
        ...
         l   killVB(l)   genVB(l)
         =======================
         1   {}          {a-b,a+b}
         2   {}          {a+b}
         3   {}          {a+b}
         4   {}          {}

------------------------------------------
   FUNCTIONAL REPRESENTING THE EQUATIONS

Recall Aexp* = {a-b, a+b}

F(v) = (F1(v), F2(v), ..., F8(v))

F1(VB1,...,VB8)  // entry(1)
   = (VB2 - killVB([a-b > a+b]^1)) \cup genVB([a-b > a+b]^1)
   = (VB2 - {}) \cup {a-b,a+b}
   = VB2 \cup {a-b,a+b}
F2(VB1,...,VB8)  // exit(1)
   = VB3 \cap VB5
F3(VB1,...,VB8)  // entry(2)
   = (VB4 - killVB([x:=a+b]^2)) \cup genVB([x:=a+b]^2)
   = (VB4 - {}) \cup {a+b}
   = VB4 \cup {a+b}
F4(VB1,...,VB8)  // exit(2)
   = VB7
F5(VB1,...,VB8)  // entry(3)
   = (VB6 - killVB([y:=a+b]^3)) \cup genVB([y:=a+b]^3)
   = (VB6 - {}) \cup {a+b}
   = VB6 \cup {a+b}
F6(VB1,...,VB8)  // exit(3)
   = VB7
F7(VB1,...,VB8)  // entry(4)
   = (VB8 - killVB([z:=a]^4)) \cup genVB([z:=a]^4)
   = (VB8 - {}) \cup {}
   = VB8
F8(VB1,...,VB8)  // exit(4)
   = {}
------------------------------------------

Q: Are larger or smaller sets more precise?
        larger sets
Q: What is the bad outcome to avoid?
        A set that is too large,
        so that we hoist (optimize) something that is not very busy
Q: So what sets do we start with to find a solution?
        Aexp*, because this is a must (intersection) analysis
        and we want the greatest (most precise) solution

------------------------------------------
     SOLVING USING CHAOTIC ITERATION

Recall Aexp* = {a-b, a+b}

Let VB1 = Aexp*, ..., VB8 = Aexp*

VB8 \neq F8(VB1,...,VB8)  // exit(4)
    = {}
 so VB8 := {}
VB7 \neq F7(VB1,...,VB8)  // entry(4)
    = VB8 = {}
 so VB7 := {}
VB6 \neq F6(VB1,...,VB8)  // exit(3)
    = VB7 = {}
 so VB6 := {}
VB5 \neq F5(VB1,...,VB8)  // entry(3)
    = VB6 \cup {a+b} = {} \cup {a+b} = {a+b}
 so VB5 := {a+b}
VB4 \neq F4(VB1,...,VB8)  // exit(2)
    = VB7 = {}
 so VB4 := {}
VB3 \neq F3(VB1,...,VB8)  // entry(2)
    = VB4 \cup {a+b} = {} \cup {a+b} = {a+b}
 so VB3 := {a+b}
VB2 \neq F2(VB1,...,VB8)  // exit(1)
    = VB3 \cap VB5 = {a+b} \cap {a+b} = {a+b}
 so VB2 := {a+b}
------------------------------------------

Q: Now, does VB1 change?
        No, it's VB2 \cup {a-b,a+b} = {a+b} \cup {a-b,a+b}
        = {a-b,a+b} = Aexp*, so it's the same!
Q: Can anything else change?
        no, so we have computed the fixed point of this example.

*** Live Variables Analysis (2.1.4)

**** idea and goals

------------------------------------------
           LIVE VARIABLES

def (p. 47): A variable x is *live* at exit from label l
   iff there is a path from l to a possible use of x
   (in an expression) that does not redefine x.

This assumes that no variables are live at the end of the program.

Which variables are live at exit from 1?

(A) [x := 3]^1;
    if [z > 0]^2
    then [y := x+2]^3
    else [q := q+1]^4

(B) [x := 3]^1;
    [y := x+2]^2;
    [y := y+1]^3

(C) [x := 3]^1;
    [z := 4]^2;
    [x := z+2]^3
    while [z > 0]^4 do
       ([y := x+2]^5;
        [z := z-1]^6)
------------------------------------------
        {q, x, z} in (A), since z is used in 2, x in 3, and q in 4,
                  and none of them is reassigned first
        {x} in (B), since x is used in 2
                  (y is assigned in 2 before it is used in 3)
        {} in (C), since z and x are reassigned (in 2 and 3) before
                  any use; however, x is live at exit from 3

        The live variables at exit of l are the variables that
        may be used in expressions after l (before being reassigned)

------------------------------------------
     LIVE VARIABLES (LV) ANALYSIS

Analysis question (p. 47):
 "For each program point,
  which variables may be live
  at the exit from the point."

Example:

 [x := 3]^1; [z := 4]^2; [x := z+2]^3
------------------------------------------

Q: Which variables are live at exit of label 1? Label 2?
        {} and {z}

The authors assume "that no variables are live at the end of the program";
however, they remark that
 "for some applications it might be better to assume that
  all variables are live at the end of the program."

Q: What can we use this for?
        dead code elimination: if a variable is not live at exit from
        a block that assigns to it, then the assignment statement is "dead"
        and can be deleted from the program.
Q: What is the bad outcome to avoid?
        deleting some assignment statement that might be needed.
Q: What kinds of analysis information should we track in the analysis?
        sets of variable names (Powerset(Var*)) that are live
Q: Should this analysis go backwards or forwards?
        Backwards, so that information about uses can propagate
        back to earlier blocks

**** definitions and formalization

------------------------------------------
          FORMAL DEFINITION

LVexit(l) = if l \in final(S*)
            then {}
            else

LVentry(l) =

killLV: Blocks* -> Powerset(Var*)
killLV([x:= a]^l) =
killLV([skip]^l)  = {}
killLV([b]^l)     = {}

genLV: Blocks* -> Powerset(Var*)
genLV([x:= a]^l) =
genLV([skip]^l)  = {}
genLV([b]^l)     =
------------------------------------------
        ... LVexit(l) = if l \in final(S*)
                        then {}
                        else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) }
        ... LVentry(l) = (LVexit(l) - killLV(B^l)) \cup genLV(B^l)
                         where B^l \in blocks(S*)
        ... killLV([x:= a]^l) = {x}
        ... genLV([x:= a]^l) = FV(a)
        ... genLV([b]^l) = FV(b)

Q: Does this analysis need isolated exits? If so, why?
        Yes, because of the initial condition on final(S*):
        LVexit(l) is fixed at {} there, which ignores any flow
        leaving a final label
Q: Why is there a union for LVexit?
        Because this is a may analysis

------------------------------------------
              EXAMPLE

   [x := 3]^1; [z := 4]^2; [x := z+2]^3

    l   killLV(l)   genLV(l)
    ==========================
    1
    2
    3

LVentry(1) =
LVexit(1)  =
LVentry(2) =
LVexit(2)  =
LVentry(3) =
LVexit(3)  =
------------------------------------------
        ...
    l   killLV(l)   genLV(l)
    ==========================
    1   {x}         {}
    2   {z}         {}
    3   {x}         {z}

------------------------------------------
           THE FUNCTIONAL

F1(LV1,...,LV6)   // entry(1)
   = (LV2 - killLV([x := 3]^1)) \cup genLV([x := 3]^1)
   = (LV2 - {x}) \cup {}
   = LV2 - {x}
F2(LV1,...,LV6)   // exit(1)
   = LV3
F3(LV1,...,LV6)   // entry(2)
   = (LV4 - killLV([z := 4]^2)) \cup genLV([z := 4]^2)
   = (LV4 - {z}) \cup {}
   = LV4 - {z}
F4(LV1,...,LV6)   // exit(2)
   = LV5
F5(LV1,...,LV6)   // entry(3)
   = (LV6 - killLV([x := z+2]^3)) \cup genLV([x := z+2]^3)
   = (LV6 - {x}) \cup {z}
F6(LV1,...,LV6)   // exit(3)
   = {}
------------------------------------------

Q: What makes the analysis information unsafe?
        a set that is too small
Q: What makes the analysis imprecise?
        bigger sets
Q: Do we want the largest or the smallest solution?
        smallest, as that will be the most precise solution
Q: So, what would be the initial values for LV1,...,LV6?
        {}, the empty set, since we want the smallest solution
        as that is the most precise

*** Derived Data Flow Information (2.1.5)

------------------------------------------
     LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links use of var (in an expression)
   to its last assignment

Definition-use (du) chain:
   links last assignment of var
   to a use (in an expression)
------------------------------------------

Q: What might this be useful for?
        - dead code elimination
          (if du(x, l) is empty for the assignment to x at l)
        - code motion (reordering)
          (can reorder statements S1 and S2
           if there is no du or ud chain between S1 and S2)

**** formal definitions

------------------------------------------
        DEFINITIONS AND USES

def (p. 50): (l1, ..., ln) is a
   *definition clear path for x* iff
   1. no block labeled l1, ..., l(n-1)
      assigns a value to x, and
   2. the block labeled ln uses x
      (as an expression)

clear(x, l, l') =
   (\exists l1, ..., ln ::
       l = l1 & l' = ln & n > 0
     & (\forall i : 1 <= i < n : (li, l(i+1)) \in flow(S*))
     & (\forall i : 1 <= i < n : not(def(x, li)))
     & use(x, ln))

def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l))
use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l))
------------------------------------------

Q: How do you interpret the notion of a definition clear path for x?
Q: Why are the def and use functions correct?
Q: Does clear(y, 3, 7) tell you anything about the use of y?
        yes, it's used at label 7

------------------------------------------
         UD AND DU ANALYSIS

ud: Var* x Lab* -> Powerset(Lab*?)
ud(x, l') = {l | def(x, l),
                 (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))}
            \cup {? | clear(x, init(S*), l')}

du: Var* x Lab*? -> Powerset(Lab*)
du(x, l) = if l != ?
           then {l' | def(x, l),
                      (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))}
           else {l' | clear(x, init(S*), l')}
------------------------------------------

Q: What does ud(x, l') = {l1, l2, l3} mean?
        that a use of x at the block labeled l'
        might have been defined at either l1, l2, or l3
Q: What does du(x, l) = {l1, l2, l3} mean?
        that x is defined (assigned a value) at the block labeled l
        and that assignment to x might be used at either l1, l2, or l3
Q: Can ud(x,l') be empty?
        not when x is used at l' (and l' is reachable),
        but it could just be {?}
   What would that mean?
        that x is used uninitialized at l'
Q: What would it mean if du(x,l) is empty?
        That the assignment statement at l is dead code
Q: What is the analysis domain?
        It's sets -- not sequences -- of labels
        that are on definition-clear paths
Q: Do these require isolated entries?
        yes
Q: Are these must or may analyses?
        may
Q: So what would a bad outcome be for a UD or DU analysis?
        a set that is too small
Q: Would we want the largest or smallest solution for a UD or DU analysis?
        the smallest, as it would be more precise
Q: Can we define du in terms of ud?
        yes (this is an exercise in the book)

**** example

------------------------------------------
              EXAMPLE

 [z := 3]^1;
 if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4

 ud(x, l)
   l \ x |   y      z
 ========================
     1
     2
     3
     4

 du(x, l)
   l \ x |   y      z
 ========================
     1
     2
     3
     4
------------------------------------------
        ...
 ud(x, l)
   l \ x |   y      z
 ========================
     1   |   {}     {}
     2   |   {?}    {}
     3   |   {}     {1}
     4   |   {?}    {}

 du(x, l)
   l \ x |   y      z
 ========================
     1   |   {}     {3}
     2   |   {}     {}
     3   |   {}     {}
     4   |   {}     {}

**** computation

Q: How could we use RD and LV to compute ud chains?
        see p. 54

        UD : Var* x Lab* -> Powerset(Lab*)
        UD(x,l) = if x \in genLV(B^l)
                  then {l' | (x,l') \in RDentry(l)}
                  else {}

        Computing DU is an exercise.
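
        The UD equation above can be tried out on the example program
        [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4.
        What follows is only a rough Python sketch, not code from the book:
        the dictionary encoding of blocks and the names blocks, flow,
        rd_exit, solve_rd, and ud are assumptions made for illustration.
        It assumes an isolated entry, solves the reaching definitions
        equations by simple iteration from empty sets, and then reads
        UD off RDentry.

```python
# Sketch: UD chains from Reaching Definitions (RD) and genLV for
#   [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4
# The encoding below is hypothetical, chosen only for this illustration.

UNINIT = "?"  # the special "?" label: variable not assigned yet

# label -> (variable assigned by the block or None, variables used by the block)
blocks = {
    1: ("z", set()),
    2: (None, {"y"}),
    3: ("y", {"z"}),
    4: ("y", {"y"}),
}
flow = {(1, 2), (2, 3), (2, 4)}
init = 1
variables = {"y", "z"}


def gen_lv(l):
    """genLV(B^l): the variables used in the block labeled l."""
    return blocks[l][1]


def rd_exit(l, entry):
    """RDexit(l) = (RDentry(l) - killRD(B^l)) U genRD(B^l)."""
    x = blocks[l][0]
    if x is None:  # tests and skip neither kill nor generate
        return set(entry)
    # an assignment to x kills every pair for x and generates (x, l)
    return {(v, d) for (v, d) in entry if v != x} | {(x, l)}


def solve_rd():
    """Iterate from empty sets up to the least solution for RDentry."""
    entry = {l: set() for l in blocks}
    entry[init] = {(v, UNINIT) for v in variables}  # assumes isolated entry
    changed = True
    while changed:
        changed = False
        for l in blocks:
            if l == init:
                continue
            new = set()
            for (p, q) in flow:
                if q == l:
                    new |= rd_exit(p, entry[p])
            if new != entry[l]:
                entry[l] = new
                changed = True
    return entry


def ud(x, l, rd_entry):
    """UD(x, l): definitions of x reaching l, if x is used at l."""
    if x in gen_lv(l):
        return {d for (v, d) in rd_entry[l] if v == x}
    return set()


rd_entry = solve_rd()
for l in sorted(blocks):
    print(l, {x: ud(x, l, rd_entry) for x in sorted(variables)})
```

        The printed rows can be compared against the ud(x, l) table
        in the example above.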