Com S 641 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

Several examples of data flow analysis for the While language.

Analyses are defined by pairs of entry/exit functions
that map labels to analysis information (sets).

** definitions and notation

These are all defined by structural induction.

*** initial and final labels

------------------------------------------
INITIAL LABEL

init: Stmt -> Lab

init([x := a]^l) = l
init([skip]^l) = l
init(S1; S2) =
init(if [b]^l then S1 else S2) =
init(while [b]^l do S) =
------------------------------------------
        ... init(S1)
            l
            l

Q: What would the initial label of a nondeterministic choice
   statement be? Of a parallel composition statement?

Q: How would we generalize the formalism to handle such statements?
        perhaps return a set of labels.

------------------------------------------
FINAL LABELS

final: Stmt -> Powerset(Lab)

final([x := a]^l) = {l}
final([skip]^l) = {l}
final(S1; S2) =
final(if [b]^l then S1 else S2) =
final(while [b]^l do S) =
------------------------------------------
        ... final(S2)
            final(S1) \cup final(S2)
            {l}

Q: What would the final label set of a nondeterministic choice
   statement be? Of a parallel composition statement?

*** blocks and labels

------------------------------------------
ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)
------------------------------------------

Q: How would you define the set of elementary blocks in a statement?

        blocks([x := a]^l) = {[x := a]^l}
        blocks([skip]^l) = {[skip]^l}
        blocks(S1; S2) = blocks(S1) \cup blocks(S2)
        blocks(if [b]^l then S1 else S2)
            = {[b]^l} \cup blocks(S1) \cup blocks(S2)
        blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

Q: What would the set of blocks in an assert statement be?

Q: What would the set of blocks in a nondeterministic choice statement be?

Q: How would you define the set of labels in a statement?
        labels: Stmt -> Powerset(Lab)
        labels(S) = {l | [B]^l \in blocks(S)}

*** flows and reverse flows

------------------------------------------
FLOWS

flow: Stmt -> Powerset(Lab x Lab)

flow([x := a]^l) = {}
flow([skip]^l) = {}
flow(S1; S2) =
flow(if [b]^l then S1 else S2) =
flow(while [b]^l do S) =
------------------------------------------
        ... flow(S1) \cup flow(S2)
              \cup {(l, init(S2)) | l \in final(S1)}
            flow(S1) \cup flow(S2)
              \cup {(l, init(S1))} \cup {(l, init(S2))}
            flow(S) \cup {(l, init(S))}
              \cup {(l', l) | l' \in final(S)}

Q: How would we use these functions to represent the nodes and
   edges of a dataflow graph?

Q: What are the labels and edges of
     if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?

Q: How would you formulate a set of reverse flows?

        flow^R: Stmt -> Powerset(Lab x Lab)
        flow^R(S) = {(l', l) | (l, l') \in flow(S)}

Q: What is flow^R of
     if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?

Q: What are the initial nodes of flow^R?

*** program of interest

------------------------------------------
PROGRAM OF INTEREST

S* = the top level statement

Lab* = labels(S*)

Var* = FV(S*)

Blocks* = blocks(S*)

AExp* = nontrivial arithmetic expressions in S*
------------------------------------------

Q: What's a trivial expression?
        a single variable or a constant, in other words, a base case

------------------------------------------
SUBEXPRESSIONS

AExp(a) = non-trivial arithmetic subexpressions of a

AExp(b) = non-trivial arithmetic subexpressions of b
------------------------------------------

------------------------------------------
ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff

def: S* has isolated exits iff
------------------------------------------
        ... (\forall l \in Lab :: (l, init(S*)) \not\in flow(S*))
        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab :: (l1, l2) \not\in flow(S*)))

Q: What kind of programs would not have isolated entries?
        Those that start with a while loop: the end of the loop body
        flows back to the test, which is init(S*).

Q: What kind of programs would not have isolated exits?
        Those that end with a while loop (directly, or at the end of a
        branch of a final if-then-else): the loop test is a final label,
        and it flows into the loop body.

------------------------------------------
LABEL CONSISTENT

def: S is label consistent if and only if
     no two blocks in S have the same label.
------------------------------------------

Q: How would you formalize that?
        [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2

Q: Is there any reason not to have label consistent programs?

** available expressions analysis (2.1.1)

*** idea, goal

------------------------------------------
AVAILABLE EXPRESSIONS ANALYSIS

"For each program point, which expressions
 must have already been computed,
 and not later modified,
 on all paths to the program point."

Example:

  [k := i*j-1]^1;
  while [i*j-1 < n]^2 do
     ([t := a+k]^3;
      [j := j+1]^4;
      [k := i*j-1]^5;)
------------------------------------------

Q: What expressions are available at entry to block 2?

*** formalization

The basic idea is to define functions on each elementary block,
using two sub-functions: gen and kill...

------------------------------------------
FORMAL DEFINITION

AEentry(l) = if l = init(S*)
             then {}
             else \bigcap {AEexit(l') | (l',l) \in flow(S*)}

AEexit(l) = (AEentry(l) \ killAE(B^l)) \cup genAE(B^l)
             where B^l \in blocks(S*)

killAE: Blocks* -> Powerset(AExp*)
killAE([x:= a]^l) = {a' \in AExp* | x \in FV(a')}
killAE([skip]^l) = {}
killAE([b]^l) = {}

genAE: Blocks* -> Powerset(AExp*)
genAE([x:= a]^l) = {a' \in AExp(a) | x \not\in FV(a')}
genAE([skip]^l) = {}
genAE([b]^l) = AExp(b)
------------------------------------------

Q: What's the role of the dataflow graph here?
        it's an implicit parameter, so these equations generate
        particular sets for every dataflow graph (program).

Q: What does the kill function mean?
        It says which expressions are no longer available, because a
        variable they mention was assigned (what to take out of the
        analysis).

Q: What does the gen function mean?
        It says which expressions become available.

Q: Why don't we have to define the analysis for while loops
   and if statements?
        it's implicit in the DFG

Q: What are we assuming with this formalism?
        That the program is label consistent. Why?
            because of the use of blocks(S*)
        That the program has isolated entries. Why?
            because of the use of l = init(S*)

*** observations

This is a forward analysis. Why?
        because we don't use flow^R

What makes a "solution" unsafe?
        too much in the set
What makes it imprecise?
        too little in the set
We want the largest (safe) sets. Why?
        because we want more information to use in optimizations

Note the use of the word "must"; what impact does that have on the analysis?
        intersections of entry information, makes us want the largest set

*** example revisited

------------------------------------------
EXAMPLE

  [k := i*j-1]^1;
  while [i*j-1 < n]^2 do
     ([t := a+k]^3;
      [j := j+1]^4;
      [k := i*j-1]^5;)

What's the data flow graph?
------------------------------------------

------------------------------------------
KILL AND GEN

What are killAE and genAE for this?

 l    killAE(l)    genAE(l)
============================
 1
 2
 3
 4
 5
------------------------------------------

------------------------------------------
EXAMPLE EQUATIONS

AEentry(1) =
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =
AEexit(2) =
AEexit(3) =
AEexit(4) =
AEexit(5) =
------------------------------------------

Q: So what sets do we start with to find a solution?
        AExp*

Q: So what would the solution be?

** Reaching Definitions Analysis (2.1.2)

See the book

Q: Is this a forward or backward analysis?
        forward, uses flow not flow^R

Q: What makes the analysis imprecise?
        larger sets

Q: So what solution do we want?
        the smallest set

Q: Note the use of the word "may" in the analysis statement;
   what impact does that have on the analysis?
        unions of entry information, makes us want the smallest set

** Very Busy Expressions Analysis (2.1.3)

*** idea and goals

------------------------------------------
VERY BUSY EXPRESSIONS

def: An expression e is *very busy* at exit
     from block l if e must always be used
     before any x \in FV(e) is assigned.

At what points is a+b very busy in:

  [x := a+b]^1; [y := a+b]^2

  if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7; [z := a+b]^8;

  if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12 then [x := a+b]^13 else [y := 641]^14
------------------------------------------
        ... exits of 1, 3, 7, 9 (not 12 or any of the others)

"an expression is very busy at exit from a block if it is very busy
 at the entry to every block that follows."

"However, no expressions are very busy at the exit from any final block"

------------------------------------------
VERY BUSY EXPRESSIONS ANALYSIS

"For each program point,
 which expressions must be very busy
 at the exit from the point."
------------------------------------------

Q: What could we use this for?
        Hoisting, which is to compute the expression early and store
        its value for later use.

*** formal definition

------------------------------------------
FORMAL DEFINITION

VBexit(l) = if l \in final(S*)
            then {}
            else \bigcap { VBentry(l') | (l', l) \in flow^R(S*) }

VBentry(l) = (VBexit(l) \ killVB(B^l)) \cup genVB(B^l)
             where B^l \in blocks(S*)

killVB: Blocks* -> Powerset(AExp*)
killVB([x:= a]^l) = {a' \in AExp* | x \in FV(a')}
killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(AExp*)
genVB([x:= a]^l) = AExp(a)
genVB([skip]^l) = {}
genVB([b]^l) = AExp(b)
------------------------------------------

Q: Is this a forward or backward analysis?
        backwards, because we use flow^R.

Q: Does this analysis need isolated exits?
        the authors say so, why?

Q: Why is there an intersection for VBexit?

Q: Do we want the largest or the smallest solution?
        largest, because smaller is imprecise

*** example

------------------------------------------
EXAMPLE

  if [a-b > a+b]^1
  then [x := a+b]^2
  else [y := a+b]^3;
  [z := a]^4;

 l    killVB(l)    genVB(l)
=======================
 1
 2
 3
 4

VBentry(1) =            VBexit(1) =
VBentry(2) =            VBexit(2) =
VBentry(3) =            VBexit(3) =
VBentry(4) =            VBexit(4) =
------------------------------------------

** Live Variables Analysis (2.1.4)

*** idea and goals

------------------------------------------
LIVE VARIABLES

def: A variable x is *live* at exit
     from label l if there is a path from l
     to a use of x that does not redefine x.

Which variables are live at exit from 1?

  [x := 3]^1; if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4

  [x := 3]^1; [y := x+2]^2; [y := y+1]^3

  [x := 3]^1; [z := 4]^2; [x := z+2]^3
  while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6)
------------------------------------------

------------------------------------------
LIVE VARIABLES ANALYSIS

"For each program point,
 which variables may be live
 at the exit from the point."

Example:

  [x := 3]^1; [z := 4]^2; [x := z+2]^3
------------------------------------------

The authors take the view that "no variables are live at the end of
the program"; however, they remark that "for some applications it
might be better to assume that all variables are live at the end of
the program."

Q: What can we use this for?
        dead code elimination:
          if the variable is not live at exit from a block that
          assigns to it, the assignment can be eliminated.
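
The dead-assignment idea can be sketched in Python as follows. This is
only an illustration, not the book's algorithm: it assumes the usual
backward formulation (what is live at exit from a block is the union of
what is live at entry to its successors, and nothing is live at the end
of the program), it hand-encodes the example
[x := 3]^1; [z := 4]^2; [x := z+2]^3, and all names (KILL, GEN, FLOW,
FINAL, live_variables) are hypothetical.

```python
# Hand-written encoding (an assumption) of
#   [x := 3]^1; [z := 4]^2; [x := z+2]^3
KILL = {1: {"x"}, 2: {"z"}, 3: {"x"}}   # variable assigned by each block
GEN = {1: set(), 2: set(), 3: {"z"}}    # variables read by each block
FLOW = {(1, 2), (2, 3)}                 # edges of the data flow graph
FINAL = {3}                             # final labels of the program
LABELS = sorted(KILL)


def live_variables(labels, flow, final, kill, gen):
    """Iterate the backward equations from empty sets until nothing changes.

    Starting from empty sets and only ever adding variables yields the
    smallest solution, which is the one wanted for a "may" analysis.
    """
    entry = {l: set() for l in labels}
    exit_ = {l: set() for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            new_exit = set()
            if l not in final:
                # union over the successors of l
                for (src, dst) in flow:
                    if src == l:
                        new_exit |= entry[dst]
            new_entry = (new_exit - kill[l]) | gen[l]
            if new_exit != exit_[l] or new_entry != entry[l]:
                exit_[l], entry[l] = new_exit, new_entry
                changed = True
    return entry, exit_


entry, exit_ = live_variables(LABELS, FLOW, FINAL, KILL, GEN)

# An assignment is a candidate for elimination when the variable it
# assigns is not live at exit from its block.
dead = [l for l in LABELS if KILL[l] and not (KILL[l] & exit_[l])]
print(exit_)
print(dead)
```

Under the "nothing is live at the end" convention the last assignment is
reported as dead too; assuming instead that all variables are live at
the end would amount to seeding the exit sets of the final labels with
all variables rather than the empty set.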
*** definitions and formalization

------------------------------------------
FORMAL DEFINITION

LVexit(l) = if l \in final(S*)
            then {}
            else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) }

LVentry(l) = (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
             where B^l \in blocks(S*)

killLV: Blocks* -> Powerset(Var*)
killLV([x:= a]^l) = {x}
killLV([skip]^l) = {}
killLV([b]^l) = {}

genLV: Blocks* -> Powerset(Var*)
genLV([x:= a]^l) = FV(a)
genLV([skip]^l) = {}
genLV([b]^l) = FV(b)
------------------------------------------

Q: Is this a forward or backward analysis?
        backwards, because we use flow^R.

Q: Does this analysis need isolated exits?
        the authors say so, why?

Q: Why is there a union for LVexit?

Q: Do we want the largest or the smallest solution?
        smallest, because larger is imprecise
        We want to know exactly what the dead variables are.

------------------------------------------
EXAMPLE

  [x := 3]^1; [z := 4]^2; [x := z+2]^3

 l    killLV(l)    genLV(l)
==========================
 1
 2
 3

LVentry(1) =            LVexit(1) =
LVentry(2) =            LVexit(2) =
LVentry(3) =            LVexit(3) =
------------------------------------------

** Derived Data Flow Information (2.1.5)

------------------------------------------
LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links use of var to its last assignment

Definition-use (du) chain:
   links last assignment of var to its use
------------------------------------------

Q: What might this be useful for?
        dead code elimination
        code motion (reordering)

*** formal definitions

------------------------------------------
DEFINITIONS AND USES

definition clear path for x

clear(x, l, l') =
   (\exists l1, ..., ln ::
       l = l1 & ln = l' & n > 0
     & (\forall i : 1 <= i < n : (li, li+1) \in flow(S*))
     & (\forall i : 1 <= i < n : not(def(x, li)))
     & use(x, ln))

def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l))

use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l))
------------------------------------------

Q: Why are the def and use functions correct?

Q: How do you interpret the notion of a clear path?

------------------------------------------
UD and DU

ud: Var* x Lab* -> Powerset(Lab*)
ud(x, l') = {l | def(x, l),
                 (\exists l2 : (l, l2) \in flow(S*) : clear(x, l2, l'))}
            \cup {? | clear(x, init(S*), l')}

du: Var* x Lab* -> Powerset(Lab*)
du(x, l) = if l != ?
           then {l' | def(x, l),
                      (\exists l2 : (l, l2) \in flow(S*) : clear(x, l2, l'))}
           else {l' | clear(x, init(S*), l')}
------------------------------------------

Q: Do these require isolated entries?
        yes

Q: Are these must or may analyses?
        may

Q: Can we define du in terms of ud?

*** example

------------------------------------------
EXAMPLE

  [z := 3]^1;
  if [y > 0]^2
  then [y := z+2]^3
  else [y := y+1]^4

ud(x, l)
 l \ x |   y      z
=====================
 1
 2
 3
 4

du(x, l)
 l \ x |   y      z
=====================
 1
 2
 3
 4
------------------------------------------

*** computation

Q: How could we use RD and LV to compute ud chains?
        see p. 54
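
As a cross-check on the definitions of clear, ud and du, here is a
minimal Python sketch that evaluates them directly by graph search on
the example program
[z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4.
It is not the RD/LV-based computation the book describes; the encoding
of the program and all names (FLOW, INIT, KILL, GEN, UNINIT, defines,
uses, clear, ud, du) are hypothetical and chosen for illustration.

```python
# Hand-written encoding (an assumption) of
#   [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4
FLOW = {(1, 2), (2, 3), (2, 4)}
INIT = 1
KILL = {1: {"z"}, 2: set(), 3: {"y"}, 4: {"y"}}   # killLV per label
GEN = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}    # genLV per label
LABELS = sorted(KILL)
UNINIT = "?"   # stands for the "?" label (variable possibly uninitialized)


def defines(x, l):
    return x in KILL[l]


def uses(x, l):
    return x in GEN[l]


def clear(x, start, target):
    """True if some path start..target ends in a use of x and no block
    on the path before target defines x."""
    if not uses(x, target):
        return False
    seen, stack = set(), [start]
    while stack:
        cur = stack.pop()
        if cur == target:
            return True          # the last block itself may define x
        if cur in seen or defines(x, cur):
            continue             # a definition blocks the path
        seen.add(cur)
        stack.extend(dst for (src, dst) in FLOW if src == cur)
    return False


def ud(x, target):
    """Labels of the definitions of x that may reach the use at target."""
    result = {l for l in LABELS
              if defines(x, l)
              and any(clear(x, l2, target) for (l1, l2) in FLOW if l1 == l)}
    if clear(x, INIT, target):
        result.add(UNINIT)
    return result


def du(x, l):
    """Labels of the uses of x that the definition at l may reach."""
    if l == UNINIT:
        return {t for t in LABELS if clear(x, INIT, t)}
    if not defines(x, l):
        return set()
    return {t for t in LABELS
            if any(clear(x, l2, t) for (l1, l2) in FLOW if l1 == l)}


print(ud("z", 3))   # {1}
print(du("z", 1))   # {3}
```

In this sketch a label l' is in du(x, l) exactly when l is in
ud(x, l'), which is one way to approach the question of defining du in
terms of ud.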