COP 5021 meeting -*- Outline -*-

* Dataflow Analysis (1.3)

** goals

   To propagate information about data through a program,
   in order to enable optimizations and other program transformations.

** idea

------------------------------------------
       IDEA OF DATA FLOW ANALYSIS

What's the basic idea?

What is a data flow graph?

How is that used to model the semantics?
------------------------------------------
        ... Represent the program as a data flow graph

        ... nodes are elementary blocks
            edges describe how control passes from one elementary block
            to another (like an assembly language program)

        ... nodes transform information (differences from entry to exit)
            edges connect the exit of one node to the entry of another
            (equations)

** example

------------------------------------------
               EXAMPLE

   [y := 0]^1;
   [print y]^2;
   [read x]^3;
   while [x < 0]^4 do (
       [y := y+1]^5;
       [print y]^6;
       [read x]^7);
   [z := x]^8

What's the flow graph for this?
------------------------------------------
        ...
                |
                v
           [y := 0]^1
                |
                v
           [print y]^2
                |
                v
           [read x]^3
                |
                v
        /->[x < 0]^4 ----no-------->|
        |       |                   |
        |       | yes               |
        |       v                   |
        |  [y := y+1]^5             |
        |       |                   |
        |       v                   |
        |  [print y]^6              |
        |       |                   |
        |       v                   |
        |  [read x]^7               |
        |       |                   |
        \_______/                   |
                ____________________/
               /
               v
           [z := x]^8
               |
               v

   Q: How would you handle Read? Print?
        they are elementary blocks
   Q: How would you handle if-then-else statements?
        with a split of the control flow and a join after
   Q: For loops?
        like while loops
   Q: How would you handle break?
        a flow from the break to the point after the loop
   Q: How would you handle try-catch and throw?
        a throw flows to the body of the surrounding catch,
        or out if no catch surrounds it
   Q: How would you handle assert? Assume?
        a split from the statement to the next statement or to an exit
   Q: Choose?
        a split
   Q: Parallel composition?
        a split again

*** the equational approach (1.3.1)

    In the in-class examples, we'll work with a taint analysis
    in the WHILE language extended with read/sanitize/print.
------------------------------------------
   NODE AND EDGE EQUATIONS FOR TAINT ANALYSIS

Taint analysis: find, at each program point,
   the set of variables that may have a value
   derived from a value previously read from the user
   ("tainted")

Tentry, Texit : Lab* -> Powerset(Var*)
   where Lab* = set of labels in the program
         Var* = set of variables in the program

block              Equation
=======================================
[x:=a]^l           Texit(l) =

[skip]^l           Texit(l) =

[b]^l              Texit(l) =

[read x]^l         Texit(l) =

[sanitize x]^l     Texit(l) =

[print x]^l        Texit(l) =

How are edges connected?
------------------------------------------
        ... the right-hand sides, in order:
            [x:=a]^l         (Tentry(l) - {x}) \cup {x | FV(a) \cap Tentry(l) \neq {}}
            [skip]^l         Tentry(l)
            [b]^l            Tentry(l)
            [read x]^l       Tentry(l) \cup {x}
            [sanitize x]^l   Tentry(l) - {x}
            [print x]^l      Tentry(l)

            (for an assignment, x is tainted at exit exactly when
             some variable of a is tainted at entry;
             e.g., z := x taints z whenever x is tainted)

   Q: Why are these the right equations?
   Q: How does this work out for our example?

        ... the Texit of the tail of an edge is unioned into
            the Tentry of the head

**** algorithm for solving the equations

   Q: What can we do to solve a set of simultaneous equations?
        e.g., R(1) = {y}
              R(2) = R(1) \cup {x}

        Do we want the least solution or the greatest?
        Which is better for this case?
           the least: saying that all of Var* may be tainted
           is safe, but not precise

        What ordering is appropriate for sets?
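The two-equation example above can be solved mechanically. The following is a minimal sketch, assuming the equations are encoded as one Python function on pairs of sets; the names `F`, `least_fixed_point`, and `BOTTOM` are illustrative and not part of the notes. It starts from empty sets and reapplies the right-hand sides until nothing changes, which gives the least solution under the subset ordering.

```python
# Solve R(1) = {y}, R(2) = R(1) ∪ {x} by iterating from the bottom element.

def F(r):
    """Apply the right-hand sides of both equations once."""
    r1, r2 = r
    return (frozenset({"y"}), r1 | frozenset({"x"}))

def least_fixed_point(f, bottom):
    """Reapply f, starting at bottom, until the tuple stops changing.

    This terminates here because F is monotone and there are only
    finitely many tuples of subsets of {x, y}.
    """
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt

BOTTOM = (frozenset(), frozenset())   # the pair of empty sets
solution = least_fixed_point(F, BOTTOM)
print(solution)                       # R(1) is {y}, R(2) is {x, y}
```

Starting from the pair of empty sets matters: every iterate stays below every solution, so the first tuple that stops changing is the smallest one.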
------------------------------------------
             LEAST SOLUTION

F: (Powerset(Var*))^16 -> (Powerset(Var*))^16

F is defined by:

  F(T_1, ..., T_{16}) = (F_1(T_1, ..., T_{16}),
                         F_2(T_1, ..., T_{16}),
                         ...,
                         F_{16}(T_1, ..., T_{16}))
  where
    F_1(T_1, ..., T_{16})   // Tentry(1)
       = {}
    F_2(T_1, ..., T_{16})   // Texit(1)
       = (T_1 - {y}) \cup {}
    F_3(T_1, ..., T_{16})   // Tentry(2)
       = T_2
    F_4(T_1, ..., T_{16})   // Texit(2)
       = T_3
    F_5(T_1, ..., T_{16})   // Tentry(3)
       = T_4
    F_6(T_1, ..., T_{16})   // Texit(3)
       = T_5 \cup {x}
    F_7(T_1, ..., T_{16})   // Tentry(4)
       =
    F_8(T_1, ..., T_{16})   // Texit(4)
       =

A solution is a 16-tuple of the form
   (Tentry(1), Texit(1), ..., Tentry(8), Texit(8))

Such a 16-tuple is a solution if:

   (Tentry(1), Texit(1), ..., Tentry(8), Texit(8))
     = F(Tentry(1), Texit(1), ..., Tentry(8), Texit(8))
------------------------------------------
        ... F_7: T_6 \cup T_{14}
        ... F_8: T_7

   Q: Would this still be a 16-tuple if there were seven elementary blocks?
   Q: How to compare such tuples?
        pointwise, i.e.,
        T \sqsubseteq T' iff \forall i, T_i \subseteq T'_i.
   Q: What does the lattice structure of (Powerset(Var*))^16 look like?
        the product of 16 copies of the subset lattice on Var*,
        ordered pointwise; bottom is the 16-tuple of empty sets
   Q: How can one find the fixed point using this information?
        iterate F starting from the 16-tuple of empty sets
        (bottom in the lattice) until the tuple stops changing
   Q: Why is the fixed point a solution?
        because it satisfies all the equations
   Q: Why do we want the least fixed point?
        because a larger one would be less precise,
        and we want the most precise solution we can obtain

*** the constraint based approach

   Q: What are the constraints?

------------------------------------------
   WHAT CONSTRAINTS FOR THIS EXAMPLE?

   [y := 0]^1;
   [print y]^2;
   [read x]^3;
   while [x < 0]^4 do (
       [y := y+1]^5;
       [print y]^6;
       [read x]^7);
   [z := x]^8

What's the flow graph for this?
------------------------------------------
        for this program:

        {} \subseteq Tentry(1)
        Tentry(1) - {y} \subseteq Texit(1)
        Texit(1) \subseteq Tentry(2)
        Tentry(2) \subseteq Texit(2)
        Texit(2) \subseteq Tentry(3)
        Tentry(3) \cup {x} \subseteq Texit(3)
        Texit(3) \cup Texit(7) \subseteq Tentry(4)
        Tentry(4) \subseteq Texit(4)
        Texit(4) \subseteq Tentry(5)
        (Tentry(5) - {y}) \cup (Tentry(5) \cap {y}) \subseteq Texit(5)
        Texit(5) \subseteq Tentry(6)
        Tentry(6) \subseteq Texit(6)
        Texit(6) \subseteq Tentry(7)
        Tentry(7) \cup {x} \subseteq Texit(7)

   Q: How do the flows work?
        by unioning where they join
   Q: What's the connection to the equations?
        each constraint is one direction (\supseteq) of the
        corresponding equation
   Q: In what direction do the subset constraints go?
        in the same direction as the program's flows
        (smaller to larger)
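The whole analysis of the example program can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the course's implementation: the encodings `BLOCKS` and `FLOW` and the functions `transfer`, `solve`, and `satisfies_constraints` are hypothetical names, and the assignment rule is assumed to taint the target exactly when some variable of the right-hand side is tainted at entry. It iterates from empty sets to reach the least solution, then checks that solution against the subset constraints.

```python
# Taint analysis for the example program, solved by iteration from empty sets.

# label -> (kind, target variable or None, variables read by the block)
BLOCKS = {
    1: ("assign", "y", set()),     # [y := 0]^1
    2: ("print", "y", set()),      # [print y]^2
    3: ("read", "x", set()),       # [read x]^3
    4: ("test", None, {"x"}),      # [x < 0]^4
    5: ("assign", "y", {"y"}),     # [y := y+1]^5
    6: ("print", "y", set()),      # [print y]^6
    7: ("read", "x", set()),       # [read x]^7
    8: ("assign", "z", {"x"}),     # [z := x]^8
}

# Flow edges (tail, head), read off the flow graph.
FLOW = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4), (4, 8)]

def transfer(label, t_entry):
    """Compute Texit(label) from Tentry(label) using the node equations."""
    kind, x, fv = BLOCKS[label]
    if kind == "assign":
        # Assumed rule: x becomes tainted iff a mentions a tainted variable.
        gen = {x} if fv & t_entry else set()
        return frozenset((t_entry - {x}) | gen)
    if kind == "read":
        return frozenset(t_entry | {x})
    if kind == "sanitize":
        return frozenset(t_entry - {x})
    return frozenset(t_entry)      # skip, test, and print change nothing

def solve():
    """Recompute every Tentry/Texit until a full pass changes nothing."""
    t_entry = {l: frozenset() for l in BLOCKS}
    t_exit = {l: frozenset() for l in BLOCKS}
    changed = True
    while changed:
        changed = False
        for l in sorted(BLOCKS):
            # Edge equation: union the Texit of every predecessor.
            new_entry = frozenset().union(*(t_exit[p] for (p, h) in FLOW if h == l))
            new_exit = transfer(l, new_entry)
            if new_entry != t_entry[l] or new_exit != t_exit[l]:
                t_entry[l], t_exit[l] = new_entry, new_exit
                changed = True
    return t_entry, t_exit

def satisfies_constraints(t_entry, t_exit):
    """Check the subset constraints along edges and across nodes."""
    edges_ok = all(t_exit[p] <= t_entry[h] for (p, h) in FLOW)
    nodes_ok = all(transfer(l, t_entry[l]) <= t_exit[l] for l in BLOCKS)
    return edges_ok and nodes_ok

t_entry, t_exit = solve()
for l in sorted(BLOCKS):
    print(l, sorted(t_entry[l]), sorted(t_exit[l]))
```

Under these assumptions x is tainted from the exit of block 3 onward, z is tainted at the exit of block 8, and y is never tainted. The assignment that marks every variable as tainted at every point also passes `satisfies_constraints`, which illustrates why the least solution is the one wanted.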