Com S 641 meeting -*- Outline -*-

* Abstract Interpretation (1.5)

** goals

   To specify an analysis in a way that is correct by construction.

   To calculate an analysis in a systematic way.

** idea

   Q: What's the basic idea?

   1. Write the semantics that collects (more than) the information
      required.
   2. Define an abstraction function (a) that extracts the information
      needed to answer the question of the analysis.
   3. Calculate the analysis.

   Q: What's a collecting semantics?

      A mathematical semantics of the program that records all the
      necessary information for the analysis.

   Q: How is that used to extract the analysis?

      via a Galois connection...

** example

------------------------------------------
              EXAMPLE

   [q := 0]^1; [r := x]^2;
   while [r >= y]^3 do
      ([r := r-y]^4; [q := q+1]^5);
   assert [0<=r and r

   (Powerset(Trace))^{12}

   G is defined by:

     G(CS_1, ..., CS_{12}) = (G_1(CS_1, ..., CS_{12}),
                              G_2(CS_1, ..., CS_{12}),
                              ...,
                              G_{12}(CS_1, ..., CS_{12}))
   where
     G_1(CS_1, ..., CS_{12}) = {((q,?),(r,?),(x,?),(y,?))}
     G_2(CS_1, ..., CS_{12}) = {tr : (q,1) | tr \in CS_1}
     G_3(CS_1, ..., CS_{12}) = CS_2
     ...

   Solution

     (CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6))
     is a solution if
       G(CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6))
        = (CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6))
------------------------------------------

   Q: What is G_1?

      the constant {((q,?),(r,?),(x,?),(y,?))}

   Q: Why does it have 12 parameters?

      Because all of the G_i do; G_1 doesn't use any of them.

   Q: What is G_2?

      CSexit(1)

   Q: Why is G_3 = CS_2 in this example?

      because the semantics says that CSentry(2) = CSexit(1)
      in this example
      (G_3 is CSentry(2), and CS_2 is CSexit(1))

      so we have G(\vec{CS}) = \vec{CS},
      or we can represent the system of equations as
         \vec{CS} = G(\vec{CS}).

   Q: What's the ordering on the solution space?

      pointwise for the vector, subsets for sets of traces

   Q: What does it mean for G to be monotone?

      that means if it's given a bigger tuple of sets of traces,
      then it returns one that is no smaller.
      So by the theory one can find a least solution,
      the least fixed point of G, lfp(G).

*** Galois connections

------------------------------------------
     ABSTRACTION AND CONCRETIZATION

   abstraction function for AV analysis:

     a: Powerset(Trace) -> Powerset(Var)
     a(X) = { x | (x,l) \in tr, l != ?, tr \in X }

   concretization function for AV analysis:

     g: Powerset(Var) -> Powerset(Trace)
     g(Y) = { tr | (\forall x, l :: (x,l) \in tr, l != ? ==> x \in Y) }

   Adjunction, or Galois connection:

     a(X) \subseteq Y <==> X \subseteq g(Y)
------------------------------------------

   Note: a and g are \alpha and \gamma in the book

   Q: Would a and g be different for the RD analysis?

      yes

   draw a picture like figure 1.3

------------------------------------------
     set of traces                 set of vars
   |---------------|            |---------------|
   |               |      g     |               |
   |    g(Y) <------------------------ Y        |
   |     U|        |            |     U|        |
   |     X ------------------------> a(X)       |
   |               |      a     |               |
   |_______________|            |_______________|
------------------------------------------

*** calculating the analysis

   Q: Why do we care about the abstraction and concretization functions?

------------------------------------------
        CALCULATING THE ANALYSIS

   Extend a and g pointwise to tuples:

     a(X_1, ..., X_12) = (a(X_1), ..., a(X_12))
     g(Y_1, ..., Y_12) = (g(Y_1), ..., g(Y_12))

   Define the AV analysis by the function

     a o G o g: Powerset(Var)^12 -> Powerset(Var)^12

   so for each i in {1..12}

     (a o G_i o g): Powerset(Var)^12 -> Powerset(Var)

   by

     a(G_1(g(AV_1, ..., AV_12)))
       = a({((q,?),(r,?),(x,?),(y,?))})
       = {}
     a(G_2(g(AV_1, ..., AV_12)))
       = a({tr : (q,1) | tr \in g(AV_1)})

   So a solution (AVentry(1), ..., AVexit(6)) has the property that

     (AVentry(1), ..., AVexit(6))
       = (a o G o g)(AVentry(1), ..., AVexit(6))
------------------------------------------

   I'm abusing notation above,
   but the book uses vector notation for the extensions

   Q: What's AV_3?

   Q: How does this compare to the analysis we created by hand?

   Q: What's the benefit of doing things this way?
      - we know there's a least solution, lfp(a o G o g)

      - we get correctness of the analysis
        from correctness of the collecting semantics:

           a(lfp(G)) is safely approximated by lfp(a o G o g),
           that is, a(lfp(G)) \subseteq lfp(a o G o g) (pointwise)

        when these are equal, we say the analysis is optimal.
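
   A minimal Python sketch of the ideas above, not the book's or the
   course's implementation. It does two things: it brute-force checks
   the adjunction a(X) \subseteq Y <==> X \subseteq g(Y) on a small
   universe of traces, and it computes the least solution of the
   abstract equations by iterating from the bottom tuple of 12 empty
   sets. Assumptions: membership in g(Y) is read as "every variable
   assigned in the trace lies in Y"; the traces in UNIVERSE are made up
   for the test; the flow equations for labels 2 to 6 (tables ASSIGNS
   and PREDS) are not spelled out in these notes and are read off the
   program's control flow, with labels 3 and 6 treated as tests that
   assign nothing. All function and table names are hypothetical.

```python
from itertools import chain, combinations

VARS = ("q", "r", "x", "y")
UNINIT = "?"  # pseudo-label: variable not yet assigned


def alpha(traces):
    """Abstraction a: variables with a non-'?' entry in some trace."""
    return frozenset(x for tr in traces for (x, l) in tr if l != UNINIT)


def in_gamma(tr, ys):
    """Membership test for g(ys) (assumed reading): every variable
    assigned in tr lies in ys. g(ys) itself is infinite."""
    return alpha([tr]) <= ys


def powerset(items):
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


# Hypothetical traces, used only to exercise the adjunction check.
init = (("q", UNINIT), ("r", UNINIT), ("x", UNINIT), ("y", UNINIT))
t1 = init + (("q", 1),)
t2 = t1 + (("r", 2),)
t3 = t2 + (("r", 4), ("q", 5))
UNIVERSE = [init, t1, t2, t3]


def adjunction_holds():
    """Check a(X) <= Y  <=>  X <= g(Y) for every X and Y tried."""
    for xs in powerset(UNIVERSE):
        for ys in powerset(VARS):
            left = alpha(xs) <= ys
            right = all(in_gamma(tr, ys) for tr in xs)
            if left != right:
                return False
    return True


# Assumed abstract equations; tuple order is
# (entry(1), exit(1), entry(2), ..., entry(6), exit(6)).
ASSIGNS = {1: "q", 2: "r", 4: "r", 5: "q"}            # assigned variable
PREDS = {2: [1], 3: [2, 5], 4: [3], 5: [4], 6: [3]}   # flow predecessors


def F(av):
    """One application of the abstract function to a 12-tuple."""
    out = []
    for l in range(1, 7):
        entry_l = av[2 * (l - 1)]
        preds = [av[2 * (p - 1) + 1] for p in PREDS.get(l, [])]
        out.append(frozenset().union(*preds))   # new entry(l)
        gen = {ASSIGNS[l]} if l in ASSIGNS else set()
        out.append(entry_l | gen)               # new exit(l)
    return tuple(out)


def lfp(f, bottom):
    """Iterate f from bottom until the tuple stops changing."""
    cur = bottom
    while True:
        nxt = f(cur)
        if nxt == cur:
            return cur
        cur = nxt


solution = lfp(F, tuple(frozenset() for _ in range(12)))
print(adjunction_holds())     # True
print(sorted(solution[11]))   # AVexit(6): ['q', 'r']
```

   The iteration terminates because F is monotone and Powerset(Var)^12
   is finite, so the chain from the bottom tuple can only grow finitely
   many times. Under the assumed equations, x and y never appear in the
   solution, since the program never assigns them.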