Com S 641 Lecture -*- Outline -*-

* Shape Analysis (2.6)

Back to ignoring procedures!
But now we look at heap-allocated data

Goals:
    finite approximations of data structures (shapes)
    useful for other software tools (e.g., null dereference detection)
    can be used for some kinds of verification of invariants
      (e.g., a procedure doesn't introduce cycles into a list)

** syntax

For reference, which we've seen and don't need to do again:

    a \in AExp      "arithmetic expressions"
    b \in BExp      "Boolean expressions"
    S \in Stmt      "statements"
    x,y \in Var     "variables"
    n \in Num       "numeric literals"
    l \in Lab       "labels"
    opa \in Op_a    "arithmetic operators"
    opb \in Op_b    "Boolean operators"
    opr \in Op_r    "relational operators"

New and modified...

------------------------------------------
SYNTAX

sel \in Sel      "selector names"
p \in PExp       "pointer expressions"
op_p \in Op_p    "pointer operators"

p ::= x                     "variable dereference"
    | x.sel                 "field dereference"
a ::= p                     "dereference expression"
    | n | a1 opa a2 | nil
b ::= op_p p                "pointer test"
    | true | false | not b | b1 opb b2 | a1 opr a2
S ::= [p := a]^l            "assignment"
    | [malloc p]^l          "allocation"
    | [skip]^l | S1 ; S2
    | if [b]^l then S1 else S2
    | while [b]^l do S
    | assume [b]^l

Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel}
------------------------------------------

Don't allow pointer arithmetic, although the syntax (a ::= p and
a1 opa a2) makes it look allowed

Q: Can we test for equality of pointers?
    Yes, with a1 opr a2, since pointer expressions are arithmetic
    expressions (a ::= p)

------------------------------------------
EXAMPLE (COPY-INTO)

assume [not (f = t)]^0;
while [(not is-nil(f)) and (not is-nil(t))]^1 do
  ([t.val := f.val]^2;
   [t := t.next]^3;
   [f := f.next]^4);
------------------------------------------

** structural operational semantics (2.6.1)

*** domains

------------------------------------------
OPERATIONAL SEMANTICS (2.6.1)

Domains:
  xi \in Loc                           "locations"
  s \in State = Var* -> Storable
  Storable = Z + Loc + {<>}
  h \in Heap = (Loc x Sel) ->fin Storable
------------------------------------------

We assume that Loc is infinite.
<> is the nil value.
->fin is for finite functions, and hence partial ones.
Var* is the finite set of variables occurring in the program.

Q: How would you explain the model of the heap?
    h(xi, sel) is the value stored in field sel of the object at
    location xi, so h describes the fields of each object, indexed
    by its location

*** denotational semantics of expressions

------------------------------------------
SEMANTICS OF POINTER EXPRESSIONS

P: PExp* -> (State x Heap) ->fin Storable

P[[x]](s, h) = s(x)
P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h)
                   then h(s(x), sel)
                   else undef
------------------------------------------

PExp* is the subset of PExp such that
  (\forall p \in PExp* :: FV(p) \subseteq Var*)

Q: Do we have to modify the semantics of arithmetic and Boolean
   expressions now?
    Yes, since pointers can be involved in the syntax, and since
    their semantics is partial and needs the heap...

------------------------------------------
SEMANTICS OF ARITHMETIC EXPRESSIONS

A: AExp -> (State x Heap) ->fin Storable

A[[p]](s, h) = P[[p]](s, h)
A[[nil]](s, h) = <>
A[[n]](s, h) = N[[n]]
A[[a_1 op_a a_2]](s, h) = (OP_a[[op_a]]) (A[[a_1]](s,h)) (A[[a_2]](s,h))
------------------------------------------

Q: What happens if P[[p]](s, h) is undefined?
    Then A[[p]](s, h) is undefined

Q: Are there any other cases where A is undefined?
    Yes...

Q: How do we prevent pointer arithmetic?
    By making the meaning of the operators undefined unless both
    arguments are integers (i.e., by changing OP_a[[.]])

------------------------------------------
SEMANTICS OF BOOLEAN EXPRESSIONS

B: BExp -> (State x Heap) ->fin T

B[[a_1 op_r a_2]](s, h) = (OP_r[[op_r]]) (A[[a_1]](s,h)) (A[[a_2]](s,h))
B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h))

OP_p: Op_p -> Storable ->fin T
OP_p[[is-nil]](v) = if v = <> then tt else ff
------------------------------------------

Q: What kinds of changes would be needed to OP_r, if any?
    We want to allow equality comparison of pointers, but not mixed
    comparisons between pointers and integers; the mixed cases
    should be undefined.

*** operational semantics of statements

------------------------------------------
OPERATIONAL SEMANTICS OF STATEMENTS

Configurations:
  (Stmt x State x Heap) + (State x Heap)
Terminal configurations:
  (State x Heap)
------------------------------------------

Q: How does that change of configurations affect the description of
   the transitions?
    We have to add the heap everywhere...

------------------------------------------
TRANSITIONS

[ass]  ([x := a]^l, s, h) --> (s[x |-> A[[a]](s,h)], h)
          if A[[a]](s,h) is defined

[fass] ([x.sel := a]^l, s, h) --> (s, h[(s(x),sel) |-> A[[a]](s,h)])
          if s(x) \in Loc and A[[a]](s,h) is defined

[mal]  ([malloc x]^l, s, h) --> (s[x |-> xi], h)
          if xi \in Loc does not occur in s or h

[fmal] ([malloc x.sel]^l, s, h) --> (s, h[(s(x), sel) |-> xi])
          if xi \in Loc does not occur in s or h and s(x) \in Loc
------------------------------------------

Q: What happens if the side conditions are not met in [ass] or [fass]?
    The semantics is "stuck": no transition can take place, and since
    the configuration is not terminal, this indicates an error

Q: Can that happen in [fmal]?
    yes, when s(x) is not a location (e.g., x is nil)

Q: In [mal]?
    no, Loc is infinite, so a fresh xi always exists

xi is "fresh" in [mal] and [fmal]

Q: What would the skip rule look like? seq1?
    [skip] ([skip]^l, s, h) --> (s, h)

                (S1, s, h) --> (S1', s', h')
    [seq1] ----------------------------------
           (S1;S2, s, h) --> (S1';S2, s', h')

    Don't need to show the following...

               (S1, s, h) --> (s', h')
    [seq2] ------------------------------
           (S1;S2, s, h) --> (S2, s', h')

Q: How does the partiality of the Boolean semantics affect the if rules?

    [if1] (if [b]^l then S1 else S2, s, h) --> (S1, s, h)
             if B[[b]](s, h) = true
    [if2] (if [b]^l then S1 else S2, s, h) --> (S2, s, h)
             if B[[b]](s, h) = false

    So if B[[b]](s, h) is undefined, the semantics is "stuck", which
    indicates an error. The same would happen with the "while" rules:

    [wh1] (while [b]^l do S, s, h) --> (S; while [b]^l do S, s, h)
             if B[[b]](s,h) = true
    [wh2] (while [b]^l do S, s, h) --> (s, h)
             if B[[b]](s,h) = false

------------------------------------------
EXAMPLE

What transitions happen for

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
------------------------------------------

** a shape analysis

The choices made in this analysis should be thought of as an example;
one could make other choices and trade precision for efficiency...

*** property space (2.6.2)

The property space for the analysis is sets of compatible shape graphs

------------------------------------------
SHAPE GRAPHS (2.6.2)

Goals:
  - faithfully model real state and heap
  - smaller (finite) representation

def: (S,H,is) is a *shape graph* for S* iff:
  S \in AState = Powerset(Var* x ALoc)
  H \in AHeap = Powerset(ALoc x Sel x ALoc)
  is \in IsShared = Powerset(ALoc)
where ALoc = {n_Z | Z \subseteq Var*}.
------------------------------------------

AState is a set of abstract states
AHeap is a set of abstract heaps
IsShared contains sharing information
ALoc is a set of abstract locations

**** abstract locations

------------------------------------------
ABSTRACT LOCATIONS

n_{x,y} represents the location that x and y both point to
  in a given state (so s(x) = s(y))

n_{} represents locations not pointed to by variables
------------------------------------------

n_{} is the "abstract summary location"

------------------------------------------
EXAMPLE

What are the abstract locations that exist at the exit of each
label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
------------------------------------------

**** abstract states

------------------------------------------
ABSTRACT STATES

S \in AState = Powerset(Var* x ALoc)

E.g., {(x, n_{x,z}), (z, n_{x,z}), (y, n_{y})}
represents a state s such that
  s(x) = xi1
  s(z) = xi1
  s(y) = xi2
for some distinct locations xi1 and xi2

ALoc(S) = {n_X | (x, n_X) \in S}

EXAMPLE
What is the abstract state at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
------------------------------------------

Draw a picture of this example state and the abstract state.

Q: What about variables that map to integers?
    They are ignored, as are variables that map to nil.
    So if we add [z := 641]^5 to the program, it doesn't change the
    abstract state

**** abstract heaps

------------------------------------------
ABSTRACT HEAPS

H \in AHeap = Powerset(ALoc x Sel x ALoc)

E.g., {(n_{x,z}, next, n_{y}), (n_{y}, next, n_{}), (n_{}, next, n_{})}
represents a heap h and state s such that:
  h(s(x), next) = s(y)
  h(s(z), next) = s(y)
  h(s(y), next) = xi
  h(xi, next) = xi or xi2
where xi and xi2 are locations not pointed to by any variable

ALoc(H) = {n_V | (n_V, sel, n_W) \in H} \cup {n_W | (n_V, sel, n_W) \in H}

EXAMPLE
What is the abstract state and heap at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
------------------------------------------

Draw a picture for the example.

Q: What do we know about the abstract summary location, in general?
    not much, it could represent lots of locations lumped together

**** sharing information

------------------------------------------
SHARING INFORMATION

is \in IsShared = Powerset(ALoc)

{n_{x,z}} means that the heap has more than one pointer to s(x) (= s(z))

EXAMPLE
What is the sharing information at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
  [z := y.next]^5;
  [z.next := x]^6;
  [z.back := y]^7;
------------------------------------------

Draw a picture of the state, heap, and the abstract state and heap
for the example

**** invariants

------------------------------------------
INVARIANTS ON SHAPE GRAPHS

def: A shape graph (S,H,is) is *compatible* iff
  1. (\forall n_V, n_W \in ALoc(S) \cup ALoc(H) \cup is ::
        V = W or (V \cap W = {}))
  2. (\forall (x, n_X) \in S :: x \in X)
  3. (\forall (n_V, sel, n_W) \in H ::
        (\forall (n_V, sel, n_W') \in H :: V = {} or W = W'))
  4. (\forall n_X \in is ::
        (\exists sel :: (n_{}, sel, n_X) \in H)
        or (\exists (n_V, sel1, n_X) \in H ::
              (\exists (n_W, sel2, n_X) \in H ::
                 sel1 != sel2 or V != W)))
  5.
     (\forall (n_V, sel1, n_X) \in H ::
        (\forall (n_W, sel2, n_X) \in H ::
           ((sel1 != sel2 or V != W) and X != {}) ==> n_X \in is))
------------------------------------------

Q: Why are these sensible?
    1. enforces the interpretation of abstract locations; without it,
       a location could participate in multiple different abstract
       locations
    2. also enforces the interpretation of abstract locations; without
       it, a variable x could be bound to an abstract location n_X
       whose name does not include x
    3. enforces the interpretation of the abstract heap, at least for
       locations pointed to directly by variables; without it, such a
       location could appear to have more than one sel-successor,
       which no concrete heap allows (only n_{} may have several)
    4 and 5. enforce the interpretation of the sharing information:
       n_X is in is only if it may have more than one incoming heap
       pointer (4), and it must be in is if it has two distinct
       incoming edges and X != {} (5)

**** actual property space

------------------------------------------
PROPERTY SPACE IS SETS OF COMPATIBLE SHAPE GRAPHS

Assume a fixed/given program S*.

Compatible shape graphs:
  SG = {(S,H,is) | (S,H,is) is compatible}

Property Space
  L = Powerset(SG)
  \bigsqcup = \bigcup
  So \sqsubseteq = \subseteq
------------------------------------------

Q: Is this property space finite?
    Yes: Var* (and the set of selectors used in S*) is finite, so
    ALoc is finite, hence so are AState, AHeap, IsShared, SG, and L.

*** instance of a monotone framework (2.6.3)

------------------------------------------
MONOTONE FRAMEWORK INSTANCE

Assume a fixed/given program S*.

Monotone framework instance:
  L = Powerset(SG)
  Funs = {f : L -> L | f is monotone}
  F = flow(S*)
  E = {init(S*)}
  i = some set {(S_0, H_0, is_0)}
  f^SA_. described below
------------------------------------------

Q: Is this a forward analysis?
    yes

Q: Is this a may or a must analysis?
    may, but because of the invariants on shape graphs, individual
    shape graphs cannot contain superfluous information. This can be
    used to achieve *strong update* and *strong nullification*,
    i.e., we can remove old analysis information before adding new
    information

**** equations

------------------------------------------
RESULTING EQUATIONS

Shape_o(l) = if l = init(S*) then i
             else \bigcup {Shape_.(l') | (l',l) \in flow(S*)}
Shape_.(l) = f^SA_l(Shape_o(l))
------------------------------------------

Q: Which of these is the entry information?
    Shape_o(l)

**** transfer functions

------------------------------------------
TRANSFER FUNCTIONS

General form:
  f^SA_l(SG) = \bigcup {phi^SA_l((S,H,is)) | (S,H,is) \in SG}
------------------------------------------

Q: What is this doing?
    It maps the function phi^SA_l over all shape graphs in the
    argument set, SG, and combines the results by union.

------------------------------------------
TESTS AND SKIP

If l labels [b]^l, then
  phi^SA_l((S,H,is)) = {(S,H,is)}
If l labels [skip]^l, then
  phi^SA_l((S,H,is)) = {(S,H,is)}
------------------------------------------

------------------------------------------
MANIPULATIONS USEFUL FOR ASSIGNMENTS

Taking a variable x out of an ALoc:
  k_x(n_Z) = n_Y, where Y = Z\{x}

Taking variable x out of a shape graph:
  kill_x((S,H,is)) = (S',H',is')
    where S' = {(z, k_x(n_Z)) | (z, n_Z) \in S, z != x}
          H' = {(k_x(n_V), sel, k_x(n_W)) | (n_V, sel, n_W) \in H}
          is' = {k_x(n_X) | n_X \in is}

Taking a field x.sel out of a shape graph:
  kill_x.sel((S,H,is)) = (S',H',is')
    where (x, n_X) \in S
          S' = S
          H' = {(n_V, sel', n_W) | (n_V, sel', n_W) \in H,
                                    X != V or sel' != sel}
          is' = restore(is, H')
  restore(is, H') = {n_V | n_V \in is,
                           #into(n_V, H') > 1
                           or (\exists sel' :: (n_{}, sel', n_V) \in H')}
  #into(n_V, H) = number of pointers (edges) to n_V in H

Adding a binding from x to y's binding:
  g^y_x(n_Z) = if y \in Z then n_W where W = Z \cup {x}
               else n_Z

Adding a binding from x to y's binding in a shape graph:
  glom^y_x((S,H,is)) =
    {(S'',H'',is'')}
    where (S',H',is') = kill_x((S,H,is))
          S'' = {(z, g^y_x(n_Z)) | (z, n_Z) \in S'}
                \cup {(x, g^y_x(n_Y)) | (y, n_Y) \in S'}
          H'' = {(g^y_x(n_V), sel, g^y_x(n_W)) | (n_V, sel, n_W) \in H'}
          is'' = {g^y_x(n_Z) | n_Z \in is'}

Adding a binding from x to n_U:
  h^U_x(n_Z) = if Z = U then n_Y where Y = U \cup {x}
               else n_Z

Adding a binding from x to n_U in a shape graph:
  hbind^U_x((S,H,is)) = {(S'',H'',is'')}
    where (S',H',is') = kill_x((S,H,is))
          S'' = {(z, h^U_x(n_Z)) | (z, n_Z) \in S'} \cup {(x, h^U_x(n_U))}
          H'' = {(h^U_x(n_V), sel, h^U_x(n_W)) | (n_V, sel, n_W) \in H'}
          is'' = {h^U_x(n_Z) | n_Z \in is'}

Adding a binding from x.sel to n_Y in a shape graph:
  hsbind(x,n_X,n_Y)((S,H,is)) = {(S'',H'',is'')}
    where (S',H',is') = kill_x.sel((S,H,is))
          S'' = S'
          H'' = H' \cup {(n_X, sel, n_Y)}
          is'' = is' \cup {n_Y | #into(n_Y,H') >= 1}
------------------------------------------

Note that kill_x, glom^y_x, and hbind^U_x essentially map k_x, g^y_x,
and h^U_x (respectively) over their shape graph argument (glom^y_x and
hbind^U_x do so after applying kill_x).

Q: What is the type of each of these?

Q: What is kill_y((S,H,is)), where
     S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})}
     H = {(n_{y,z}, next, n_{}), (n_{x}, next, n_{y,z}),
          (n_{}, next, n_{y,z}), (n_{}, next, n_{})}
     is = {n_{y,z}}
   ?

Q: What is glom^y_x((S,H,is)), where
     S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})}
     H = {(n_{y,z}, next, n_{}), (n_{x}, next, n_{y,z}),
          (n_{}, next, n_{y,z}), (n_{}, next, n_{})}
     is = {n_{y,z}}
   ?
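To make the two questions above concrete, here is a minimal Python sketch of k_x, kill_x, g^y_x, and glom^y_x. It applies them to the shape graph from those questions. The encoding is an assumption, not part of the notes:

- an abstract location n_Z is the frozenset Z of variable names, and n_{} is the empty frozenset;
- a shape graph is a triple (S, H, is_) of frozensets.

The Python names `n`, `k`, `kill`, `g`, and `glom` are hypothetical renderings of the notes' functions.

```python
# Hypothetical encoding (not from the notes):
#   abstract location n_Z  ->  frozenset Z of variable names; n_{} is frozenset()
#   shape graph (S,H,is)   ->  tuple (S, H, is_) of frozensets
from typing import FrozenSet, Tuple

ALoc = FrozenSet[str]
ShapeGraph = Tuple[frozenset, frozenset, frozenset]


def n(*names: str) -> ALoc:
    """Build the abstract location n_{names}."""
    return frozenset(names)


def k(x: str, z: ALoc) -> ALoc:
    """k_x(n_Z) = n_Y where Y = Z minus {x}."""
    return z - {x}


def kill(x: str, sg: ShapeGraph) -> ShapeGraph:
    """kill_x: drop x's binding and take x out of every abstract location."""
    S, H, is_ = sg
    return (frozenset((z, k(x, nz)) for (z, nz) in S if z != x),
            frozenset((k(x, nv), sel, k(x, nw)) for (nv, sel, nw) in H),
            frozenset(k(x, nx) for nx in is_))


def g(y: str, x: str, z: ALoc) -> ALoc:
    """g^y_x(n_Z): add x to the abstract location that contains y."""
    return z | {x} if y in z else z


def glom(y: str, x: str, sg: ShapeGraph) -> frozenset:
    """glom^y_x: kill x, then bind x to y's abstract location (a singleton set)."""
    S1, H1, is1 = kill(x, sg)
    S2 = (frozenset((z, g(y, x, nz)) for (z, nz) in S1)
          | frozenset((x, g(y, x, ny)) for (v, ny) in S1 if v == y))
    H2 = frozenset((g(y, x, nv), sel, g(y, x, nw)) for (nv, sel, nw) in H1)
    is2 = frozenset(g(y, x, nz) for nz in is1)
    return frozenset({(S2, H2, is2)})


# The shape graph used in both questions above
S = frozenset({("y", n("y", "z")), ("z", n("y", "z")), ("x", n("x"))})
H = frozenset({(n("y", "z"), "next", n()), (n("x"), "next", n("y", "z")),
               (n(), "next", n("y", "z")), (n(), "next", n())})
is_ = frozenset({n("y", "z")})

killed = kill("y", (S, H, is_))        # kill_y((S,H,is))
glommed = glom("y", "x", (S, H, is_))  # glom^y_x((S,H,is))
```

With this encoding, the two results are as follows.

- `killed` has S' = {(z, n_{z}), (x, n_{x})}, is' = {n_{z}}, and the same four edges with n_{y,z} renamed to n_{z}.
- In `glommed`, x, y, and z all share n_{x,y,z}, and only three edges remain. The intermediate kill_x turns (n_{x}, next, n_{y,z}) into (n_{}, next, n_{y,z}), which coincides with an existing edge. That is one reason sets are a natural representation here.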
------------------------------------------
ASSIGNMENTS

If l labels [x := a]^l, then
 if a is of the form n, a_1 op_a a_2, or nil, then
    phi^SA_l((S,H,is)) = {kill_x((S,H,is))}
 elseif a is of the form x (i.e., x := x), then
    phi^SA_l((S,H,is)) = {(S,H,is)}
 elseif a is of the form y, where y != x, then
    phi^SA_l((S,H,is)) = glom^y_x((S,H,is))
 elseif a is of the form x.sel (i.e., x := x.sel), then
    phi^SA_l((S,H,is)) = (see text/homework)
 else (a is of the form y.sel, where y != x)
    let (S',H',is') = kill_x((S,H,is)) in
    if there is no n_Y such that (y,n_Y) \in S' then
       phi^SA_l((S,H,is)) = {kill_x((S,H,is))}
    elseif there is some n_Y such that (y,n_Y) \in S'
           and there is an n_U != n_{} such that (n_Y, sel, n_U) \in H' then
       phi^SA_l((S,H,is)) = hbind^U_x((S,H,is))
    else (there is some n_Y such that (y,n_Y) \in S'
          and (n_Y, sel, n_{}) \in H')
       phi^SA_l((S,H,is)) =
          {(S'',H'',is'') | (S'',H'',is'') is compatible,
                            kill_x((S'',H'',is'')) = kill_x((S,H,is)),
                            (x,n_{x}) \in S'',
                            (n_Y, sel, n_{x}) \in H''}
    end
------------------------------------------

Q: When can all these cases occur?

Discuss examples 2.49 and 2.52; see the figures.
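As a rough illustration of the ASSIGNMENTS slide, here is a Python sketch of phi^SA_l for [x := a]^l and of the lifted transfer function f^SA_l, which takes the union of phi^SA_l over a set of shape graphs. Everything below is a hypothetical encoding chosen for the sketch:

- n_Z is the frozenset Z.
- A shape graph is a triple of frozensets.
- The right-hand side `a` is a tagged tuple: ("other",) for n, a_1 op_a a_2, or nil; ("var", y) for a variable; ("field", y, sel) for y.sel.

Two cases are deliberately left out and raise NotImplementedError: x := x.sel (left to the text/homework) and the materialization case (target n_{}).

One case is an added assumption. The slide does not list what happens when n_Y has no sel edge. The sketch returns {kill_x((S,H,is))} there, since x then holds nil (or execution is stuck), and in neither case does x point to a location.

```python
# Hypothetical encoding (not from the notes): n_Z is the frozenset Z,
# n_{} is frozenset(), and a shape graph is a triple (S, H, is_) of frozensets.
# The right-hand side a of [x := a]^l is a tagged tuple (also an assumption):
#   ("other",)          n, a_1 op_a a_2, or nil
#   ("var", y)          a variable
#   ("field", y, sel)   a field dereference y.sel
EMPTY = frozenset()  # n_{}, the abstract summary location


def kill(x, sg):
    """kill_x: drop x's binding and take x out of every abstract location."""
    S, H, is_ = sg
    k = lambda z: z - {x}
    return (frozenset((z, k(nz)) for (z, nz) in S if z != x),
            frozenset((k(v), sel, k(w)) for (v, sel, w) in H),
            frozenset(k(nz) for nz in is_))


def relabel(f, sg):
    """Apply a renaming f of abstract locations to S, H, and is_."""
    S, H, is_ = sg
    return (frozenset((z, f(nz)) for (z, nz) in S),
            frozenset((f(v), sel, f(w)) for (v, sel, w) in H),
            frozenset(f(nz) for nz in is_))


def glom(y, x, sg):
    """glom^y_x: kill x, then add x to the abstract location holding y."""
    S1, H1, is1 = relabel(lambda z: z | {x} if y in z else z, kill(x, sg))
    S2 = S1 | frozenset((x, ny) for (v, ny) in S1 if v == y)
    return frozenset({(S2, H1, is1)})


def hbind(U, x, sg):
    """hbind^U_x: kill x, then add x to the abstract location n_U."""
    h = lambda z: U | {x} if z == U else z
    S1, H1, is1 = relabel(h, kill(x, sg))
    return frozenset({(S1 | {(x, h(U))}, H1, is1)})


def phi_assign(x, a, sg):
    """phi^SA_l for [x := a]^l; returns a frozenset of shape graphs."""
    if a[0] == "other":                   # n, a_1 op_a a_2, nil
        return frozenset({kill(x, sg)})
    if a[0] == "var":                     # x := x, or x := y with y != x
        return frozenset({sg}) if a[1] == x else glom(a[1], x, sg)
    _, y, sel = a
    if y == x:
        raise NotImplementedError("x := x.sel (see text/homework)")
    S1, H1, _ = kill(x, sg)
    n_ys = [ny for (v, ny) in S1 if v == y]
    if not n_ys:                          # y is not bound to a location
        return frozenset({kill(x, sg)})
    succs = [u for (v, s, u) in H1 if v == n_ys[0] and s == sel]
    if not succs:                         # ASSUMPTION: y.sel is nil/undefined
        return frozenset({kill(x, sg)})
    if succs[0] != EMPTY:                 # invariant 3: at most one successor
        return hbind(succs[0], x, sg)
    raise NotImplementedError("materialization from n_{} not sketched")


def f_assign(x, a, sgs):
    """f^SA_l: union of phi^SA_l over every shape graph in the set."""
    return frozenset().union(*(phi_assign(x, a, sg) for sg in sgs))


X, Y, Z = frozenset({"x"}), frozenset({"y"}), frozenset({"z"})
sg = (frozenset({("x", X), ("y", Y), ("z", Z)}),
      frozenset({(Y, "next", Z)}),
      frozenset())
after_field = f_assign("x", ("field", "y", "next"), {sg})  # x := y.next
after_var = f_assign("x", ("var", "y"), {sg})              # x := y
after_nil = f_assign("x", ("other",), {sg})                # x := nil
```

The example shape graph has x, y, z bound to n_{x}, n_{y}, n_{z} and the edge (n_{y}, next, n_{z}). On it, the three assignments behave as follows.

- x := y.next takes the hbind^U_x case, so x and z end up sharing n_{x,z}.
- x := y gloms x into n_{x,y}.
- x := nil only kills x.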
------------------------------------------
ASSIGNMENTS TO FIELDS

If l labels [x.sel := a]^l, then
 if a is of the form n, a_1 op_a a_2, or nil, then
    if there is no n_X such that (x,n_X) \in S then
       phi^SA_l((S,H,is)) = {(S,H,is)}
    elseif there is some n_X such that (x,n_X) \in S
           and there is no n_U such that (n_X, sel, n_U) \in H then
       phi^SA_l((S,H,is)) = {(S,H,is)}
    else (there is some n_X such that (x,n_X) \in S
          and there is some n_U such that (n_X, sel, n_U) \in H)
       phi^SA_l((S,H,is)) = {kill_x.sel((S,H,is))}
 elseif a is of the form x (i.e., x.sel := x), then
    phi^SA_l((S,H,is)) = (see text/homework)
 elseif a is of the form y, where y != x, then
    if there is no n_X such that (x, n_X) \in S then
       phi^SA_l((S,H,is)) = {(S,H,is)}
    elseif there is some n_X such that (x, n_X) \in S
           and there is no n_Y such that (y, n_Y) \in S then
       phi^SA_l((S,H,is)) = {kill_x.sel((S,H,is))}
    else (there is some n_X such that (x, n_X) \in S
          and there is some n_Y such that (y, n_Y) \in S)
       phi^SA_l((S,H,is)) = hsbind(x,n_X,n_Y)((S,H,is))
 else (a is of the form y.sel)
    phi^SA_l((S,H,is)) = (see text/homework)
------------------------------------------

Q: When can all these cases occur?

Discuss example 2.54, figures 2.21 and 2.22

------------------------------------------
MALLOC

If l labels [malloc x]^l, then
   phi^SA_l((S,H,is)) = {(S' \cup {(x, n_{x})}, H', is')}
      where (S',H',is') = kill_x((S,H,is))
elseif l labels [malloc x.sel]^l, then
   phi^SA_l((S,H,is)) = (see text/homework)
------------------------------------------

... and we collapse in a heap :-)
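The field-assignment and malloc cases can be sketched the same way. This Python sketch covers:

- kill_x.sel, together with restore and #into;
- hsbind;
- the two given cases of [x.sel := a]^l: a non-pointer or nil value, and a = y with y != x;
- [malloc x]^l.

The homework cases are omitted. All names and the encoding are hypothetical: n_Z is the frozenset Z, and a shape graph is a triple of frozensets.

The five-statement program at the end is made up for illustration. It shows sharing being recorded and then removed again, which is strong nullification.

```python
# Hypothetical encoding (not from the notes): n_Z is the frozenset Z,
# n_{} is frozenset(), and a shape graph is a triple (S, H, is_) of frozensets.
EMPTY = frozenset()  # n_{}


def kill(x, sg):
    """kill_x: drop x's binding and take x out of every abstract location."""
    S, H, is_ = sg
    k = lambda z: z - {x}
    return (frozenset((z, k(nz)) for (z, nz) in S if z != x),
            frozenset((k(v), sel, k(w)) for (v, sel, w) in H),
            frozenset(k(nz) for nz in is_))


def binding(x, S):
    """The n_X with (x, n_X) in S, or None if x is not bound to a location."""
    return next((nx for (v, nx) in S if v == x), None)


def into(nv, H):
    """#into(n_V, H): number of edges of H that point to n_V."""
    return sum(1 for (_, _, w) in H if w == nv)


def restore(is_, H):
    """Keep n_V shared only if >1 edge enters it or some edge comes from n_{}."""
    return frozenset(nv for nv in is_
                     if into(nv, H) > 1
                     or any(v == EMPTY and w == nv for (v, _, w) in H))


def kill_field(x, sel, sg):
    """kill_x.sel: remove the sel edge leaving x's abstract location."""
    S, H, is_ = sg
    nx = binding(x, S)
    H1 = frozenset((v, s, w) for (v, s, w) in H if v != nx or s != sel)
    return (S, H1, restore(is_, H1))


def hsbind(x, sel, nx, ny, sg):
    """hsbind(x,n_X,n_Y): kill x.sel, then add the edge (n_X, sel, n_Y)."""
    S1, H1, is1 = kill_field(x, sel, sg)
    shared = frozenset({ny}) if into(ny, H1) >= 1 else EMPTY
    return frozenset({(S1, H1 | {(nx, sel, ny)}, is1 | shared)})


def phi_field_nonptr(x, sel, sg):
    """phi^SA_l for [x.sel := a]^l where a is n, a_1 op_a a_2, or nil."""
    S, H, _ = sg
    nx = binding(x, S)
    if nx is None or not any(v == nx and s == sel for (v, s, _) in H):
        return frozenset({sg})
    return frozenset({kill_field(x, sel, sg)})


def phi_field_var(x, sel, y, sg):
    """phi^SA_l for [x.sel := y]^l with y != x."""
    nx, ny = binding(x, sg[0]), binding(y, sg[0])
    if nx is None:
        return frozenset({sg})
    if ny is None:
        return frozenset({kill_field(x, sel, sg)})
    return hsbind(x, sel, nx, ny, sg)


def phi_malloc(x, sg):
    """phi^SA_l for [malloc x]^l: kill x, then bind x to a fresh n_{x}."""
    S1, H1, is1 = kill(x, sg)
    return frozenset({(S1 | {(x, frozenset({x}))}, H1, is1)})


def lift(phi, sgs):
    """f^SA_l: union of phi over every shape graph in the set."""
    return frozenset().union(*(phi(sg) for sg in sgs))


# Hypothetical program:
#   [malloc x]^1; [malloc y]^2; [x.next := y]^3; [x.prev := y]^4; [x.prev := nil]^5
steps = [lambda g: phi_malloc("x", g),
         lambda g: phi_field_nonptr("x", "prev", g) if False else phi_malloc("y", g),
         lambda g: phi_field_var("x", "next", "y", g),
         lambda g: phi_field_var("x", "prev", "y", g),
         lambda g: phi_field_nonptr("x", "prev", g)]
trace = [frozenset({(EMPTY, EMPTY, EMPTY)})]  # every variable starts out nil
for step in steps:
    trace.append(lift(step, trace[-1]))
```

After [x.prev := y]^4, n_{y} has two incoming edges (next and prev from n_{x}), so it is in is, as invariant 5 requires. After [x.prev := nil]^5, kill_x.sel removes the prev edge. restore then drops n_{y} from is, because only one incoming edge remains and none comes from n_{}.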