I. Interprocedural Analysis (2.5) A. syntax ------------------------------------------ SYNTAX Procedures with 1 call-by-value parameter, and 1 call-by-result parameter. P \in Program D \in Declaration P ::= begin D; S end D ::= proc p(val x, res y) is^ln S end^lx | D; D S ::= ... | [call p(a,z)]^lc_lr Example: begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end ------------------------------------------ In call p(a,z), what syntactic category is a? z? What assumptions would simplify the analysis? What else do we need to do analysis and correctness proofs? B. operational semantics (2.5.1) How is a procedure different than a macro? ------------------------------------------ OPERATIONAL SEMANTICS xi in Loc locations rho in Env = Var* -> Loc environments s in Store = Loc ->_fin Z stores Assume s o rho is total: ran(rho) \subseteq dom(s) ------------------------------------------ How do these states relate to the states we had previously? How should we deal with global variables? ------------------------------------------ OPERATIONAL SEMANTICS [skip] rho |-* ([skip]^l, s) --> s [asgn] rho |-* ([x:=a]^l, s) --> s[rho(x) |-> A[[A]](s o rho)] if s o rho is total ------------------------------------------ What would the sequence rules look like? while? if? What would the calls look like in the operational semantics? ------------------------------------------ CALL AND BIND RULES [call] rho |-* ([call p(a,z)]^lc_lr, s) --> (bind rho'' in S then z := y, s'') if xi1, xi2 not in dom(s), v in Z, proc p(val x, res y) is^ln S end^lx is in D*, rho'' = rho*[x |-> xi1, y |-> xi2] s'' = s[xi1 |-> A[[a]](s o r), xi2 |-> v] rho' |-* (S, s) --> (S', s') [bind1]________________________________ rho |-* (bind rho' in S then z := y, s) --> (bind rho' in S' then z := y, s') rho' |-* (S, s) --> s' [bind2]________________________________ rho |-* (bind rho' in S then z := y, s) --> s'[rho(z) |-> s'(rho'(y))] ------------------------------------------ What is rho*? Where is bind-in-then in the surface syntax? Why add bind-in-then to the language To ensure static scoping and enforce the result parameter. Scoping happens because we use rho'' during evaluation of the body, and then when the call finishes, we return to rho Q: How do you parse the first rule? How do you parse the first rule? What is [bind2] doing? In [bind2], why is rho(z) used instead of rho'(z)? Do we have to deal with bind-in-then for proofs? ------------------------------------------ EXAMPLE Let P be proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 Name each statement S_i if it has label i (for calls, use the top label). Let S_2 be if [n == 0]^2 then ... Let S_79 be [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 Let rho* = {v |-> 0, w |-> 1} s00 = {0 |-> 0, 1 |-> 0} (and for later use) Let rho1 = rho*[n |-> 2, v |-> 3] = {v |-> 3, w |-> 1, v |-> 2} s0039 = {0 |-> 0, 1 |-> 0, 2 |-> 3, 3 |-> 9} Calcuate in the context of rho*: rho* |-* (S_79, s00) --> * ([call fact(3,v)]^7_8, s00) --> (bind rho*[n |-> 2, v |-> 3] in S_2 then v := v, s00[2 |-> 3, 3 |-> 9]) = (bind rho1 in S_2 then v := v, s0039) . (bind rho1 in S_2 then v := v; S_9, s0039) --> * (bind rho1 in (if [n == 0]^2 then ...) then v := v, s0039) --> rho1 |-* (if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11), s0039) --> ([call fact(n-1, v)]^4_5; [v:=v*n]^11, s0039) . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11) then v := v, s0039) . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11) then v := v; S_9, s0039) ------------------------------------------ C. flow graphs (non-modular) How should we make flow graphs for calls? ------------------------------------------ FLOW GRAPHS FOR CALLS init([call p(a, z)]^lc_lr) = final([call p(a, z)]^lc_lr) = blocks([call p(a, z)]^lc_lr) = labels([call p(a, z)]^lc_lr) = flow([call p(a, z)]^lc_lr) = {(lc;ln), (lx;lr)} if proc p(val x, res y) is^ln S end^lx is in D* ------------------------------------------ What should these be? Why use semicolons for the flows? What would happen if p was a program variable? ------------------------------------------ FLOW GRAPHS FOR PROCEDURES For each procedure declaration proc p(val x, res y) is^ln S end^lx init(p) = final(p) = blocks(p) = {is^ln, end^lx} \cup blocks(S) labels(p) = flow(p) = ------------------------------------------ What should these be? ------------------------------------------ FLOW GRAPHS FOR PROGRAMS For program P* = begin D* S* end init* = init(S*) final* = final(S*) blocks* = blocks(S*) \cup \bigcup {blocks(p) | proc p... in D*} labels* = labels(S*) \cup \bigcup {labels(p) | proc p... in D*} flow* = flow(S*) \cup \bigcup {flow(p) | proc p... in D*} ------------------------------------------ Is the flow graph for a program still finite? What is Lab* for such a program? 1. precision problems ------------------------------------------ PROBLEMS WITH PRECISION Suppose we treat flows (lc;ln) and (lx;lr) the same as all other flows (l1, l2). In terms of a monotone framework get: A_.(l) = f_l(A_o(l)) A_o(l) = \bigsqcup { A_.(l') | (l',l) in F or (l';l) in F} \sqcup i^l_E where i^l_E = i if l \in E and i^l_E = \bot if l \not\in E ------------------------------------------ Does the information from a call along (lc;ln) necessarily flow back to the associated label via (lx;lr)? ------------------------------------------ A SOLUTION IDEA: INTERPROCEDURAL FLOWS filter flows to avoid returns from non-calls inter-flow* = { (lc, ln, lx, lr) | P* contains [call p(a,z)]^lc_lr and proc p(val x, res y) is^ln S end^lx } Notation: IF is an abstraction of inter-flow* for forward analysis: IF = inter-flow* for backward analysis: IF = inter-flow^R* ------------------------------------------ How could we use IF (i.e., inter-flow* or it's reverse)? ------------------------------------------ EXAMPLE begin proc fact(val n, res v) is^1 if [n == 0]^2 then [v := 1]^3 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11) end^6; [call fact(3,v)]^7_8; [call fact(v,w)]^9_10 end What is flow*? What is inter-flow*? ------------------------------------------ What else do we need to do analysis? What is the size of the flow graph in terms of the number of calls? What is the size of the flow graph for a recursive procedure? Is this the right way to look at dataflow problems? D. a modular approach, procedure summaries What do we do for type checking of procedures? ------------------------------------------ PLAN FOR PROCEDURE SUMMARIES 0. Consider call to be a kind of elementary block. 1. Compute the analysis information for each procedure p, based on its body 2. Summarize each procedure p with a transfer function summary(p): 3. Call statements have kill and gen (i.e., transfer) functions that uses summary(p) and handles argument passing 4. Compute a fixed-point to solve for summaries Iteration: use a "bottom" summary to start: summary_0 = \p.\(i,v). \bot Then iterate construction of summary until reaches a fixed-point ------------------------------------------ What kind of dependency can summary(p) have on the Int argument? 1. example: reaching definitions ------------------------------------------ EXAMPLE: REACHING DEFINITIONS L = P(Var* x Lab*^?) For analysis within a procedure, proc p(val n, res v) is^ln S end^lx formal n is considered initialized at label ln summary(p) = \(i,v). \bigcup {RD_exit(l)|l \in finals(S)} kill_RD([call p(a,z)]^lc_lr) = {(z,l)|l \in \Lab*^?} gen_RD([call p(a,z)]^lc_lr) = summary(p)(a,z) \cup {(z,lr)} ------------------------------------------ II. Shape Analysis (2.6) A. syntax ------------------------------------------ NEW AND MODIFIED SYNTAX sel \in Sel "selector names" p \in PExp "pointer expressions" op_p \in Op_p "pointer operators" p ::= "pointer expression" x "variable dereference" | x.sel "field dereference" a ::= p "dereference expression" | n | a1 opa a2 | nil b ::= op_p p "pointer test" | true | false | not b | b1 opb b2 | a1 opr a2 S ::= [p:= a]^l "assignment" | [malloc p]^l "allocation" | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S | assert [b]^l Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel} ------------------------------------------ Could we rule out pointer arithmetic in a type system? Can we test for equality of pointers? ------------------------------------------ EXAMPLE (COPY-INTO) assume [not (f == t)]^0; while [(not is-nil(f)) and (not is-nil(t))]^1 do ([t.val := f.val]^2; [t := t.next]^3; [f := f.next]^4); ------------------------------------------ B. structural operational semantics (2.6.1) 1. domains ------------------------------------------ OPERATIONAL SEMANTICS (2.6.1) Domains: xi \in Loc "locations" s \in State = Var* -> Storable Storable = Z + Loc + {<>} h \in Heap = (Loc x Sel) ->fin Storable ------------------------------------------ How would you explain the model of the heap? 2. denotational semantics of expressions ------------------------------------------ SEMANTICS OF POINTER EXPRESSIONS P: PExp* -> (State x Heap) ->fin Storable P[[x]](s, h) = s(x) P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h) then h(s(x), sel) else undef ------------------------------------------ Do we have to modify the semantics of arithmetic and Boolean expressions now? ------------------------------------------ SEMANTICS OF ARITHMETIC EXPRESSIONS A: AExp -> (State x Heap) ->fin Storable A[[p]](s, h) = P[[p]](s, h) A[[nil]](s, h) = <> A[[n]](s, h) = N[[n]] A[[a_1 op_a a_2]](s, h) = (OP_a[[op_a]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) ------------------------------------------ What happens if P[[p]](s, h) is undefined? Are there any other cases where A is undefined? How do we prevent pointer arithmetic? ------------------------------------------ SEMANTICS OF BOOLEAN EXPRESSIONS B: BExp -> (State x Heap) ->fin T B[[a_1 op_r a_2]](s, h) = (OP_r[[op_r]]) (A[[a_1]](s,h)) (A[[a_2]](s,h)) B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h)) OP_p: Op_p -> Storable ->fin T OP_p[[is-nil]](v) = if v = <> then tt else ff ------------------------------------------ What kinds of changes would be needed to OP_r, if any? 3. operational semantics of statements ------------------------------------------ OPERATIONAL SEMANTICS OF STATEMENTS Configurations: (Stmt x State x Heap) + (State x Heap) Terminal configurations: (State x Heap) ------------------------------------------ How does that change of configurations affect the description of the transitions? ------------------------------------------ TRANSITIONS [asgn] ([x := a]^l, s, h) --> (s[x |-> A[[a]](s,h)], h) if A[[a]](s,h) is defined [fasg] ([x.sel := a]^l, s, h) --> (s, h[(s(x),sel) |-> A[[a]](s,h)]) if s(x) \in Loc and A[[a]](s,h) is defined [mal] ([malloc x]^l, s, h) --> (s[x |-> xi], h) if xi \in Loc does not occur in s or h [fmal] ([malloc x.sel]^l, s, h) --> (s, h[(s(x), sel) |-> xi], h) if xi \in Loc does not occur in s or h and s(x) \in Loc ------------------------------------------ Does x.sel need to already be allocated when an fasg is executed? What happens if the side conditions are not met in [asgn] or [fasg]? Can that happen in [fmal]? In [mal]? What would the skip rule look like? seq1? How does the partiality of the Boolean semantics affect the if rules? ------------------------------------------ EXAMPLE What transitions happen for [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; We write L0 and L1 for locations. Let S_i be the statement with label i Let S_1234 be S_1; S_2; S_3; S_4, let S_234 be S_2; S_3; S_4, etc. Let s01 be the state where x has value L0 and y value L1 Let h0 = {} h1 = {(L0,next) |-> L1} h2 = {(L0,next) |-> L1, (L1,next) |-> L0} (S_1234, s??, h0) --> {by [seq2]} * ([malloc x]^1, s??, h0) --> {by [mal], using L0 as the new location} (s0?, h0) . (S_234, s0?, h0) --> {by [seq2]} * ([malloc x.next]^2, s0?, h0) --> {by [fmal], using L1 as the new location} (s0?, h1) . (S_34, s0?, h1) --> {by [seq2]} * ([y := x.next]^3, s0?, h1) --> {by [asgn], as A[[x.next]](s0?,h1) = L1} (s01, h1) . ([y.next := x]^4, s01, h1) --> {by [fasg], A[[x]](s01,h1) = L0} (s01, h2) ------------------------------------------ C. a shape analysis ------------------------------------------ SHAPE ANAYSIS WITH PREDICATE ABSTRACTION Based on (largely quoted from): Roman Manevich, E. Yahav, G. Ramalingam, and Mooly Sagiv. "Predicate abstraction and canonical abstraction for singly-linked lists." In Verification, Model Checking, and Abstract Interpretation, volume 3385 of Lecture Notes in Computer Science, pages 181--198, Berlin, 2005. Springer-Verlag. ------------------------------------------ 1. representing states with predicates (section 2.1) ------------------------------------------ REPRESENTING PROGRAM STATES AS A FIRST-ORDER LOGICAL STRUCTURE Objects represented by "individuals" State represented by a formula over a fixed set of predicates ------------------------------------------ ------------------------------------------ EXAMPLE FOR SINGLY-LINKED LISTS Objects (v) with a field "n" The set P^C = {eq, x, n} predicates meaning ======================================= eq(v1, v2) v1 equals v2 x(v) var x points to object v, for x \in Var* n(v1, v2) the next field of v1 points to v2 Consider the program: [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4; We could encode the state and heap at the end of this code as: Individuals: U = {L0, L1} eq(L0, L0) = tt eq(L1, L1) = tt eq(L0, L1) = ff eq(L1, L0) = ff x(L0) = tt y(L1) = tt x(L1) = ff y(L0) = ff n(L0, L1) = tt n(L1, L0) = tt n(L0, L0) = ff n(L1, L1) = ff ------------------------------------------ How can we tell if a variable y points to <> or an integer? ------------------------------------------ 2-STRUCTS OVER P def: A *2-valued logical structure* over a set of predicates P, 2-STRUCT_P is a pair S = (U,i) where U is the universe (set of individuals) and i is the interpretation function, such that for all p in P of arity k: i(p): U^k -> {tt, ff} ------------------------------------------ What's the arity of eq? x? n? ------------------------------------------ ABSTRACTION FROM FORMER CONCRETE STATES toPC(s, h) = (U,i) U = {xi \in Loc | xi \in ran(s) or xi \in ran(h)} i(eq(xi1, xi2)) = (xi1 == xi2) i(x(xi)) = (s(x) == xi) i(n(xi1, xi2)) = (h(xi1,next) == xi2) ------------------------------------------ What would you take as a program's initial 2-STRUCT over P^C? Is i invertible? Could we write the language's semantics using 2-STRUCTS over P^C? ------------------------------------------ TRANSITIONS USING 2-STRUCTS OVER P^C [asgnil] ([x := nil]^l, (U, i)) --> (U, i[x(xi) |-> ff]) if xi \in U and i(x(xi)) [asgv] ([x := y]^l, (U, i)) --> (U, i[x(xi) |-> tt]) if xi \in U and i(y(xi)) [asgf] ([x := y.n]^l, (U, i)) --> (U, i[x(xi) |-> tt]) if xi, xi2 \in U, i(y(xi2)), and i(n(xi2, xi)) [fasgnil] ([x.n := nil]^l, (U, i)) --> (U, i[n(xi,xi2) |-> ff]) if xi, xi2 \in U, i(x(xi)), and i(n(xi, xi2)) [fasgv] ([x.n := y]^l, (U, i)) --> (U, i[n(xi,xi2) |-> tt]) if xi, xi2 \in U, i(x(xi)), and i(y(xi2)) and there is no xi3 such that i(n(xi, xi3)) [fasgf] ([x.n := y.n]^l, (U, i)) --> (U, i[n(xi,xi4) |-> tt]) if xi, xi2 \in U, i(x(xi)), i(y(xi2)), and i(n(xi2, xi4)) and there is no xi3 such that i(n(xi, xi3)) [mal] ([malloc x]^l, (U, i)) --> (U \cup {xi}, i[x(xi)]) if not(xi \in U) [fmal] ([malloc x.n]^l, (U, i)) --> (U \cup {xi}, i[n(xi2, xi)]) if not(xi \in U) and i(x(xi2)) ------------------------------------------ How would you generalize to arbitrary selectors? Is there a limit to the size of the 2-STRUCTS in these concrete rules? So are they suitable for doing an analysis? 2. predicate abstraction (section 2.3) ------------------------------------------ PREDICATE ABSTRACTION Let P^A be a set of nullary predicates of interest. P^A = {P_1, ..., P_m} where each P_j is defined by some phi_j: 2-STRUCT[P^C] -> Boolean An *abstract state* is a truth assignment A: {1, ..., m} -> Boolean An *abstraction mapping* beta: 2-STRUCT[P^C] -> 2-STRUCT[P^A] where beta(U,i) = (A, i') and i'(P_j) = phi_j(U,i) ------------------------------------------ Given a concrete state, can we decide on P_i's abstract value? ------------------------------------------ EXAMPLE Suppose we want to prove that the result of a program is a cyclic singly linked list. Idea: Design predicates useful in stating this property. Example set of nullary predicates: Let K > 0 be a fixed number. Let P^A = {NotNull[x] | x \in Var*} \cup {EqualsNext[k][x,y] | 0 <= k <= K and x, y \in Var*} Defining formulas: NotNull[x] = (exists v : v \neq <> : x(v)) EqualsNext[k][x,y] = (exists v_0, ..., v_k :: x(v_0) and y(v_k) and (forall j : 0 <= j <= k : n(v_j, v_{j+1}))) ------------------------------------------ How do these predicates help describe this property? Are these nullary predicates? Don't they take arguments? So what does EqualsNext[3][q,r] mean? How would you represent the set of predicates? What does it mean if a predicate P_i is true? What does it mean if NotNull[x] is false? What does it mean if EqualsNext[3][q,r] is false? ------------------------------------------ IMPLEMENTING THE ANALYSIS Idea: Use the defining formulas to induce the transfer functions In our example A is fixed (for a program), so we have configurations of the form (S,i') and i'. [asgnil] ([x := nil]^l, i') --> i'[NotNull[x] |-> ff, (forall k, y :: EqualsNext[k][x,y] |-> ff)] [asgv] ([x := y]^l, i') --> i'[NotNull[x] |-> b, EqualsNext[0][x,y] |-> tt] if b = i'(NotNull[y]) [asgf] ([x := y.n]^l, i') --> i'[NotNull[x] |-> (exists k, z :: EqualsNext[k][y,z]), (forall k, z : k < K : EqualsNext[k][x,z] |-> EqualsNext[k+1][y,z])] if i'(NotNull[y]) [fasgnil] ([x.n := nil]^l, i') --> i'[(forall k, z : k <= K : EqualsNext[k][x,z] |-> ff)] if i'(NotNull[x]) ------------------------------------------ How would you do the rest of these? How do these relate to the transfer functions?