Com S 641 Lecture -*- Outline -*-

* Shape Analysis (2.6)

  Back to ignoring procedures!

  But now we look at heap-allocated data

      Goals: finite approximations of data structures (shapes)
             useful for other software tools
              (e.g., null dereference detection)
             can be used for some kinds of verification of invariants
              (e.g., procedure doesn't introduce cycles into a list)

** syntax

For reference (we've seen these and don't need to go over them again):

  a \in AExp   "arithmetic expressions"
  b \in BExp   "Boolean expressions"
  S \in Stmt   "statements"

x,y \in Var    "variables"
  n \in Num    "numeric literals"
  l \in Lab    "labels"
opa \in Op_a   "arithmetic operators"
opb \in Op_b   "Boolean operators"
opr \in Op_r   "relational operators"

New and modified...

------------------------------------------
            SYNTAX

 sel \in Sel   "selector names"
   p \in PExp  "pointer expressions"
op_p \in Op_p  "pointer operators"

  p ::=           "pointer expression"
        x         "variable dereference"
      | x.sel     "field dereference"

  a ::=
        p         "dereference expression"
      | n
      | a1 opa a2 
      | nil

  b ::= op_p p    "pointer test"
      | true
      | false
      | not b
      | b1 opb b2
      | a1 opr a2

  S ::=
        [p:= a]^l       "assignment"
      | [malloc p]^l    "allocation"
      | [skip]^l
      | S1 ; S2
      | if [b]^l then S1 else S2
      | while [b]^l do S
      | assume [b]^l

Op_p = {is-nil, ...}
       \cup {has-sel | sel \in Sel}
------------------------------------------

        Don't allow pointer arithmetic, although the syntax looks like it does

        Q: Can we test for equality of pointers?
           Yes

------------------------------------------
         EXAMPLE (COPY-INTO)

assume [not (f = t)]^0;
while [(not is-nil(f))
        and (not is-nil(t))]^1
do
  ([t.val := f.val]^2;
   [t := t.next]^3;
   [f := f.next]^4);

------------------------------------------

** structural operational semantics (2.6.1)
*** domains
------------------------------------------
       OPERATIONAL SEMANTICS (2.6.1)

Domains:

 xi \in Loc                   "locations"
  s \in State = Var* -> Storable
        Storable = Z + Loc + {<>}
  h \in Heap = (Loc x Sel) ->fin Storable

------------------------------------------

      we assume that Loc is infinite

      <> is the nil value

      ->fin is for finite functions, and hence partial

      Var* is a finite set of variables (occurring in the program)

      Q:  How would you explain the model of the heap?

          describes the fields of an object at a given location

*** denotational semantics of expressions
------------------------------------------
     SEMANTICS OF POINTER EXPRESSIONS

P: PExp* -> (State x Heap) ->fin Storable

P[[x]](s, h) = s(x)

P[[x.sel]](s, h) =
   if s(x) \in Loc
       and (s(x), sel) \in dom(h)
   then h(s(x), sel)
   else undef
------------------------------------------
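
     A minimal executable sketch of P may help here. The encoding is an
     assumption of this sketch, not part of the text: states and heaps are
     Python dicts, locations are a small Loc class, <> is None, and
     "undef" is modeled by raising the (hypothetical) exception Undef.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    """A heap location xi (the integer id is an assumption of this sketch)."""
    id: int

NIL = None  # stands for the nil value <>

class Undef(Exception):
    """Raised where the semantic function is undefined."""

def P(p, s, h):
    """P[[p]](s, h), where p is 'x' or the pair ('x', 'sel')."""
    if isinstance(p, str):                 # P[[x]](s, h) = s(x)
        return s[p]
    x, sel = p                             # P[[x.sel]](s, h)
    if isinstance(s[x], Loc) and (s[x], sel) in h:
        return h[(s[x], sel)]
    raise Undef(f"{x}.{sel}")

s = {"x": Loc(1), "y": NIL}
h = {(Loc(1), "next"): Loc(2)}
print(P("x", s, h))              # the location bound to x
print(P(("x", "next"), s, h))    # the contents of x's next field
```

     Evaluating P(("y", "next"), s, h) raises Undef, since s(y) is nil
     and hence not in Loc.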

     PExp* is the subset of PExp such that

        (\forall p \in PExp* :: FV(p) \subseteq Var*)

     Q:  Do we have to modify the semantics of arithmetic and Boolean
     expressions now?

     Yes, since pointers can be involved in the syntax,
          and since their semantics is partial and needs the heap...

------------------------------------------
    SEMANTICS OF ARITHMETIC EXPRESSIONS

A: AExp -> (State x Heap) ->fin Storable

A[[p]](s, h) = P[[p]](s, h)
A[[nil]](s, h) = <>
A[[n]](s, h) = N[[n]]
A[[a_1 op_a a_2]](s, h) = 
   (OP_a[[op_a]]) (A[[a_1]](s,h))
                  (A[[a_2]](s,h))
------------------------------------------

    Q:  What happens if P[[p]](s, h) is undefined?

        Then A[[p]](s, h) is undefined

    Q:  Are there any other cases where A is undefined?

        Yes...

    Q:  How do we prevent pointer arithmetic?

        making the meaning of the operators be undefined unless both
        arguments are integers (i.e., by changing OP_a[[.]])

------------------------------------------
    SEMANTICS OF BOOLEAN EXPRESSIONS

B: BExp -> (State x Heap) ->fin T

B[[a_1 op_r a_2]](s, h) = 
   (OP_r[[op_r]]) (A[[a_1]](s,h))
                  (A[[a_2]](s,h))
B[[op_p p]](s, h) = 
   (OP_p[[op_p]]) (P[[p]](s, h))

OP_p: Op_p -> Storable ->fin T

OP_p[[is-nil]](v) = if v = <> then tt
                    else ff
------------------------------------------

    Q:  What kinds of changes would be needed to OP_r, if any?

       Want to allow equality comparison of pointers,
       but not mixed comparisons between pointers and integers; 
       the mixed cases should be undefined.

*** operational semantics of statements
------------------------------------------
   OPERATIONAL SEMANTICS OF STATEMENTS

Configurations:

  (Stmt x State x Heap) + (State x Heap)

Terminal configurations:

  (State x Heap)
------------------------------------------

        Q:  How does that change of configurations affect the
        description of the transitions?

        have to add the heap everywhere...

------------------------------------------
           TRANSITIONS

[ass] ([x := a]^l, s, h)
         --> (s[x |-> A[[a]](s,h)], h)
    if A[[a]](s,h) is defined

[fass] ([x.sel := a]^l, s, h)
    --> (s, h[(s(x),sel) |-> A[[a]](s,h)])
    if s(x) \in Loc
    and A[[a]](s,h) is defined

[mal] ([malloc x]^l, s, h)
         --> (s[x |-> xi], h)
    if xi \in Loc does not occur in s or h

[fmal] ([malloc x.sel]^l, s, h)
         --> (s, h[(s(x), sel) |-> xi])
    if xi \in Loc does not occur in s or h
    and s(x) \in Loc
------------------------------------------
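
        The four axioms above can be sketched as Python functions on
        (s, h) pairs. This is only an illustration under assumptions:
        the Loc class, the counter used to pick fresh locations, and
        the use of RuntimeError for a "stuck" configuration are all
        choices of this sketch, and the right-hand side a is assumed
        to be already evaluated to a value v.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    id: int                      # hypothetical representation of xi

NIL = None                       # stands for <>
_counter = itertools.count(1)

def fresh(s, h):
    """Pick a location that occurs neither in s nor in h."""
    used = set(s.values()) | {l for (l, _) in h} | set(h.values())
    while True:
        xi = Loc(next(_counter))
        if xi not in used:
            return xi

def ass(x, v, s, h):             # [ass], v = A[[a]](s, h)
    return {**s, x: v}, h

def fass(x, sel, v, s, h):       # [fass]
    if not isinstance(s.get(x), Loc):
        raise RuntimeError("stuck: s(x) is not a location")
    return s, {**h, (s[x], sel): v}

def mal(x, s, h):                # [mal]
    return {**s, x: fresh(s, h)}, h

def fmal(x, sel, s, h):          # [fmal]
    if not isinstance(s.get(x), Loc):
        raise RuntimeError("stuck: s(x) is not a location")
    return s, {**h, (s[x], sel): fresh(s, h)}

# [malloc x]; [x.val := 7]
s, h = mal("x", {"x": NIL}, {})
s, h = fass("x", "val", 7, s, h)
```

        Calling fmal("y", "next", {"y": NIL}, {}) raises the error,
        mirroring the stuck configuration for [fmal].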

        Q: What happens if the side conditions are not met in [ass] or [fass]?

           the semantics is "stuck",
            because no transitions can take place,
           and since the configuration is not terminal,
            this indicates an error

        Q: Can that happen in [fmal]?
             yes
        Q: In [mal]?
             no, Loc is infinite

          xi is "fresh" in [mal] and [fmal]

        Q: What would the skip rule look like? seq1?

        [skip] ([skip]^l, s, h) --> (s, h)
        
                  (S1, s, h) --> (S1', s', h')
        [seq1] ----------------------------------
               (S1;S2, s, h) --> (S1';S2, s', h')

       Don't need to show the following...

               (S1, s, h) --> (s', h')
        [seq2] ----------------------------
               (S1;S2, s, h) --> (S2, s', h')

       Q: How does the partiality of the Boolean semantics affect the
          if rules?

        [if1] (if [b]^l then S1 else S2, s, h) --> (S1, s, h)
                               if B[[b]](s, h) = true
        
        [if2] (if [b]^l then S1 else S2, s, h) --> (S2, s, h)
                               if B[[b]](s, h) = false

         So if B[[b]](s, h) is undefined, the semantics is "stuck",
         which indicates an error.

         The same would happen with the "while" rules

         [wh1] (while [b]^l do S, s, h)
                  --> (S; while [b]^l do S, s, h)
                                if B[[b]](s,h) = true
         
         [wh2] (while [b]^l do S, s, h)
                  --> (s, h)
                                if B[[b]](s,h) = false

------------------------------------------
            EXAMPLE

What transitions happen for

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
------------------------------------------

** a shape analysis

   the choices made in this analysis should be thought of as an example,
   one could make other choices and trade precision for efficiency...

*** property space (2.6.2)

    the property space for the analysis is sets of compatible shape graphs

------------------------------------------
          SHAPE GRAPHS (2.6.2)

Goals:

 - faithfully model real state and heap

 - smaller (finite) representation

def: (S,H,is) is a *shape graph* for S*
iff:
 S \in AState = Powerset(Var* x ALoc)
 H \in AHeap = Powerset(ALoc x Sel x ALoc)
 is \in IsShared = Powerset(ALoc)
where
  ALoc = {n_Z | Z \subseteq Var*}.
------------------------------------------

      AState is a set of abstract states
      AHeap is a set of abstract heaps
      IsShared contains sharing information
      ALoc is a set of abstract locations

**** abstract locations     
------------------------------------------
           ABSTRACT LOCATIONS

n_{x,y} represents
  the location that x and y both point to
  in a given state (s(x) and s(y))

n_{} represents
  locations not pointed to by variables

------------------------------------------

    n_{} is the "abstract summary location"

------------------------------------------
            EXAMPLE

What are the abstract locations that exist
at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;


------------------------------------------

**** abstract states

------------------------------------------
         ABSTRACT STATES

S \in AState = Powerset(Var* x ALoc)

E.g., {(x, n_{x,z}), (z, n_{x,z}),
       (y, n_{y})}
  represents a state s such that
      s(x) = xi1
      s(z) = xi1
      s(y) = xi2

ALoc(S) = {n_X | (x, n_X) \in S}

           EXAMPLE

What is the abstract state
at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;


------------------------------------------
     Draw a picture of this example state and the abstract state.

     Q: What about variables that map to integers?

        They are ignored, as are variables that map to nil.

        So if we add [z := 641]^5 to the program,
         it doesn't change the abstract state
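
     The abstraction of a concrete state can be sketched in a few lines
     of Python. The encoding is an assumption of this sketch: locations
     are strings, integers are Python ints, nil is None, and the
     abstract location n_Z is the frozenset Z.

```python
def abstract_state(s):
    """Map a concrete state (dict: variable -> value) to a set of (x, n_Z)."""
    return {
        (x, frozenset(y for y in s if s[y] == s[x]))
        for x in s
        if isinstance(s[x], str)      # only variables holding locations count
    }

# x and z share a location; w holds an integer and v is nil
s = {"x": "xi1", "z": "xi1", "y": "xi2", "w": 641, "v": None}
S = abstract_state(s)
```

     Here S contains exactly (x, n_{x,z}), (z, n_{x,z}), and (y, n_{y});
     w and v do not appear.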

**** abstract heaps

------------------------------------------
          ABSTRACT HEAPS

H \in AHeap = Powerset(ALoc x Sel x ALoc)

E.g., {(n_{x,z}, next, n_{y}),
       (n_{y}, next, n_{}),
       (n_{}, next, n_{})}
  represents a heap h and state s 
  such that:
     h(s(x), next) = s(y)
     h(s(z), next) = s(y)
     h(s(y), next) = xi
     h(xi, next) = xi or xi2

ALoc(H) = {n_V | (n_V, sel, n_W) \in H}
        \cup {n_W | (n_V, sel, n_W) \in H}

           EXAMPLE

What is the abstract state and heap
at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;


------------------------------------------

    Draw a picture for the example.

    Q: What do we know about the abstract summary location, in general?
       not much, could represent lots of locations lumped together

**** sharing information

------------------------------------------
         SHARING INFORMATION

is \in IsShared = Powerset(ALoc)

{n_{x,z}} means that
   the heap has more
   than one pointer to s(x) (= s(z))

          EXAMPLE

What is the sharing information
at the exit of each label in:

  [malloc x]^1;
  [malloc x.next]^2;
  [y := x.next]^3;
  [y.next := x]^4;
  [z := y.next]^5;
  [z.next := x]^6;
  [z.back := y]^7;


------------------------------------------

        Draw a picture of the state, heap, and the abstract state and heap
         for the example 

**** invariants

------------------------------------------
         INVARIANTS ON SHAPE GRAPHS

def: A shape graph (S,H,is)
     is *compatible* iff

 1. (\forall n_V, n_W \in 
          ALoc(S) \cup ALoc(H) \cup is ::
       V = W or (V \cap W = {}))

 2. (\forall (x, n_X) \in S :: x \in X)

 3. (\forall (n_V, sel, n_W) \in H ::
     (\forall (n_V, sel, n_W') \in H ::
       V = {} or W = W'))

 4. (\forall n_X \in is :: 
      (\exists sel ::
                  (n_{}, sel, n_X) \in H)
     or
      (\exists (n_V, sel1, n_X) \in H ::
       (\exists (n_W, sel2, n_X) \in H ::
          sel1 != sel2 or V != W)))

 5. (\forall (n_V, sel1, n_X) \in H ::
       (\forall (n_W, sel2, n_X) \in H ::
          ((sel1 != sel2 or V != W)
           and X != {})
          ==> n_X \in is))
------------------------------------------

      Q:  Why are these sensible?

      1. enforces the interpretation of abstract locations,
         without it, a location could participate in multiple
         different abstract locations

      2. also enforces the interpretation of abstract locations,
         without it, some locations pointed to by variables might not
         correspond to any abstract location in the abstract state

      3. enforces the interpretation of the abstract heap,
         at least for locations pointed to directly by variables,
         without it, some heap edges might not be represented in the
         abstract heap

      4 and 5. enforce the interpretation of the sharing information
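
      The five invariants can be checked mechanically. The following is
      a sketch under assumptions: n_Z is encoded as the frozenset Z, S
      is a set of (variable, frozenset) pairs, H is a set of
      (frozenset, selector, frozenset) triples, and the function names
      are hypothetical.

```python
def aloc(S, H):
    """ALoc(S) together with ALoc(H)."""
    return ({n for (_, n) in S}
            | {n for (n, _, _) in H}
            | {n for (_, _, n) in H})

def compatible(S, H, shared):
    nodes = aloc(S, H) | set(shared)

    def multi(X):
        # at least two distinct edges (source, selector) lead into n_X
        return len({(V, sel) for (V, sel, W) in H if W == X}) > 1

    inv1 = all(V == W or not (V & W) for V in nodes for W in nodes)
    inv2 = all(x in X for (x, X) in S)
    inv3 = all(not V or W == W2
               for (V, sel, W) in H
               for (V2, sel2, W2) in H
               if V2 == V and sel2 == sel)
    inv4 = all(any(not V and W == X for (V, _, W) in H) or multi(X)
               for X in shared)
    inv5 = all(X in shared for (_, _, X) in H if X and multi(X))
    return inv1 and inv2 and inv3 and inv4 and inv5

fs = frozenset
S = {("x", fs({"x"})), ("y", fs({"y"})), ("z", fs({"z"}))}
H = {(fs({"x"}), "next", fs({"z"})), (fs({"y"}), "next", fs({"z"}))}
print(compatible(S, H, {fs({"z"})}))   # n_{z} has two incoming edges and is in is
print(compatible(S, H, set()))         # same heap, but n_{z} missing from is
```

      The second call violates invariant 5: two distinct edges lead
      into n_{z}, yet n_{z} is not recorded as shared.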

**** actual property space

------------------------------------------
         PROPERTY SPACE IS
    SETS OF COMPATIBLE SHAPE GRAPHS

Assume a fixed/given program S*.

Compatible shape graphs:

 SG = {(S,H,is) | (S,H,is) is compatible}

Property Space

 L = Powerset(SG)
 \bigsqcup = \bigcup

So
 \sqsubseteq = \subseteq

------------------------------------------

     Q:  Is this property space finite?
       Yes: Var* is finite, so ALoc is finite, and hence (with finitely
       many selectors in S*) so are AState, AHeap, IsShared, SG, and L.

*** instance of a monotone framework (2.6.3)

------------------------------------------
          MONOTONE FRAMEWORK INSTANCE

Assume a fixed/given program S*.

Monotone framework instance:

  L = Powerset(SG)
  Funs = {f : L -> L | f is monotone}
  F = flow(S*)
  E = {init(S*)}
  i = some set {(S_0, H_0, is_0)}
  f^SA_. described below
------------------------------------------

        Q:  Is this a forward analysis?
        yes

        Q:  Is this a may or a must analysis?
            may,

             but because of the invariants on shape graphs, individual
             shape graphs cannot contain superfluous information

             can be used to achieve *strong update* and *strong nullification*

                i.e., can remove old analysis information before
                adding new information

**** equations
------------------------------------------
          RESULTING EQUATIONS

Shape_o(l) =
    if l = init(S*) then i
    else \bigcup {Shape_.(l')
                  | (l',l) \in flow(S*)}

Shape_.(l) = f^SA_l(Shape_o(l))
------------------------------------------

        Q:  Which of these is the entry information?
            Shape_o(l)
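
        The equations can be solved by simple iteration from the empty
        set. The sketch below is generic: it follows the two equations
        as written above, but the function name solve, the dict-based
        encoding, and the toy transfer functions in the usage are
        assumptions made only for illustration.

```python
def solve(labels, flow, init, iota, f):
    """Iterate the Shape equations until nothing changes.

    labels: list of labels          flow: set of (l', l) edges
    init:   the initial label       iota: extremal value (a frozenset)
    f:      dict label -> transfer function on frozensets (assumed monotone)
    """
    entry = {l: frozenset() for l in labels}   # Shape_o
    exit_ = {l: frozenset() for l in labels}   # Shape_.
    changed = True
    while changed:
        changed = False
        for l in labels:
            if l == init:
                new_entry = iota
            else:
                preds = [exit_[lp] for (lp, lq) in flow if lq == l]
                new_entry = frozenset().union(*preds)
            new_exit = f[l](new_entry)
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_

# Toy instance: strings stand in for shape graphs.
f = {1: lambda X: X, 2: lambda X: X | {"b"}}
entry, exit_ = solve([1, 2], {(1, 2)}, 1, frozenset({"a"}), f)
```

        Termination relies on the property space being finite and the
        transfer functions being monotone.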

**** transfer functions

------------------------------------------
         TRANSFER FUNCTIONS

General form:

 f^SA_l(SG) =
    \bigcup {phi^SA_l((S,H,is))
                  | (S,H,is) \in SG}
------------------------------------------

        Q:  What is this doing?
            it maps the function phi^SA_l over all shape graphs in the
            argument set, SG, and combines the results (by taking their union).

------------------------------------------
           TESTS AND SKIP

If l labels [b]^l,
then

  phi^SA_l((S,H,is)) = {(S,H,is)}

If l labels [skip]^l,
then

  phi^SA_l((S,H,is)) = {(S,H,is)}
------------------------------------------

------------------------------------------
   MANIPULATIONS USEFUL FOR ASSIGNMENTS

Taking a variable x out of an ALoc:

  k_x(n_Z) = n_Y, where Y = Z\{x}

Taking variable x out of a shape graph:

  kill_x((S,H,is)) = (S',H',is')
    where
       S' = {(z, k_x(n_Z))
                 | (z, n_Z) \in S, z != x}

       H' = {(k_x(n_V), sel, k_x(n_W))
                 | (n_V, sel, n_W) \in H}

       is' = {k_x(n_X) | n_X \in is}

Taking a field x.sel out of a shape graph:

  kill_x.sel((S,H,is)) = (S',H',is')
    where
      S' = S

      H' = {(n_V, sel', n_W)
                | (n_V, sel', n_W) \in H,
                  (x, n_X) \in S,
                  (X != V or sel' != sel)}

      is' = restore(is, H')

      restore(is, H') =
        { n_V | n_V \in is,
               #into(n_V, H') > 1
            or (\exists sel'::
               (n_{}, sel', n_V) \in H') }

      #into(n_V, H) = number of pointers
                      to n_V in H

Adding a binding from x to y's binding:

  g^y_x(n_Z) =
      if y \in Z
      then n_W where W = Z \cup {x}
      else n_Z

Adding a binding from x to y's binding
in a Shape graph:

  glom^y_x((S,H,is)) = {(S'',H'',is'')}
   where

    (S',H',is') = kill_x((S,H,is))

    S'' =
       {(z,g^y_x(n_Z)) | (z, n_Z) \in S'}
      \cup
       {(x,g^y_x(n_Y)) | (y, n_Y) \in S'}

    H'' =
       {(g^y_x(n_V), sel, g^y_x(n_W)) 
          | (n_V, sel, n_W) \in H'}

    is'' = {g^y_x(n_Z) | n_Z \in is'}

Adding a binding from x to n_U:

  h^U_x(n_Z) =
       if Z = U
       then n_Y where Y = U \cup {x}
       else n_Z

Adding a binding from x to n_U
in a Shape graph:

  hbind^U_x((S,H,is)) = {(S'',H'',is'')}
   where

    (S',H',is') = kill_x((S,H,is))

    S'' =
       {(z,h^U_x(n_Z)) | (z, n_Z) \in S'}
      \cup {(x,h^U_x(n_U))}

    H'' =
       {(h^U_x(n_V), sel, h^U_x(n_W)) 
          | (n_V, sel, n_W) \in H'}

    is'' = {h^U_x(n_Z) | n_Z \in is'}

Adding a binding from x.sel to n_Y
in a Shape graph:

  hsbind(x,n_X,n_Y)((S,H,is))
                   = {(S'',H'',is'')}
   where

    (S',H',is') = kill_x.sel((S,H,is))

    S'' = S'

    H'' = H' \cup {(n_X, sel, n_Y)}

    is'' = is' 
           \cup {n_Y | #into(n_Y,H') >= 1}

------------------------------------------

    Note that kill_x, glom^y_x, and hbind^U_x
     are essentially mapping k_x, g^y_x, and h^U_x (respectively)
     over their shape graph argument.
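
    A small Python sketch of k_x, kill_x, g^y_x, and glom^y_x, following
    the definitions in the box. The encoding is an assumption of this
    sketch: n_Z is the frozenset Z, and the singleton set returned by
    glom is represented as a one-element list.

```python
def k(x, Z):
    """k_x(n_Z): take x out of an abstract location."""
    return Z - {x}

def kill(x, S, H, shared):
    """kill_x((S,H,is)): take x out of a shape graph."""
    S1 = {(z, k(x, Z)) for (z, Z) in S if z != x}
    H1 = {(k(x, V), sel, k(x, W)) for (V, sel, W) in H}
    is1 = {k(x, X) for X in shared}
    return S1, H1, is1

def g(x, y, Z):
    """g^y_x(n_Z): add x to the abstract location that contains y."""
    return Z | {x} if y in Z else Z

def glom(x, y, S, H, shared):
    """glom^y_x((S,H,is)): bind x to whatever y is bound to."""
    S1, H1, is1 = kill(x, S, H, shared)
    S2 = ({(z, g(x, y, Z)) for (z, Z) in S1}
          | {(x, g(x, y, Y)) for (v, Y) in S1 if v == y})
    H2 = {(g(x, y, V), sel, g(x, y, W)) for (V, sel, W) in H1}
    is2 = {g(x, y, Z) for Z in is1}
    return [(S2, H2, is2)]

fs = frozenset
S = {("x", fs({"x"})), ("y", fs({"y"}))}
H = {(fs({"x"}), "next", fs({"y"}))}
killed = kill("x", S, H, set())
glommed = glom("x", "y", S, H, set())
```

    In this small example, killing x turns the edge out of n_{x} into
    an edge out of the summary location n_{}, and glom then renames
    n_{y} to n_{x,y} everywhere.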

    Q: What is the type of each of these?

    Q: What is kill_y((S,H,is)), where
          S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})}
          H = {(n_{y,z}, next, n_{}),
               (n_{x}, next, n_{y,z}),
               (n_{}, next, n_{y,z}),
               (n_{}, next, n_{})}
          is = {n_{y,z}}
       ?

    Q: What is glom^y_x((S,H,is)), where
          S = {(y,n_{y,z}), (z,n_{y,z}), (x,n_{x})}
          H = {(n_{y,z}, next, n_{}),
               (n_{x}, next, n_{y,z}),
               (n_{}, next, n_{y,z}),
               (n_{}, next, n_{})}
          is = {n_{y,z}}
       ?

------------------------------------------
            ASSIGNMENTS

If l labels [x := a]^l,
then

 if a is of form n, a_1 op_a a_2, or nil,
 then

   phi^SA_l((S,H,is)) = {kill_x((S,H,is))}

 elseif a is of the form x (i.e., x := x),
 then

   phi^SA_l((S,H,is)) = {(S,H,is)}

 elseif a is of the form y, where y != x,
 then

   phi^SA_l((S,H,is)) = glom^y_x((S,H,is))

 elseif a is of the form x.sel
                      (i.e., x := x.sel),
 then

   phi^SA_l((S,H,is)) = (see text/homework)

 else (a is of the form y.sel,
                    where y != x)

  let (S',H',is') = kill_x((S,H,is))
  in

   if there is no n_Y
        such that (y,n_Y) \in S'
   then
     phi^SA_l((S,H,is)) =
                       {kill_x((S,H,is))}

   elseif there is some n_Y such that
                           (y,n_Y) \in S'
          and there is a n_U != n_{}
              such that
                    (n_Y, sel, n_U) \in H'
   then
     phi^SA_l((S,H,is)) =
                       hbind^U_x((S,H,is))

   else (there is some n_Y such that
                           (y,n_Y) \in S'
         and (n_Y, sel, n_{}) \in H')
     phi^SA_l((S,H,is)) =
      {(S'',H'',is'') |
       (S'',H'',is'') is compatible,
       kill_x((S'',H'',is''))
          = kill_x((S,H,is)),
       (x,n_{x}) \in S'',
       (n_Y, sel, n_{x}) \in H''}

  end
------------------------------------------

     Q: When can all these cases occur?

     Discuss example 2.49, 2.52, see the figures.

------------------------------------------
        ASSIGNMENTS TO FIELDS

If l labels [x.sel := a]^l,
then

 if a is of form n, a_1 op_a a_2, or nil,
 then

   if there is no n_X
           such that (x,n_X) \in S
   then
     phi^SA_l((S,H,is)) = {(S,H,is)}

   elseif there is some n_X
           such that (x,n_X) \in S
          and there is no n_U such that
           (n_X, sel, n_U) \in H
   then
     phi^SA_l((S,H,is)) = {(S,H,is)}

   else (there is some n_X
           such that (x,n_X) \in S
          and there is some n_U such that
           (n_X, sel, n_U) \in H)
     phi^SA_l((S,H,is)) = 
            {kill_x.sel((S,H,is))}

 elseif a is of the form x 
                     (i.e., x.sel := x),
 then

   phi^SA_l((S,H,is)) =(see text/homework)

 elseif a is of the form y, where y != x,
 then

   if there is no n_X such that
              (x, n_X) \in S
   then
     phi^SA_l((S,H,is)) = {(S,H,is)}

   elseif there is some n_X such that
              (x, n_X) \in S
         and there is no n_Y such that
              (y, n_Y) \in S
   then
     phi^SA_l((S,H,is)) = 
            {kill_x.sel((S,H,is))}

   else (there is some n_X such that
              (x, n_X) \in S
         and there is some n_Y such that
              (y, n_Y) \in S)
     phi^SA_l((S,H,is)) =
           hsbind(x,n_X,n_Y)((S,H,is))

 else (a is of the form y.sel)

   phi^SA_l((S,H,is)) = (see text/homework)

------------------------------------------

     Q: When can all these cases occur?

     Discuss example 2.54, figures 2.21 and 2.22
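
     The three cases for [x.sel := y]^l (with y != x) can be sketched
     in Python as follows. This is an illustration under assumptions,
     not the text's definition: n_Z is encoded as the frozenset Z,
     shape graphs are triples of frozensets, the function names are
     hypothetical, and restore is inlined into kill_field.

```python
def into(N, H):
    """#into(n_N, H): number of edges in H that point to n_N."""
    return sum(1 for (_, _, W) in H if W == N)

def node_of(v, S):
    """The Z with (v, n_Z) in S, or None if v is not bound."""
    return next((Z for (u, Z) in S if u == v), None)

def kill_field(x, sel, S, H, shared):
    """kill_x.sel: drop the sel edge out of x's node, then restore is."""
    X = node_of(x, S)
    H1 = frozenset((V, s, W) for (V, s, W) in H
                   if not (V == X and s == sel))
    is1 = frozenset(N for N in shared
                    if into(N, H1) > 1
                    or any(not V and W == N for (V, _, W) in H1))
    return S, H1, is1

def phi_fassign(x, sel, y, sg):
    """Transfer function for [x.sel := y], assuming y != x."""
    S, H, shared = sg
    X, Y = node_of(x, S), node_of(y, S)
    if X is None:                       # x is not bound to a location
        return {sg}
    S1, H1, is1 = kill_field(x, sel, S, H, shared)
    if Y is None:                       # y is not bound to a location
        return {(S1, H1, is1)}
    is2 = is1 | ({Y} if into(Y, H1) >= 1 else set())
    return {(S1, frozenset(H1 | {(X, sel, Y)}), is2)}

fs = frozenset
S = fs({("x", fs({"x"})), ("y", fs({"y"})), ("w", fs({"w"}))})
H = fs({(fs({"w"}), "next", fs({"y"}))})
result = phi_fassign("x", "next", "y", (S, H, fs()))
```

     Since n_{y} already has an incoming edge from n_{w}, adding the
     edge from n_{x} makes n_{y} shared in the resulting shape graph.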

------------------------------------------
            MALLOC

If l labels [malloc x]^l,
then

  phi^SA_l((S,H,is)) = 
         {(S' \cup {(x, n_{x})}, H', is')}
  where 
    (S',H',is') = kill_x((S,H,is))

Elseif l labels [malloc x.sel]^l,
then

  phi^SA_l((S,H,is)) = (see text/homework)

------------------------------------------
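
        As a sketch, the [malloc x] case and the general form of
        f^SA_l (the union of phi^SA_l over a set of shape graphs) can
        be written in Python. The encoding (n_Z as the frozenset Z,
        shape graphs as triples of frozensets) and the function names
        are assumptions of this sketch.

```python
def kill(x, sg):
    """kill_x((S,H,is)) on a shape graph given as a triple of frozensets."""
    S, H, shared = sg
    return (frozenset((z, Z - {x}) for (z, Z) in S if z != x),
            frozenset((V - {x}, sel, W - {x}) for (V, sel, W) in H),
            frozenset(X - {x} for X in shared))

def phi_malloc(x, sg):
    """phi^SA_l for [malloc x]: kill x, then bind x to a fresh n_{x}."""
    S1, H1, is1 = kill(x, sg)
    return {(S1 | {(x, frozenset({x}))}, H1, is1)}

def f_malloc(x, SG):
    """f^SA_l(SG): union of phi^SA_l over all shape graphs in SG."""
    return frozenset().union(*(phi_malloc(x, sg) for sg in SG))

fs = frozenset
empty = (fs(), fs(), fs())
old = (fs({("x", fs({"x"}))}), fs({(fs({"x"}), "next", fs())}), fs())
out = f_malloc("x", {empty, old})
```

        For the second input, the cell x used to point to moves into
        the summary location n_{}, so its next edge becomes a self-loop
        on n_{}, and x is rebound to a new n_{x}.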

        ... and we collapse in a heap :-)