Com S 641 meeting -*- Outline -*-

* Abstract Interpretation (1.5)

** goals

   To specify an analysis in a way that is correct by construction.
   To calculate an analysis in a systematic way.

** idea

   Q: What's the basic idea?
      1.  Write a semantics that collects (more than) the information
          required.
      2.  Define an abstraction function (a) that extracts the
          information needed to answer the question of the analysis.
      3.  Calculate the analysis from the collecting semantics and a.

   Q:  What's a collecting semantics?

          A mathematical semantics of the program that records all the
          necessary information for the analysis.

   Q:  How is that used to extract the analysis?

          via a Galois connection...

** example
------------------------------------------
             EXAMPLE

  [q := 0]^1;
  [r := x]^2;
  while [r >= y]^3
  do ([r := r-y]^4;
     [q := q+1]^5);
  assert [0<=r and r<y and q*y+r == x]^6

For assigned variables analysis,
we seek, for each program point,
the set of variables that may have
been assigned before reaching it.
------------------------------------------

     Q: What is a collecting semantics?
        a semantics of programs that collects information from
        different possible runs.
     Q: What would a collecting semantics look like for this example?
        collect sets of traces; a trace is a sequence of (Var, Lab)
        pairs recording, in order, the assignments (and their labels)
        made along one possible execution.
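To make a single trace concrete, here is a minimal Python sketch that runs the example program once and records its trace. Python stands in for the while-language; the helper `run` is hypothetical, it assumes x >= 0 and y > 0 (otherwise the loop or the final assertion misbehaves), and, like CSentry(1) in the equations, it starts each trace with a (v,?) entry for every variable. Only assignments add entries; the test at label 3 and the assertion at label 6 do not.

```python
def run(x, y):
    """Execute the example program once (hypothetical helper).

    Assumes x >= 0 and y > 0. Returns the final (q, r) and the trace:
    a list of (var, label) pairs, starting with (v, '?') for each variable.
    """
    trace = [("q", "?"), ("r", "?"), ("x", "?"), ("y", "?")]
    q = 0
    trace.append(("q", 1))                  # [q := 0]^1
    r = x
    trace.append(("r", 2))                  # [r := x]^2
    while r >= y:                           # [r >= y]^3: a test, no trace entry
        r = r - y
        trace.append(("r", 4))              # [r := r-y]^4
        q = q + 1
        trace.append(("q", 5))              # [q := q+1]^5
    assert 0 <= r < y and q * y + r == x    # [...]^6: no trace entry
    return (q, r), trace


(q, r), tr = run(7, 3)
print(q, r)
print(tr)
```

For x = 7 and y = 3 the loop body runs twice, so the trace ends with (r,4), (q,5), (r,4), (q,5); the collecting semantics gathers such traces over all possible runs.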

  See page 14 of the book for a collecting semantics of a different example.

Trace = (Var x Lab)*

CSexit(1) = {tr : (q,1) | tr \in CSentry(1)}
CSexit(2) = {tr : (r,2) | tr \in CSentry(2)}
CSexit(3) = CSentry(3)
CSexit(4) = {tr : (r,4) | tr \in CSentry(4)}
CSexit(5) = {tr : (q,5) | tr \in CSentry(5)}
CSexit(6) = CSentry(6)

CSentry(1) = {((q,?),(r,?),(x,?),(y,?))}
CSentry(2) = CSexit(1)
CSentry(3) = CSexit(2) \cup CSexit(5)
CSentry(4) = CSexit(3)
CSentry(5) = CSexit(4)
CSentry(6) = CSexit(3)

     Q: Why is CSentry(1) = {((q,?),(r,?),(x,?),(y,?))} ?

     Q:  How is this different from the reaching definitions analysis?
         it's sets of ordered traces, as opposed to sets of pairs.

     Q:  How would you handle if then else statements?
     Q:  For loops?
     Q:  How would you handle break?
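The six entry and six exit equations can be transcribed directly. A minimal Python sketch, under encoding assumptions of our own: a trace is a Python tuple of (var, label) pairs, labels are ints or the string "?", and the twelve sets live in a dict keyed "entry1", "exit1", ..., "exit6" (all names here are hypothetical). `step` applies every equation once and `approximant(n)` applies it n times starting from all-empty sets. The least solution is infinite (one trace per number of loop iterations), so only these finite approximants can be computed.

```python
from typing import Dict, FrozenSet, Tuple

Trace = Tuple[Tuple[str, object], ...]   # a trace: tuple of (var, label) pairs
Sets = Dict[str, FrozenSet[Trace]]       # keys "entry1", "exit1", ..., "exit6"

INIT: Trace = (("q", "?"), ("r", "?"), ("x", "?"), ("y", "?"))
KEYS = [f"{kind}{l}" for l in range(1, 7) for kind in ("entry", "exit")]


def extend(ts, var, lab):
    """{tr : (var, lab) | tr in ts}: append one assignment to every trace."""
    return frozenset(tr + ((var, lab),) for tr in ts)


def step(cs: Sets) -> Sets:
    """Apply all twelve equations once, simultaneously."""
    return {
        "exit1": extend(cs["entry1"], "q", 1),
        "exit2": extend(cs["entry2"], "r", 2),
        "exit3": cs["entry3"],
        "exit4": extend(cs["entry4"], "r", 4),
        "exit5": extend(cs["entry5"], "q", 5),
        "exit6": cs["entry6"],
        "entry1": frozenset({INIT}),
        "entry2": cs["exit1"],
        "entry3": cs["exit2"] | cs["exit5"],
        "entry4": cs["exit3"],
        "entry5": cs["exit4"],
        "entry6": cs["exit3"],
    }


def approximant(n: int) -> Sets:
    """Apply step n times, starting from all-empty sets."""
    cs = {k: frozenset() for k in KEYS}
    for _ in range(n):
        cs = step(cs)
    return cs
```

For example, after 8 rounds CSexit(6) holds only the trace that skips the loop; after 14 rounds it also holds the trace through one loop iteration, and each further iteration takes another 6 rounds to reach label 6.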

*** solving the equations

------------------------------------------
       SOLVING THE EQUATIONS

 Trace = (Var x Lab)*

G: (Powerset(Trace))^{12}
    -> (Powerset(Trace))^{12}

G is defined by:

 G(CS_1, ..., CS_{12})
   = (G_1(CS_1, ..., CS_{12}),
      G_2(CS_1, ..., CS_{12}),
      ...,
      G_{12}(CS_1, ..., CS_{12}))

where
  G_1(CS_1, ..., CS_{12})
        = {((q,?),(r,?),(x,?),(y,?))}

  G_2(CS_1, ..., CS_{12})
        = {tr : (q,1) | tr \in CS_1}

  G_3(CS_1, ..., CS_{12})
        = CS_2
  ...


Solution

(CSentry(1), CSexit(1), CSentry(2),
       ..., CSentry(6), CSexit(6))

 is a solution if

  G(CSentry(1), CSexit(1), CSentry(2),
    ..., CSentry(6), CSexit(6))
 = (CSentry(1), CSexit(1), CSentry(2),
     ..., CSentry(6), CSexit(6))

------------------------------------------

   Q:  What is G_1?
        the constant function returning {((q,?),(r,?),(x,?),(y,?))}
   Q:  Why does it have 12 parameters?
        Because all of the G_i do; G_1 just doesn't use any of them.
   Q:  What is G_2?
        the function that computes CSexit(1) from CS_1 = CSentry(1)
   Q:  Why is G_3 = CS_2 in this example?
        because G_3 computes CSentry(2), CS_2 holds CSexit(1), and the
        semantics says that CSentry(2) = CSexit(1) in this example

    so a solution satisfies G(\vec{CS}) = \vec{CS}; equivalently, the
    system of equations is \vec{CS} = G(\vec{CS}), whose solutions are
    exactly the fixed points of G.

   Q:  What's the ordering on the solution space?
       pointwise for the vector, subsets for sets of traces

   Q:  What does it mean for G to be monotone?
       if it's given a bigger tuple of sets of traces, then it returns
       one that is no smaller: \vec{CS} \subseteq \vec{CS'} (pointwise)
       implies G(\vec{CS}) \subseteq G(\vec{CS'}).

   Since (Powerset(Trace))^{12} is a complete lattice and G is monotone,
   Tarski's fixed point theorem gives a least solution, the least fixed
   point of G, lfp(G).
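The pointwise ordering and the monotonicity of G can be spot-checked in the 12-tuple form of the SOLVING THE EQUATIONS slide. A minimal Python sketch, assuming traces are tuples of (var, label) pairs and that position 2k-1 of the tuple holds CSentry(k) and position 2k holds CSexit(k) (0-based in Python); the names `G`, `leq`, `entry`, `exit_` and `ext` are our own. Comparing G on one ordered pair of tuples only illustrates monotonicity; it is not a proof.

```python
from typing import FrozenSet, Tuple

Vec = Tuple[FrozenSet[tuple], ...]   # 12 sets of traces: CS_1, ..., CS_12

INIT = (("q", "?"), ("r", "?"), ("x", "?"), ("y", "?"))


def entry(k: int) -> int:
    """0-based position of CSentry(k), i.e. of CS_{2k-1}."""
    return 2 * k - 2


def exit_(k: int) -> int:
    """0-based position of CSexit(k), i.e. of CS_{2k}."""
    return 2 * k - 1


def ext(ts, var, lab):
    """{tr : (var, lab) | tr in ts}."""
    return frozenset(tr + ((var, lab),) for tr in ts)


def G(cs: Vec) -> Vec:
    """All twelve G_i at once; each reads only the components it needs."""
    out = [frozenset()] * 12
    out[entry(1)] = frozenset({INIT})            # G_1: constant, ignores cs
    out[exit_(1)] = ext(cs[entry(1)], "q", 1)    # G_2: uses only CS_1
    out[entry(2)] = cs[exit_(1)]                 # G_3: just CS_2
    out[exit_(2)] = ext(cs[entry(2)], "r", 2)
    out[entry(3)] = cs[exit_(2)] | cs[exit_(5)]
    out[exit_(3)] = cs[entry(3)]
    out[entry(4)] = cs[exit_(3)]
    out[exit_(4)] = ext(cs[entry(4)], "r", 4)
    out[entry(5)] = cs[exit_(4)]
    out[exit_(5)] = ext(cs[entry(5)], "q", 5)
    out[entry(6)] = cs[exit_(3)]
    out[exit_(6)] = cs[entry(6)]
    return tuple(out)


def leq(v: Vec, w: Vec) -> bool:
    """Pointwise ordering: every component of v is a subset of w's."""
    return all(s <= t for s, t in zip(v, w))


small = tuple(frozenset() for _ in range(12))
big = tuple(frozenset({INIT}) for _ in range(12))
print(leq(small, big), leq(G(small), G(big)))   # True True
```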

*** Galois connections

------------------------------------------
     ABSTRACTION AND CONCRETIZATION

abstraction function for AV analysis:

   a: Powerset(Trace) -> Powerset(Var)
   a(X) = { x | (x,l) \in tr, l != ?,
                tr \in X }

concretization function for AV analysis:

  g: Powerset(Var) -> Powerset(Trace)
  g(Y) = { tr | (\forall (x,l) \in tr ::
                   l != ? ==> x \in Y) }

Adjunction, or Galois connection:

   a(X) \subseteq Y <==> X \subseteq g(Y)
------------------------------------------
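The adjunction can be checked exhaustively on a small finite universe. A minimal Python sketch, assuming g(Y) is read as the set of traces all of whose real assignments (label not ?) are to variables in Y, which is the reading under which the adjunction holds. Since g(Y) is infinite, the sketch only tests membership in it; the helper names and the four sample traces are our own.

```python
from itertools import chain, combinations


def a(X):
    """Abstraction: variables given a real label (not '?') in some trace of X."""
    return {x for tr in X for (x, l) in tr if l != "?"}


def in_g(tr, Y):
    """Membership in g(Y): every variable that tr actually assigns is in Y."""
    return all(x in Y for (x, l) in tr if l != "?")


def galois_holds(X, Y):
    """a(X) <= Y  iff  X <= g(Y), checked for one finite X and one Y."""
    return (a(X) <= Y) == all(in_g(tr, Y) for tr in X)


def subsets(items):
    """All subsets of a small finite collection, as Python sets."""
    items = list(items)
    return [set(c) for c in chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1))]


INIT = (("q", "?"), ("r", "?"), ("x", "?"), ("y", "?"))
TRACES = [INIT,
          INIT + (("q", 1),),
          INIT + (("q", 1), ("r", 2)),
          INIT + (("q", 1), ("r", 2), ("r", 4), ("q", 5))]
VARS = ["q", "r", "x", "y"]

# Exhaustive check over 16 sets of traces and 16 sets of variables.
assert all(galois_holds(X, Y) for X in subsets(TRACES) for Y in subsets(VARS))
```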

    Note: a and g are \alpha and \gamma
          in the book

    Q: Would a and g be different for the RD analysis?
       yes; RD abstracts to sets of (variable, label) pairs, not sets of variables

    draw a picture like figure 1.3

------------------------------------------

   set of traces          set of vars
 |---------------|     |---------------|
 |               |  g  |               |
 |     g(Y) <---------------- Y        |
 |      U|       |     |      U|       |
 |      X  -----------------> a(X)     |
 |               |  a  |               |
 |_______________|     |_______________|

------------------------------------------


*** calculating the analysis

    Q:  Why do we care about the abstraction and concretization
    functions?

------------------------------------------
      CALCULATING THE ANALYSIS

Extend a and g pointwise to tuples:

 a(X_1, ..., X_12)
    = (a(X_1), ..., a(X_12))

 g(Y_1, ..., Y_12)
    = (g(Y_1), ..., g(Y_12))


Define the AV analysis by the function

  a o G o g: Powerset(Var)^12
                  -> Powerset(Var)^12

so for each i in {1..12}

 (a o G_i o g): Powerset(Var)^12
                       -> Powerset(Var)

where, for example,

 a(G_1(g(AV_1, ..., AV_12)))
      = a({((q,?),(r,?),(x,?),(y,?))})
      = {}

 a(G_2(g(AV_1, ..., AV_12)))
      = a({tr : (q,1) | tr \in g(AV_1)})

So a solution

 (AVentry(1), ..., AVexit(6))

has the property that

(AVentry(1), ..., AVexit(6))
 = (a o G o g)(AVentry(1), ..., AVexit(6))

------------------------------------------

    I'm abusing notation above by writing a and g for their pointwise
    extensions; the book uses vector notation for the extensions.

    Q: What's AV_3?

    Q:  How does this compare to the analysis we created by hand?
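Carrying the calculation through for every component gives equations of a familiar shape. The steps used are that a distributes over union, that g(Y) is never empty (it always contains the trace with only ? entries), and that a(g(Y)) = Y when g(Y) is read as the set of traces whose assigned variables all lie in Y; so, for instance, a(G_2(g(AV_1, ..., AV_12))) = AV_1 \cup {q}. A minimal Python sketch of the resulting equations (dict keys and helper names are hypothetical), solved by Kleene iteration from the all-empty vector, which terminates because Powerset(Var)^12 is finite:

```python
KEYS = [f"{kind}{l}" for l in range(1, 7) for kind in ("entry", "exit")]


def av_step(av):
    """One application of (a o G o g), one line per component."""
    return {
        "entry1": frozenset(),                 # a({((q,?),(r,?),(x,?),(y,?))}) = {}
        "exit1": av["entry1"] | {"q"},         # [q := 0]^1
        "entry2": av["exit1"],
        "exit2": av["entry2"] | {"r"},         # [r := x]^2
        "entry3": av["exit2"] | av["exit5"],   # loop head: two incoming edges
        "exit3": av["entry3"],                 # test assigns nothing
        "entry4": av["exit3"],
        "exit4": av["entry4"] | {"r"},         # [r := r-y]^4
        "entry5": av["exit4"],
        "exit5": av["entry5"] | {"q"},         # [q := q+1]^5
        "entry6": av["exit3"],
        "exit6": av["entry6"],                 # assert assigns nothing
    }


def lfp(f, bottom):
    """Kleene iteration from bottom; terminates on this finite lattice."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


AV = lfp(av_step, {k: frozenset() for k in KEYS})
for k in KEYS:
    print(k, sorted(AV[k]))
```

The result is {} at the entry of label 1, {q} at the exit of label 1 and the entry of label 2, and {q, r} at every later point, which can be compared with the analysis created by hand.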

    Q:  What's the benefit of doing things this way?
        - we know there's a least solution,
           lfp(a o G o g)
        - we get correctness of the analysis
          from correctness of the collecting semantics:

           a(lfp(G)) is safely approximated by lfp(a o G o g),
           i.e., a(lfp(G)) \subseteq lfp(a o G o g);

           when these are equal, we say the analysis is optimal.
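The soundness statement can be spot-checked on the example. A minimal Python sketch, assuming a table-driven encoding of the program (`ASSIGN`, `PREDS` and the other names are our own, hypothetical): it computes lfp(a o G o g) by Kleene iteration and compares it with a applied to the finite approximants G^n(bottom) of the collecting semantics, since lfp(G) itself is infinite. Every approximant stays below the analysis, and after enough rounds (20 in the check) they agree at all twelve points, consistent with the analysis being optimal for this program.

```python
ASSIGN = {1: "q", 2: "r", 3: None, 4: "r", 5: "q", 6: None}  # var assigned at label
PREDS = {2: [1], 3: [2, 5], 4: [3], 5: [4], 6: [3]}          # entry(l) = U exit(m)
INIT = (("q", "?"), ("r", "?"), ("x", "?"), ("y", "?"))
KEYS = [(kind, l) for l in range(1, 7) for kind in ("entry", "exit")]
BOTTOM = {k: frozenset() for k in KEYS}


def cs_step(cs):
    """G: one round of the collecting-semantics equations (sets of traces)."""
    new = {("entry", 1): frozenset({INIT})}
    for l, preds in PREDS.items():
        new[("entry", l)] = frozenset().union(*(cs[("exit", m)] for m in preds))
    for l, var in ASSIGN.items():
        ts = cs[("entry", l)]
        new[("exit", l)] = ts if var is None else frozenset(tr + ((var, l),) for tr in ts)
    return new


def av_step(av):
    """a o G o g: the same equations after abstraction (sets of variables)."""
    new = {("entry", 1): frozenset()}
    for l, preds in PREDS.items():
        new[("entry", l)] = frozenset().union(*(av[("exit", m)] for m in preds))
    for l, var in ASSIGN.items():
        vs = av[("entry", l)]
        new[("exit", l)] = vs if var is None else vs | {var}
    return new


def alpha(traces):
    """a: the variables assigned (label != '?') in some trace."""
    return frozenset(x for tr in traces for (x, l) in tr if l != "?")


def iterate(f, n):
    """Apply f n times, starting from the all-empty vector."""
    x = BOTTOM
    for _ in range(n):
        x = f(x)
    return x


def lfp(f):
    """Kleene iteration; terminates only on finite-height lattices."""
    x = BOTTOM
    while (y := f(x)) != x:
        x = y
    return x


AV = lfp(av_step)
for n in range(30):
    cs = iterate(cs_step, n)
    assert all(alpha(cs[k]) <= AV[k] for k in KEYS)   # safe at every round
assert all(alpha(iterate(cs_step, 20)[k]) == AV[k] for k in KEYS)  # equal here
```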