COP 5021 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

  Several examples of data flow analysis for the WHILE language.
    Goals: - learn how to precisely define them
           - learn how to implement them

  Analyses are defined by pairs of entry/exit functions
   that map labels to analysis information (sets).

  The analysis operates on the control flow graph of the program...

** definitions and notation

   The point of these is to represent the control flow graph.
   These are all defined by structural induction.

*** initial and final labels

------------------------------------------
       INITIAL LABEL

init: Stmt -> Lab

  init([x := a]^l) = l
  init([skip]^l) = l
  init(S1; S2) =
  init(if [b]^l then S1 else S2) =
  init (while [b]^l do S) =

------------------------------------------

        ... init(S1)
            l
            l

     Q:  What would the initial label of a nondeterministic choice
         statement be?  Of a parallel composition statement?

     Q:  How would we generalize the formalism to handle such statements?
         perhaps return a set of labels.
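    A minimal Python sketch of init, assuming a hypothetical tuple
    encoding of WHILE statements (the encoding and names are illustrative,
    not from the book):

```python
# Hypothetical tuple encoding of WHILE statements (illustration only):
#   ("assign", l, x, a)    [x := a]^l
#   ("skip", l)            [skip]^l
#   ("seq", S1, S2)        S1; S2
#   ("if", l, b, S1, S2)   if [b]^l then S1 else S2
#   ("while", l, b, S)     while [b]^l do S

def init(s):
    """Return the initial label of statement s (structural induction)."""
    kind = s[0]
    if kind in ("assign", "skip", "if", "while"):
        return s[1]           # the block's own label, or the test's label
    if kind == "seq":
        return init(s[1])     # init(S1; S2) = init(S1)
    raise ValueError(f"unknown statement kind: {kind!r}")


# if [x > 3]^1 then [y := 2]^2 else [z := 3]^3
example = ("if", 1, "x > 3", ("assign", 2, "y", "2"), ("assign", 3, "z", "3"))
print(init(example))  # 1
```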

------------------------------------------
      FINAL LABELS

final: Stmt -> Powerset(Lab)

  final([x := a]^l) = {l}
  final([skip]^l) = {l}
  final(S1; S2) =
  final(if [b]^l then S1 else S2) =
  final (while [b]^l do S) =

------------------------------------------

        ... final(S2)
            final(S1) \cup final(S2)
            l

     Q:  What would the final label set of a nondeterministic choice
         statement be?  Of a parallel composition statement?
           the union of the final sets of the branches
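    A matching sketch of final, under the same hypothetical tuple encoding
    (restated in the comment so the block runs on its own):

```python
# Hypothetical tuple encoding: ("assign", l, x, a), ("skip", l),
#   ("seq", S1, S2), ("if", l, b, S1, S2), ("while", l, b, S)

def final(s):
    """Return the set of final labels of statement s."""
    kind = s[0]
    if kind in ("assign", "skip"):
        return {s[1]}
    if kind == "seq":
        return final(s[2])                  # final(S1; S2) = final(S2)
    if kind == "if":
        return final(s[3]) | final(s[4])    # either branch may finish last
    if kind == "while":
        return {s[1]}                       # the loop exits from its test
    raise ValueError(f"unknown statement kind: {kind!r}")


if_stmt = ("if", 1, "x > 3", ("assign", 2, "y", "2"), ("assign", 3, "z", "3"))
loop = ("while", 1, "x <= 4", ("assign", 2, "x", "x+1"))
print(sorted(final(if_stmt)))                    # [2, 3]
print(sorted(final(("seq", loop, ("skip", 3)))))  # [3]
```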

*** blocks and labels

------------------------------------------
      ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)


------------------------------------------
     Q:  How would you define the set of elementary blocks in a statement?

            blocks([x := a]^l) = {[x := a]^l}
            blocks([skip]^l) = {[skip]^l}
            blocks(S1; S2) = blocks(S1) \cup blocks(S2)
            blocks(if [b]^l then S1 else S2)
                  = {[b]^l} \cup blocks(S1) \cup blocks(S2)
            blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

     Q:  What would the set of elementary blocks in an assert statement be?
     Q:  What would the set of elementary blocks in a nondeterministic choice
          statement be?

     Q:  How would you define the set of labels in a statement?

         labels: Stmt -> Powerset(Lab)
         labels(S) = {l | [B]^l \in blocks(S)}
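    A sketch of blocks and labels, again assuming the hypothetical tuple
    encoding; here an elementary block is represented (as an assumption)
    by a pair (label, text):

```python
# Hypothetical tuple encoding: ("assign", l, x, a), ("skip", l),
#   ("seq", S1, S2), ("if", l, b, S1, S2), ("while", l, b, S)
# An elementary block is represented as (label, text), e.g. (1, "x > 3").

def blocks(s):
    """Return the set of elementary blocks of statement s."""
    kind = s[0]
    if kind == "assign":
        _, l, x, a = s
        return {(l, f"{x} := {a}")}
    if kind == "skip":
        return {(s[1], "skip")}
    if kind == "seq":
        return blocks(s[1]) | blocks(s[2])
    if kind == "if":
        _, l, b, s1, s2 = s
        return {(l, b)} | blocks(s1) | blocks(s2)
    if kind == "while":
        _, l, b, body = s
        return {(l, b)} | blocks(body)
    raise ValueError(f"unknown statement kind: {kind!r}")


def labels(s):
    """labels(S) = { l | [B]^l in blocks(S) }"""
    return {l for (l, _) in blocks(s)}


example = ("if", 1, "x > 3", ("assign", 2, "y", "2"), ("assign", 3, "z", "3"))
print(sorted(blocks(example)))  # [(1, 'x > 3'), (2, 'y := 2'), (3, 'z := 3')]
print(sorted(labels(example)))  # [1, 2, 3]
```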

*** flows and reverse flows

------------------------------------------
      FLOWS

flow: Stmt -> Powerset(Lab x Lab)

  flow([x := a]^l) = {}
  flow([skip]^l) = {}
  flow(S1; S2) =

  flow(if [b]^l then S1 else S2) =

  flow(while [b]^l do S) =

------------------------------------------
        ... // flow(S1; S2) =
            flow(S1) \cup flow(S2)
            \cup {(l, init(S2))| l \in final(S1)}

            // flow(if [b]^l then S1 else S2) =
            flow(S1) \cup flow(S2)
            \cup {(l, init(S1))} \cup {(l, init(S2))}

            // flow(while [b]^l do S) =
            flow(S) \cup {(l, init(S))}
            \cup {(l', l)| l' \in final(S)}
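    A sketch of flow under the same hypothetical tuple encoding; init and
    final are restated compactly so the block runs on its own.  The example
    program is the running example used later in these notes.

```python
# Hypothetical tuple encoding: ("assign", l, x, a), ("skip", l),
#   ("seq", S1, S2), ("if", l, b, S1, S2), ("while", l, b, S)

def init(s):
    return init(s[1]) if s[0] == "seq" else s[1]


def final(s):
    if s[0] == "seq":
        return final(s[2])
    if s[0] == "if":
        return final(s[3]) | final(s[4])
    return {s[1]}                      # assign, skip, while


def flow(s):
    """Return the set of control-flow edges (l, l') of statement s."""
    kind = s[0]
    if kind in ("assign", "skip"):
        return set()
    if kind == "seq":
        _, s1, s2 = s
        return flow(s1) | flow(s2) | {(l, init(s2)) for l in final(s1)}
    if kind == "if":
        _, l, _b, s1, s2 = s
        return flow(s1) | flow(s2) | {(l, init(s1)), (l, init(s2))}
    if kind == "while":
        _, l, _b, body = s
        return flow(body) | {(l, init(body))} | {(lp, l) for lp in final(body)}
    raise ValueError(f"unknown statement kind: {kind!r}")


# [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5)
body = ("seq", ("assign", 3, "t", "a+k"),
               ("seq", ("assign", 4, "j", "j+1"), ("assign", 5, "k", "i*j-1")))
prog = ("seq", ("assign", 1, "k", "i*j-1"), ("while", 2, "i*j-1 < n", body))
print(sorted(flow(prog)))  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)]
```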

   Q:  How would we use these functions to represent the nodes and
   edges of a control flow graph?

   Q: What are the nodes of the control flow graph?
      The elementary blocks

   Q: What are the labels and edges of
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?
         The set of labels is {1, 2, 3}
         The set of flows is {(1,2), (1,3)}

   Q:  How would you formulate a set of reverse flows?

       flow^R: Stmt -> Powerset(Lab x Lab)
       flow^R(S) = {(l', l) | (l, l') \in flow(S)}

   Q: What is flow^R of
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?
      If this statement is S, then
        flow(S) = {(1,2),(1,3)} so flow^R(S) = {(2,1),(3,1)}

   Q: What is flow^R of
         (while [x <= 4]^1 do [x := x+1]^2); [assert x == 4]^3
      ?
      If this statement is S, then
        flow(S) = {(1,2),(1,3),(2,1)} so flow^R(S) = {(2,1),(3,1),(1,2)}

   Q: What are the initial nodes of flow^R?
      the final nodes of flow
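    The definition of flow^R is a one-line set comprehension; a minimal
    Python sketch (the function name is illustrative), checked against the
    worked answer above:

```python
def reverse_flow(flow):
    """flow^R(S) = {(l', l) | (l, l') in flow(S)}"""
    return {(dst, src) for (src, dst) in flow}


# flow of (while [x <= 4]^1 do [x := x+1]^2); [assert x == 4]^3
flow_s = {(1, 2), (1, 3), (2, 1)}
print(sorted(reverse_flow(flow_s)))  # [(1, 2), (2, 1), (3, 1)]
```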

*** program of interest

    In the book, the stars are subscripts,
    but this ASCII version writes them inline, as in S* (vs. plain S).
------------------------------------------
           PROGRAM OF INTEREST

 S* = the top level statement

 Lab* = labels(S*)

 Var* = FV(S*)

 Blocks* = blocks(S*)

------------------------------------------
     These are the: labels, variables, and elementary blocks in the
     program S*

------------------------------------------
      ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff


def: S* has isolated exits iff


------------------------------------------
        ... (\forall l \in Lab ::
               (l, init(S*)) \not\in flow(S*))

        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab ::
                   (l1, l2) \not\in flow(S*)))

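   Q: What kind of programs would not have isolated entries?
      Those that start with a while-loop
        (the loop's back edge flows into init(S*)).

   Q: What kind of programs would not have isolated exits?
      Those whose final labels include a while-loop test,
        e.g., programs that end with a while-loop
        (the test is final, yet it also flows into the loop body).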

   Q: Could we convert programs to have both isolated entries and exits?
      Yes, add skip statements (at the beginning and end) as necessary,
      so there is no loss of generality in assuming these properties.
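    Both definitions can be checked directly on the flow set.  A minimal
    sketch (function names are illustrative) showing that a program that
    starts and ends with a loop fails both checks, and that padding it
    with skips (labelled 0 and 3 here, an arbitrary choice) fixes it:

```python
def has_isolated_entries(init_label, flow):
    """No edge in flow(S*) leads back into init(S*)."""
    return all(dst != init_label for (_, dst) in flow)


def has_isolated_exits(final_labels, flow):
    """No edge in flow(S*) leaves a final label of S*."""
    return all(src not in final_labels for (src, _) in flow)


# while [x <= 4]^1 do [x := x+1]^2   (starts and ends with a loop)
loop_flow = {(1, 2), (2, 1)}
print(has_isolated_entries(1, loop_flow), has_isolated_exits({1}, loop_flow))
# False False

# [skip]^0; while [x <= 4]^1 do [x := x+1]^2; [skip]^3
padded_flow = {(0, 1), (1, 2), (2, 1), (1, 3)}
print(has_isolated_entries(0, padded_flow), has_isolated_exits({3}, padded_flow))
# True True
```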

------------------------------------------
           LABEL CONSISTENT

def: S is label consistent if and only if
     no two blocks in S


Formally:


------------------------------------------
     ... have the same label.

   Q:  How would you formalize that?

       [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2

   Q:  Is there any reason not to have label consistent programs?
       No, although it can be convenient to drop the requirement temporarily
       when joining two programs together (their labels may clash).
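    The formal condition [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2 amounts
    to "each label maps to a single block".  A minimal sketch, assuming
    blocks are given as (label, block_text) pairs:

```python
def is_label_consistent(blocks):
    """[B1]^l, [B2]^l in blocks(S) ==> B1 == B2.

    `blocks` is any iterable of (label, block_text) pairs.
    """
    seen = {}
    for label, block in blocks:
        # setdefault returns the block first recorded for this label
        if seen.setdefault(label, block) != block:
            return False
    return True


print(is_label_consistent([(1, "x > 3"), (2, "y := 2"), (3, "z := 3")]))  # True
print(is_label_consistent([(1, "x := 1"), (1, "y := 2")]))               # False
```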

** Example analyses

   Now we're going to look at 4 different analyses,
   to illustrate the formalism and to see how to precisely define
   dataflow analyses.

*** Available Expressions (AE) analysis (2.1.1)

**** definitions: trivial and non-trivial expressions
      Q:  What's a trivial expression?

------------------------------------------
       SUBEXPRESSIONS

def: An expression is *trivial* iff


 Aexp(a) = non-trivial arithmetic
           subexpressions of expression, a

E.g., Aexp(-x + -y) =


 Aexp* = nontrivial arithmetic expressions
             in S*

------------------------------------------
        ... is a single variable or a constant, in other words, a base case

        So, a nontrivial expression is an expression that is not trivial.
           Note that Aexp(a) includes the entire expression a (if non-trivial):
             it collects all subexpressions, not just proper subexpressions.

        ... Aexp(-x + -y) = {-x, -y, -x+-y}
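    A sketch of Aexp in Python, assuming a hypothetical tuple encoding of
    arithmetic expressions; show() renders expressions without parentheses,
    which is enough for these small examples but ambiguous in general:

```python
# Hypothetical expression encoding (illustration only):
#   ("var", "x"), ("num", 3), ("neg", e), ("op", "+", e1, e2)

def is_trivial(e):
    """A single variable or constant (a base case)."""
    return e[0] in ("var", "num")


def show(e):
    """Render an expression as text (no parentheses)."""
    kind = e[0]
    if kind == "var":
        return e[1]
    if kind == "num":
        return str(e[1])
    if kind == "neg":
        return "-" + show(e[1])
    if kind == "op":
        return show(e[2]) + e[1] + show(e[3])
    raise ValueError(f"unknown expression kind: {kind!r}")


def subexpressions(e):
    """Yield e itself and all of its subexpressions."""
    yield e
    if e[0] == "neg":
        yield from subexpressions(e[1])
    elif e[0] == "op":
        yield from subexpressions(e[2])
        yield from subexpressions(e[3])


def aexp(e):
    """Aexp(e): the non-trivial subexpressions of e, including e itself."""
    return {show(sub) for sub in subexpressions(e) if not is_trivial(sub)}


x, y = ("var", "x"), ("var", "y")
print(sorted(aexp(("op", "+", ("neg", x), ("neg", y)))))  # ['-x', '-x+-y', '-y']
```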

**** idea, goal
------------------------------------------
    AVAILABLE EXPRESSIONS (AE) ANALYSIS

Analysis question (p. 37):

 "For each program point,
  which [non-trivial] expressions
  must have already been
  computed, and not later modified,
  on all paths to that program point."

Example:

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do {  [t := a+k]^3;
         [j := j+1]^4;
         [k := i*j-1]^5  }
------------------------------------------
    Q: Why would a compiler writer care about non-trivial expressions?
       If a non-trivial expression is available, the compiler can reuse
         its value instead of recomputing it, saving computation time

    Q: Why would trivial expressions not matter?
       Because they can be loaded into a register quickly,
         without any substantial computation

    Q: What non-trivial expressions are available at entry to block 2?
       i*j and (i*j)-1
         Note that these are available on all paths,
           even going around the loop

**** observations
    Q: What would be the bad outcome, which should be avoided?
         saying that some non-trivial expression is available when it
         might not be,
           i.e., taking its value out of a register when it is out of date

    Q: What makes a "solution" unsafe?
       too much in the set
    Q: What makes it imprecise?
       too little in the set

    Q: This is a forward analysis. Why?
       because the computation happens forward, along paths
         so use flow, not flow^R

    Q: Note the use of the word "must". What impact does that have?
        It makes us use: - intersection to combine the exit information
                           of all predecessors,
                         - the largest set that is a solution (fixed point),
                           because smaller sets are imprecise

**** formalization

    The basic idea is to define functions on each kind of elementary block,
    using two sub-functions: gen and kill...

------------------------------------------
        FORMAL DEFINITION

AEentry(l) =
   if l = init(S*) then {}
   else \bigcap {AEexit(l')
                   | (l',l) \in flow(S*)}

AEexit(l) =
(AEentry(l) - killAE(B^l)) \cup genAE(B^l)
   where B^l \in blocks(S*)


killAE: Blocks* -> Powerset(Aexp*)

killAE([x:= a]^l) =

killAE([skip]^l) = {}
killAE([b]^l) = {}


genAE: Blocks* -> Powerset(Aexp*)

genAE([x:= a]^l) =

genAE([skip]^l) = {}
genAE([b]^l) = 
------------------------------------------
    Q: What does the kill function mean?
       It says which expressions are no longer available,
       because one of their free variables was assigned
       (what to take out of the analysis information).

    Q: What does the gen function mean?
       It says what expressions become available.

     ... killAE([x:= a]^l) =
            {a' \in Aexp* | x \in FV(a')}
         (this is the set of nontrivial expressions that contain the
           variable being assigned as a free variable)

     ... genAE([x:= a]^l) =
            {a' \in Aexp(a) | x \not\in FV(a')}
         (so this is the non-trivial expressions in the right hand
            side that do not contain the assigned variable as a free variable)

     ... genAE([b]^l) = Aexp(b)
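    A sketch of killAE and genAE for the running example below.  As an
    assumption, expressions are plain strings with their free variables in
    a lookup table FV, and each block carries the (precomputed) Aexp of its
    expression; the block encoding is hypothetical.

```python
# Hypothetical representation for the running example.
FV = {"i*j": {"i", "j"}, "i*j-1": {"i", "j"}, "a+k": {"a", "k"}, "j+1": {"j"}}
AEXP_STAR = set(FV)

# Blocks carry the Aexp of their expression (assumed precomputed):
#   ("assign", x, Aexp(a)), ("test", Aexp(b)), ("skip",)

def kill_ae(block):
    if block[0] == "assign":
        x = block[1]
        return {e for e in AEXP_STAR if x in FV[e]}   # exprs mentioning x
    return set()                                     # skip and tests kill nothing


def gen_ae(block):
    if block[0] == "assign":
        _, x, rhs = block
        return {e for e in rhs if x not in FV[e]}     # rhs exprs not mentioning x
    if block[0] == "test":
        return set(block[1])                          # genAE([b]^l) = Aexp(b)
    return set()                                     # skip generates nothing


b1 = ("assign", "k", {"i*j", "i*j-1"})   # [k := i*j-1]^1
b4 = ("assign", "j", {"j+1"})            # [j := j+1]^4
print(sorted(kill_ae(b1)), sorted(gen_ae(b1)))  # ['a+k'] ['i*j', 'i*j-1']
print(sorted(kill_ae(b4)), sorted(gen_ae(b4)))  # ['i*j', 'i*j-1', 'j+1'] []
```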

    Q: Why don't we have to define the analysis for while loops and if
       statements?
       it's implicit, using the CFG

**** example revisited
------------------------------------------
        EXAMPLE

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5)

What's the control flow graph?


------------------------------------------
   ...
         [k := i*j-1]^1
             |
             v
   /---->[i*j-1 < n]^2 ---------\
   |         |                  |
   |         v                  |
   |     [t := a+k]^3;          |
   |         |                  |
   |         v                  |
   |     [j := j+1]^4;          |
   |         |                  |
   |         v                  |
   |     [k := i*j-1]^5         |
   |         |                  |
   \---------/                  |
                                v
    Q: What is Aexp* in this example?
         {i*j, i*j-1, a+k, j+1}

    Q: What's the role of the control flow graph here?
       it's an implicit parameter,
       so these equations generate particular sets for every control flow
       graph (program).

------------------------------------------
       KILL AND GEN FOR THE EXAMPLE

What are killAE and genAE in the example?

l  killAE(l)      genAE(l)
============================
1
2
3
4
5
------------------------------------------

   ...
       l  killAE(l)        genAE(l)
       ================================
       1  {a+k}            {i*j, i*j-1}
       2  {}               {i*j, i*j-1}
       3  {}               {a+k}
       4  {i*j, i*j-1,j+1} {}                  
       5  {a+k}            {i*j, i*j-1}

    (note that j+1 is not generated at block 4, since it contains
     the assigned variable j)

    Q: Do the kill and gen functions depend on what happens at other nodes?
        No; this allows us to compute them by looking only at the block itself

------------------------------------------
        EXAMPLE'S AE EQUATIONS

AEentry(1) = 
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =

AEexit(2) =

AEexit(3) =

AEexit(4) =

AEexit(5) =

------------------------------------------
    ...
        AEentry(1) = {}
        AEentry(2) = AEexit(1) \cap AEexit(5)
        AEentry(3) = AEexit(2)
        AEentry(4) = AEexit(3)
        AEentry(5) = AEexit(4)
        
        AEexit(1) = ({} - {a+k}) \cup {i*j, i*j-1}
        AEexit(2) = (AEentry(2) - killAE([i*j-1 < n]^2))
                    \cup genAE([i*j-1 < n]^2)
                  = (AEentry(2) - {}) \cup {i*j, i*j-1}
        AEexit(3) = (AEentry(3) - killAE([t:=a+k]^3)) \cup genAE([t:=a+k]^3)
                  = (AEentry(3) - {}) \cup {a+k}
        AEexit(4) = (AEentry(4) - killAE([j:=j+1]^4)) \cup genAE([j:=j+1]^4)
                  = (AEentry(4) - {i*j, i*j-1, j+1}) \cup {}
        AEexit(5) = (AEentry(5) - killAE([k := i*j-1]^5))
                    \cup genAE([k := i*j-1]^5)
                  = (AEentry(5) - {a+k}) \cup {i*j, i*j-1}

    Q: What are we assuming with this formalism?
       That the program is label consistent.  Why?
            because of the use of blocks(S*)
       That the program has isolated entries.  Why?
            because of the use of l == init(S*)

    Q: How would we adjust this if we didn't have isolated entries?
       (1) put a skip at the beginning of the program.
       (2) simulate the effect of that, by intersecting {} with other exits

    Q: What's the bad outcome we want to avoid?
         That there is an expression that is said to be available,
         but is not on all paths to a program point
           (because the compiler might try to
            get it from a register incorrectly)
    Q: What makes the analysis information more imprecise?
         a smaller set
    Q: What makes the analysis unsafe?
         a set that is too large.

    Q: So what do we want for a solution the largest or smallest?
         the largest solution, as that is the most precise

***** Finding a Solution using Chaotic Iteration (1.7)

    Q: How can we get rid of the recursion in the dataflow equations?
        Make it an iteration by using the generating functional (see below)

    Q: What sets do we start iterating from to find a solution?
       Aexp*, because this is a must (intersection) analysis
              and (thus) we want the largest solution

------------------------------------------
 FORMALIZING THE ANALYSIS AS AN ITERATION

Let \vec{AE} be (AE1, AE2, ..., AE10)
   where:
    AE1 is the entry info for label 1
        from the previous iteration
    AE2 is the exit info for label 1
        from the previous iteration
    AE3 is entry info. for label 2
        from the previous iteration
    AE4 is exit info for label 2
        from the previous iteration
    AE5 is entry info for label 3
    AE6 is exit info for label 3
    ...
    AE10 is exit info for label 5
    
define F(v) = (F1(v), F2(v), ..., F10(v))
  where:
    F1(AE1, AE2, ..., AE10) // entry(1)
     = {}                   
    F2(AE1, AE2, ..., AE10) // exit(1)
     = (AE1 - killAE([k:=i*j-1]^1))
       \cup genAE([k:=i*j-1]^1)
     = (AE1 - {a+k}) \cup {i*j,i*j-1}
    F3(AE1, AE2, ..., AE10) // entry(2)
     = AE2 \cap AE10
    F4(AE1, AE2, ..., AE10) // exit(2)
     = (AE3 - killAE([i*j-1 < n]^2))
       \cup genAE([i*j-1 < n]^2)
     = (AE3 - {}) \cup {i*j,i*j-1}
    F5(AE1, AE2, ..., AE10) // entry(3)
     = AE4
    F6(AE1, AE2, ..., AE10) // exit(3)
     = (AE5 - killAE([t := a+k]^3))
       \cup genAE([t := a+k]^3)
     = (AE5 - {}) \cup {a+k}
    F7(AE1, AE2, ..., AE10) // entry(4)
     = AE6
    F8(AE1, AE2, ..., AE10) // exit(4)
     = (AE7 - killAE([j := j+1]^4))
       \cup genAE([j := j+1]^4)
     = (AE7 - {i*j,i*j-1,j+1}) \cup {}
    F9(AE1, AE2, ..., AE10) // entry(5)
     = AE8
    F10(AE1, AE2, ..., AE10) // exit(5)
     = (AE9 - killAE([k := i*j-1]^5))
       \cup genAE([k := i*j-1]^5)
     = (AE9 - {a+k}) \cup {i*j, i*j-1}

Then a fixed-point of F is a solution.
------------------------------------------

------------------------------------------
         CORRECTNESS OF ITERATION

Def 0 (\sqsubseteq):
  tuples of available expressions are
  ordered pointwise by superset inclusion:
i.e.,
 (v1,...,v10) \sqsubseteq (v1',...,v10')
   iff v1 \supseteq v1' and ...
       and v10 \supseteq v10'

Exercise 1: F is monotonic
       (in \sqsubseteq):
 i.e., if v \sqsubseteq v'
       then F(v) \sqsubseteq F(v').

Corollary 2:
Every component of every tuple is a subset of Aexp*, so
    (Aexp*,...,Aexp*)
      \sqsubseteq v   for every tuple v
  (i.e., (Aexp*,...,Aexp*) is the least element).

Corollary 3:
    (Aexp*,...,Aexp*)
      \sqsubseteq F(Aexp*,...,Aexp*)

Exercise 4: For all natural numbers n:
    F^n(Aexp*,...,Aexp*)
      \sqsubseteq F^(n+1)(Aexp*,...,Aexp*)

Exercise 5: there is some natural number n
    such that 
      F^n(Aexp*,...,Aexp*)
       = F(F^n(Aexp*,...,Aexp*))
    (so that F^n(Aexp*,...,Aexp*) is a
     fixed point of F).
------------------------------------------
      Q: Why do these hold?
        Ex. 1 can be checked directly on the definition of F:
              each Fi is built from \cap, \cup, and subtracting
              a fixed kill set, all of which are monotone.
        Corr. 2 and 3 follow immediately.
        Ex. 4 is proven by induction, using monotonicity of F.
        Ex. 5 holds because the lattice is finite,
              so there cannot be an infinite strictly ascending chain.

------------------------------------------
       USING THE ITERATION DIRECTLY

Iterate F, starting with
  (Aexp*, Aexp*, ..., Aexp*);
 each iteration gives a better
   approximation (smaller sets), until we reach
   a fixed point

First iteration:
F(Aexp*,Aexp*,...,Aexp*)
 = (F1(Aexp*), F2(Aexp*), ..., F10(Aexp*))

F1(Aexp*,Aexp*,...,Aexp*)
 = {}
F2(Aexp*,Aexp*,...,Aexp*)
 = (Aexp* - {a+k}) \cup {i*j, i*j-1}
 = {i*j, i*j-1, j+1}
F3(Aexp*,Aexp*,...,Aexp*)
 = Aexp* \cap Aexp*
 = Aexp*
F4(Aexp*,Aexp*,...,Aexp*)
 = (Aexp* - {}) \cup {i*j,i*j-1}
 = Aexp*
F5(Aexp*,Aexp*,...,Aexp*)
 = Aexp*
F6(Aexp*,Aexp*,...,Aexp*)
 = (Aexp* - {}) \cup {a+k}
 = Aexp*
F7(Aexp*,Aexp*,...,Aexp*)
 = Aexp*
F8(Aexp*,Aexp*,...,Aexp*)
 = (Aexp* - {i*j,i*j-1,j+1}) \cup {}
 = {a+k}
F9(Aexp*,Aexp*,...,Aexp*)
 = Aexp*
F10(Aexp*,Aexp*,...,Aexp*)
 = (Aexp* - {a+k}) \cup {i*j, i*j-1}
 = ({i*j, i*j-1, j+1}) \cup {i*j, i*j-1}
 = {i*j, i*j-1, j+1}
------------------------------------------
   Recall that Aexp* is {i*j, i*j-1, a+k, j+1}
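   The bookkeeping above can be mechanized.  A sketch, assuming the
   kill/gen sets and flow of the running example are entered by hand as
   tables (KILL, GEN, FLOW and the helper names are illustrative); the
   vector is 0-based, so AE1 is v[0].  Every F_i reads the old vector,
   exactly as in the hand computation.

```python
# Data for the running example, copied from the kill/gen table.
AEXP = frozenset({"i*j", "i*j-1", "a+k", "j+1"})
KILL = {1: {"a+k"}, 2: set(), 3: set(), 4: {"i*j", "i*j-1", "j+1"}, 5: {"a+k"}}
GEN = {1: {"i*j", "i*j-1"}, 2: {"i*j", "i*j-1"}, 3: {"a+k"}, 4: set(),
       5: {"i*j", "i*j-1"}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)}
INIT, LABELS = 1, [1, 2, 3, 4, 5]


def entry(l):
    """Index of AEentry(l) in the 0-based vector (AE1 is v[0])."""
    return 2 * (l - 1)


def exit_(l):
    """Index of AEexit(l) in the 0-based vector (AE2 is v[1])."""
    return 2 * (l - 1) + 1


def F(v):
    """Apply all ten F_i at once; every component reads the OLD vector v."""
    out = [None] * len(v)
    for l in LABELS:
        if l == INIT:
            out[entry(l)] = frozenset()
        else:
            # Intersection over predecessors' exits (empty intersection = AEXP).
            preds = [v[exit_(p)] for (p, q) in FLOW if q == l]
            out[entry(l)] = AEXP.intersection(*preds)
        out[exit_(l)] = (v[entry(l)] - KILL[l]) | GEN[l]
    return out


def iterate(start):
    """Return [start, F(start), F(F(start)), ...] up to the first fixed point."""
    history = [start]
    while True:
        nxt = F(history[-1])
        if nxt == history[-1]:
            return history
        history.append(nxt)


history = iterate([AEXP] * 10)
for l in LABELS:
    print(l, sorted(history[-1][entry(l)]), sorted(history[-1][exit_(l)]))
```

   history[1], history[2], and history[3] reproduce the first, second,
   and third iterations worked out by hand.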

------------------------------------------
       SECOND ITERATION

Let \vec{v_1} // output of first iteration
  = ({}, {i*j, i*j-1, j+1}, Aexp*, Aexp*,
     Aexp*, Aexp*, Aexp*, {a+k}, Aexp*,
     {i*j, i*j-1, j+1})

The second iteration computes:
F1(\vec{v_1})
 = {}
F2(\vec{v_1})
 = ({} - {a+k}) \cup {i*j, i*j-1}
 = {i*j, i*j-1}
F3(\vec{v_1})
 = {i*j, i*j-1, j+1}
   \cap {i*j, i*j-1, j+1}
 = {i*j, i*j-1, j+1}
F4(\vec{v_1})
 = (Aexp* - {}) \cup {i*j,i*j-1}
 = Aexp*
F5(\vec{v_1})
 = Aexp*
F6(\vec{v_1})
 = (Aexp* - {}) \cup {a+k}
 = Aexp*
F7(\vec{v_1})
 = Aexp*
F8(\vec{v_1})
 = (Aexp* - {i*j,i*j-1,j+1}) \cup {}
 = {a+k}
F9(\vec{v_1})
 = {a+k}
F10(\vec{v_1})
 = (Aexp* - {a+k}) \cup {i*j, i*j-1}
 = ({i*j, i*j-1, j+1}) \cup {i*j, i*j-1}
 = {i*j, i*j-1, j+1}
------------------------------------------
   Q: Compared to the inputs to the second iteration, did any outputs change?
       Yes, the 2nd, 3rd, and 9th elements changed
         (each became smaller: the 2nd lost j+1,
          and the 3rd and 9th shrank from Aexp*)
         
------------------------------------------
         THIRD ITERATION

Let \vec{v_2} // output of 2nd iteration
  = ({}, {i*j,i*j-1}, {i*j,i*j-1,j+1},
     Aexp*, Aexp*, Aexp*, Aexp*, {a+k},
     {a+k}, {i*j, i*j-1, j+1})

The third iteration computes:
F1(\vec{v_2})
 = {}
F2(\vec{v_2})
 = ({} - {a+k}) \cup {i*j, i*j-1}
 = {i*j, i*j-1}
F3(\vec{v_2})
 = {i*j, i*j-1} \cap {i*j, i*j-1, j+1}
 = {i*j, i*j-1}
F4(\vec{v_2})
 = ({i*j, i*j-1, j+1} - {})
   \cup {i*j,i*j-1}
 = {i*j, i*j-1, j+1}
F5(\vec{v_2})
 = Aexp*
F6(\vec{v_2})
 = (Aexp* - {}) \cup {a+k}
 = Aexp*
F7(\vec{v_2})
 = Aexp*
F8(\vec{v_2})
 = (Aexp* - {i*j,i*j-1,j+1}) \cup {}
 = {a+k}
F9(\vec{v_2})
 = {a+k}
F10(\vec{v_2})
 = ({a+k} - {a+k}) \cup {i*j, i*j-1}
 = {i*j, i*j-1}
------------------------------------------
    Q: What changed in the third iteration?
       The 3rd, 4th, and 10th places (the 9th stayed {a+k})

    Q: Is this a lot of bookkeeping?
       yes!
    Q: Is there a more efficient way to do this?
       Yes, use the chaotic iteration algorithm...

------------------------------------------
        CHAOTIC ITERATION ALGORITHM (1.7)

Goal: compute the desired fixed point
      of dataflow equations (quickly)

Idea:
  - use a functional (F =(Fi), as before)
    to make the current approximation
    more and more precise
  - initialize the argument vector
    to be either:
       - a vector of empty sets, or
       - a vector of complete sets
    depending on whether we want the
       - least fixed point, or
       - greatest fixed point

For our example in the AE analysis,
   we want greatest fixed point, so:

Algorithm (Table 1.5):

Step 1: initialize
          AEi := Aexp* for i in {1,...,10}

Step 2:
       while \exists j such that
               AEj \neq Fj(AE1, ..., AE10)
       do
           let j be such that
               AEj \neq Fj(AE1, ..., AE10)
           in
               AEj := Fj(AE1, ..., AE10)

Step 3:
       return (AE1, ..., AE10)
------------------------------------------
      Q: Why is this algorithm correct?

         Because it simulates the iteration.
           In detail, we can always construct a step of the iteration
           of F from one or more steps of the Chaotic Iteration algorithm.
           Suppose v \neq F(v) = o.
           Then there is some non-empty set of indices, D, of o
            such that for all j in D, 
                v[j] \neq o[j] = Fj(v).
           The Chaotic Iteration algorithm can compute each component o[j]
            which can then be used to form o = F(v).

         The algorithm finds a fixed point, because
         when there is no j such that v[j] \neq Fj(v),
              then by definition v = F(v).

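         The algorithm terminates because the lattice
            (Powerset(Aexp*)^10, \sqsubseteq) is finite, and
            (since F is monotone and we start from the least element)
            each assignment makes some component a strictly smaller set.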
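    A sketch of Step 2 in Python for the running example.  The tables are
    entered by hand (names are illustrative), the vector is 0-based, and
    the unstable j is chosen by sweeping j = 1..10 in order, which is one
    valid choice among many rather than the book's prescribed order.

```python
# Running-example data (hand-entered from the kill/gen table).
AEXP = frozenset({"i*j", "i*j-1", "a+k", "j+1"})
KILL = {1: {"a+k"}, 2: set(), 3: set(), 4: {"i*j", "i*j-1", "j+1"}, 5: {"a+k"}}
GEN = {1: {"i*j", "i*j-1"}, 2: {"i*j", "i*j-1"}, 3: {"a+k"}, 4: set(),
       5: {"i*j", "i*j-1"}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)}
INIT, LABELS = 1, [1, 2, 3, 4, 5]


def make_components():
    """Build F1..F10 as separate functions of the whole (0-based) vector v."""
    comps = []
    for l in LABELS:
        if l == INIT:
            comps.append(lambda v: frozenset())
        else:
            preds = [p for (p, q) in FLOW if q == l]
            comps.append(lambda v, preds=preds: AEXP.intersection(
                *(v[2 * (p - 1) + 1] for p in preds)))
        comps.append(lambda v, l=l: (v[2 * (l - 1)] - KILL[l]) | GEN[l])
    return comps


def chaotic_iteration(components, start):
    """Step 2: while some AEj != Fj(AE), set AEj := Fj(AE).

    Returns the final vector and the number of assignments made.
    """
    v, updates, changed = list(start), 0, True
    while changed:
        changed = False
        for j, Fj in enumerate(components):   # sweep j = 1..10 in order
            new = Fj(v)
            if new != v[j]:
                v[j] = new                     # later F's in this sweep see it
                updates += 1
                changed = True
    return v, updates


result, steps = chaotic_iteration(make_components(), [AEXP] * 10)
print(steps)  # 10
```

    With this sweep order the ten assignments are exactly the ten steps
    worked out by hand, and the second sweep changes nothing.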
            
------------------------------------------
  RECALL THE FUNCTIONAL FOR THE EXAMPLE

Recall that F(v) = (F1(v),...,F10(v))
where

F1(AE1,AE2,...,AE10)
 = {}
F2(AE1,AE2,...,AE10)
 = (AE1 - killAE([k:= i*j-1]^1))
   \cup genAE([k:= i*j-1]^1)
 = (AE1 - {a+k}) \cup {i*j, i*j-1}
F3(AE1,AE2,...,AE10)
 = AE2 \cap AE10
F4(AE1,AE2,...,AE10)
 = (AE3 - killAE([i*j-1<n]^2))
   \cup genAE([i*j-1<n]^2)
 = (AE3 - {}) \cup {i*j,i*j-1}
 = AE3 \cup {i*j,i*j-1}
F5(AE1,AE2,...,AE10)
 = AE4
F6(AE1,AE2,...,AE10)
 = (AE5 - killAE([t:=a+k]^3))
   \cup genAE([t:=a+k]^3)
 = (AE5 - {}) \cup {a+k}
 = AE5 \cup {a+k}
F7(AE1,AE2,...,AE10)
 = AE6
F8(AE1,AE2,...,AE10)
 = (AE7 - killAE([j := j+1]^4))
   \cup genAE([j := j+1]^4)
 = (AE7 - {i*j,i*j-1,j+1}) \cup {}
F9(AE1,AE2,...,AE10)
 = AE8
F10(AE1,AE2,...,AE10)
 = (AE9 - killAE([k:=i*j-1]^5))
   \cup genAE([k:=i*j-1]^5)
 = (AE9 - {a+k}) \cup {i*j, i*j-1}
------------------------------------------
   Recall that Aexp* is {i*j, i*j-1, a+k, j+1}

------------------------------------------
     USING CHAOTIC ITERATION
     
Initialize:
     AEi := Aexp* for i in {1,...,10}

(Possible) Steps
 AE1 \neq
   F1(AE1,...,AE10)
   = {}
 so AE1 := {}

(Now AE1 is different than initially)

 AE2 \neq
   F2(AE1,...,AE10)
   = (AE1 - {a+k}) \cup {i*j, i*j-1}
   = ({} - {a+k}) \cup {i*j, i*j-1}
   = {i*j, i*j-1}
 so AE2 := {i*j, i*j-1}

 AE3 \neq
   F3(AE1,...,AE10)
   = AE2 \cap AE10
   = {i*j, i*j-1} \cap Aexp*
   = {i*j, i*j-1}
 so AE3 := {i*j, i*j-1}

 AE4 \neq
   F4(AE1,...,AE10)
   = AE3 \cup {i*j,i*j-1}
   = {i*j, i*j-1} \cup {i*j,i*j-1}
   = {i*j,i*j-1}
 so AE4 := {i*j,i*j-1}

 AE5 \neq
   F5(AE1,...,AE10)
   = AE4
   = {i*j,i*j-1}
 so AE5 := {i*j,i*j-1}

 AE6 \neq
   F6(AE1,...,AE10)
   = AE5 \cup {a+k}
   = {i*j,i*j-1} \cup {a+k}
   = {i*j,i*j-1,a+k}
 so AE6 := {i*j,i*j-1,a+k}

 AE7 \neq
   F7(AE1,...,AE10)
   = AE6
   = {i*j,i*j-1,a+k}
 so AE7 := {i*j,i*j-1,a+k}

 AE8 \neq
   F8(AE1,...,AE10)
   = (AE7 - {i*j,i*j-1,j+1}) \cup {}
   = ({i*j,i*j-1,a+k} - {i*j,i*j-1,j+1})
     \cup {}
   = {a+k}
 so AE8 := {a+k}

 AE9 \neq
   F9(AE1,...,AE10)
   = AE8
   = {a+k}
 so AE9 := {a+k}

 AE10 \neq
   F10(AE1,...,AE10)
   = (AE9 - {a+k}) \cup {i*j, i*j-1}
   = ({a+k} - {a+k}) \cup {i*j, i*j-1}
   = {i*j, i*j-1}
 so AE10 := {i*j, i*j-1}
------------------------------------------
   Q: Could this algorithm take any more steps?
     No.  The only entry whose inputs changed after it was computed is AE3
     (it depends on AE10), and F3 = AE2 \cap AE10 = {i*j, i*j-1} = AE3.
     So the answer is given by the final values of AE1,...,AE10.

   Notice that we can work forward
     (because this is a forward analysis),
   and the computation only takes 10 steps!

**** Summary
    This illustrates several key ideas about data flow analysis:
      - only need to define what happens to each elementary block,
      - direction (forward or backward), and
      - if it's a "may" or "must" analysis (use union or intersection)
          and if we want the largest or smallest fixed point (solution)

*** Reaching Definitions Analysis (2.1.2)

------------------------------------------
REACHING DEFINITIONS (RD) ANALYSIS (2.1.2)

def (p. 5): a *definition of x*,
    where x in Var is


def (p. 5): a definition of x at label l
     *may reach* label l' iff there is
     an execution of the program where


Analysis question (p. 41):

"For each program point,
 which assignments may have been made
 and not overwritten,
 when program execution reaches that point
 along some path?"


------------------------------------------
    Q: What does "definition" mean for this analysis?
    ... a statement that assigns a value to x

    Q: What does it mean for a definition to reach a program point?
    ... the last value assigned to x,
         when control flow reaches the entry to l',
         was assigned at label l

    Q: Why would a compiler writer care about reaching definitions?
       Can warn programmer if no definition reaches a point

**** observations
    Q: What would be the bad outcome, which should be avoided?
         saying that no definition reaches a given point, when it could
           i.e., warning user about error that might not occur

    Q: What makes a "solution" unsafe?
       too little in the set of definitions
    Q: What makes it imprecise?
       too much in the set of definitions
   Q: So what solution do we want?
      the smallest set that is a solution (as it is the most precise)

    Q: This is a forward analysis. Why?
       because the computation happens forward, along paths
         so use flow, not flow^R

   Q: Note the use of the word "may" in the analysis statement,
      what impact does that have on the analysis?
       We use:
          - unions of entry information,
          - take the least fixed point (solution),
             since that is more precise

   Q: What sets would we use for the RD analysis?
        sets of pairs in Var* x Lab*?, where Lab*? = Lab* \cup {?}
          and (x,?) means that x may not have been assigned yet

------------------------------------------
     FORMAL DEFINITIONS FOR RD ANALYSIS

RDentry(l) =
  if l = init(S*) then {(v,?) | v in Var*}
  else 


RDexit(l) =
(RDentry(l) - killRD(B^l)) \cup genRD(B^l)
    where B^l \in blocks(S*)


killRD: Blocks* -> Powerset(Var* x Lab*?)

killRD([x:= a]^l) =

killRD([skip]^l) = {}
killRD([b]^l) = {}

genRD: Blocks* -> Powerset(Var* x Lab*?)

genRD([x:= a]^l) =

genRD([skip]^l) = {}
genRD([b]^l) = {}
------------------------------------------
    ... RDentry(l) =
          if l = init(S*) then {(v,?) | v in Var*}
          else \bigcup {RDexit(l') | (l',l) \in flow(S*)}

    ... killRD([x := a]^l)
          = {(x,l') | l' in Lab*?} = {x} x Lab*?

    ... genRD([x := a]^l)
          = {(x,l)}
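    A sketch of killRD and genRD, assuming a hypothetical block encoding
    in which only the assigned variable matters; labels are ints and the
    string "?" stands for "uninitialized":

```python
# Hypothetical block encoding: ("assign", x), ("test",), ("skip",).

def kill_rd(block, lab_star_q):
    """killRD([x := a]^l) = {x} x Lab*?; other blocks kill nothing."""
    if block[0] == "assign":
        x = block[1]
        return {(x, l) for l in lab_star_q}
    return set()


def gen_rd(block, label):
    """genRD([x := a]^l) = {(x, l)}; other blocks generate nothing."""
    if block[0] == "assign":
        return {(block[1], label)}
    return set()


LAB_Q = {"?", 1, 2, 3, 4, 5}
print(gen_rd(("assign", "k"), 1))            # {('k', 1)}
print(len(kill_rd(("assign", "k"), LAB_Q)))  # 6
```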

**** example revisited
------------------------------------------
        EXAMPLE

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5)
------------------------------------------

    Q: What is Var* for this example?
        {a, i, j, k, n, t}
    Q: What is Lab*? for this example?
        {?, 1, 2, 3, 4, 5}

------------------------------------------
           KILL AND GEN FOR THE EXAMPLE

What are killRD and genRD in the example?

l  killRD(l)                genRD(l)
====================================
1
2
3
4
5
------------------------------------------

   ...
       l  killRD(l)            genRD(l)
       ================================
       1  {(k,l)|l in Lab*?}   {(k,1)}
       2  {}                   {}
       3  {(t,l)|l in Lab*?}   {(t,3)}
       4  {(j,l)|l in Lab*?}   {(j,4)}
       5  {(k,l)|l in Lab*?}   {(k,5)}

------------------------------------------
        EXAMPLE'S RD EQUATIONS

Recall that Var* is {a,i,j,k,n,t}

RDentry(1) =

RDexit(1)
 = 

RDentry(2) =

RDexit(2)
 =

RDentry(3) =

RDexit(3)
 =

RDentry(4) =

RDexit(4)
 =

RDentry(5) =

RDexit(5)
 = 


------------------------------------------
   ...
RDentry(1)
 = Var* x {?}
 = {(a,?), (i,?), (j,?), (k,?), (n,?), (t,?)}
RDexit(1)
 = (RDentry(1) - killRD([k := i*j-1]^1))
   \cup genRD([k := i*j-1]^1)
 = (RDentry(1) - ({k} x Lab*?))
   \cup {(k,1)}

RDentry(2) = RDexit(1) \cup RDexit(5)
RDexit(2) = RDentry(2)

RDentry(3) = RDexit(2)
RDexit(3) = (RDentry(3) - killRD([t := a+k]^3))
            \cup genRD([t := a+k]^3)

RDentry(4) = RDexit(3)
RDexit(4) = (RDentry(4) - killRD([j := j+1]^4))
            \cup genRD([j := j+1]^4)

RDentry(5) = RDexit(4)
RDexit(5) = (RDentry(5) - killRD([k := i*j-1]^5))
            \cup genRD([k := i*j-1]^5)

    Q: Are larger or smaller sets more precise?
       smaller sets!
    Q: What is the bad outcome to avoid?
       A set that is too small, so warn user wrongly...

    Q: So what sets do we start with to find a solution?
       {}, because this is a may (union) analysis
        and we want the least (most precise) solution

    Q:  So what would the solution be?

------------------------------------------
     SOLUTION USING CHAOTIC ITERATION

let RD1 = {}, ..., RD10 = {};

RD1 \neq F1(RD1,...,RD10)  // entry(1)
   = {(a,?),(i,?),(j,?),(k,?),(n,?),(t,?)}
so RD1 := {(a,?),(i,?),(j,?),(k,?),
           (n,?),(t,?)}

RD2 \neq F2(RD1,...,RD10)  // exit(1)
 = (RD1 - killRD([k := i*j-1]^1))
   \cup genRD([k := i*j-1]^1)
 = ({(a,?),(i,?),(j,?),(k,?),
           (n,?),(t,?)} - ({k} x Lab*?))
   \cup {(k,1)}
 = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
so RD2 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)}

RD3 \neq F3(RD1,...,RD10)  // entry(2)
  = RD2 \cup RD10
  = {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)} \cup {}
so RD3 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)}

RD4 \neq F4(RD1,...,RD10)  // exit(2)
  = RD3
  = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
so RD4 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)}

RD5 \neq F5(RD1,...,RD10)  // entry(3)
  = RD4
  = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
so RD5 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)}

RD6 \neq F6(RD1,...,RD10)  // exit(3)
  = (RD5 - killRD([t := a+k]^3))
    \cup genRD([t := a+k]^3)
  = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,?)}
     - ({t} x Lab*?)) \cup {(t,3)}
  = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}
so RD6 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,3)}

RD7 \neq F7(RD1,...,RD10)  // entry(4)
  = RD6
  = {(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}
so RD7 := {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,3)}

RD8 \neq F8(RD1,...,RD10)  // exit(4)
  = (RD7 - killRD([j := j+1]^4))
    \cup genRD([j := j+1]^4)
  = ({(a,?),(i,?),(j,?),(k,1),(n,?),(t,3)}
     - ({j} x Lab*?)) \cup {(j,4)}
  = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}
so RD8 := {(a,?),(i,?),(j,4),(k,1),
           (n,?),(t,3)}

RD9 \neq F9(RD1,...,RD10)  // entry(5)
  = RD8
  = {(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}
so RD9 := {(a,?),(i,?),(j,4),(k,1),
           (n,?),(t,3)}

RD10 \neq F10(RD1,...,RD10)  // exit(5)
  = (RD9 - killRD([k := i*j-1]^5))
    \cup genRD([k := i*j-1]^5)
  = ({(a,?),(i,?),(j,4),(k,1),(n,?),(t,3)}
     - ({k} x Lab*?)) \cup {(k,5)}
  = {(a,?),(i,?),(j,4),(k,5),(n,?),(t,3)}
so RD10 := {(a,?),(i,?),(j,4),(k,5),
            (n,?),(t,3)}
------------------------------------------
    Q: Are there more steps to take?
       Yes, RD3 will change based on the new value of RD10...

------------------------------------------
       CHAOTIC ITERATION (CONTINUED)

RD3 \neq F3(RD1,...,RD10)  // entry(2)
  = RD2 \cup RD10
  = {(a,?),(i,?),(j,?),(k,1),
           (n,?),(t,?)}
    \cup {(a,?),(i,?),(j,4),(k,5),
          (n,?),(t,3)}
so RD3 := {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,?),(t,3)}

RD4 \neq F4(RD1,...,RD10)  // exit(2)
  = RD3
  = {(a,?),(i,?),(j,?),(j,4),(k,1),
     (k,5),(n,?),(t,?),(t,3)}
so RD4 := {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,?),(t,3)}

RD5 \neq F5(RD1,...,RD10)  // entry(3)
  = RD4
  = {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,?),(t,3)}
so RD5 := {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,?),(t,3)}

RD6 \neq F6(RD1,...,RD10)  // exit(3)
  = (RD5 - killRD([t := a+k]^3))
    \cup genRD([t := a+k]^3)
  = ({(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,?),(t,3)}
     - ({t} x Lab*?)) \cup {(t,3)}
  = {(a,?),(i,?),(j,?),(j,4),(k,1),(k,5),
     (n,?),(t,3)}
so RD6 := {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,3)}

RD7 \neq F7(RD1,...,RD10)  // entry(4)
  = RD6
  = {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,3)}
so RD7 := {(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,3)}

RD8 \neq F8(RD1,...,RD10)  // exit(4)
  = (RD7 - killRD([j := j+1]^4))
    \cup genRD([j := j+1]^4)
  = ({(a,?),(i,?),(j,?),(j,4),(k,1),
           (k,5),(n,?),(t,3)}
     - ({j} x Lab*?)) \cup {(j,4)}
  = {(a,?),(i,?),(j,4),(k,1),(k,5),
      (n,?),(t,3)}
so RD8 := {(a,?),(i,?),(j,4),(k,1),(k,5),
      (n,?),(t,3)}

RD9 \neq F9(RD1,...,RD10)  // entry(5)
  = RD8
  = {(a,?),(i,?),(j,4),(k,1),(k,5),
      (n,?),(t,3)}
so RD9 := {(a,?),(i,?),(j,4),(k,1),(k,5),
      (n,?),(t,3)}
------------------------------------------
        Q: Now is it finished?
           Yes, RD10 doesn't change
           (since the k pairs are subtracted first; check that!),
           so RD3 = RD2 \cup RD10 doesn't change... and nothing else does

        Q: So what is the fixed point?
           read it out from the last values
           of each RDj in the chaotic iteration...
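The chaotic iteration above can be replayed mechanically. Below is a minimal Python sketch, assuming the flow (1,2), (2,3), (3,4), (4,5), (5,2) implied by the equations (label 2 is the loop test; its kill and gen sets are empty, so its condition is left unspecified). The names `ASSIGNS`, `kill_rd`, `gen_rd` and `solve_rd` are hypothetical. It visits the labels round-robin rather than in the hand-chosen order used above; since every RDj starts at {} and the equations are monotone, it should reach the same least fixed point.

```python
# Round-robin chaotic iteration for Reaching Definitions.
# Assumed program, reconstructed from the equations above (the loop test at
# label 2 has empty kill/gen sets, so its condition is left unspecified):
#   [k := i*j-1]^1; while [...]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5)

VARS = ["a", "i", "j", "k", "n", "t"]
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)}
INIT = 1
ASSIGNS = {1: "k", 2: None, 3: "t", 4: "j", 5: "k"}  # variable assigned at each label
LABELS = sorted(ASSIGNS)


def kill_rd(l):
    """killRD: all pairs {x} x Lab*? for the variable x assigned at l."""
    x = ASSIGNS[l]
    if x is None:
        return set()
    return {(x, lab) for lab in LABELS + ["?"]}


def gen_rd(l):
    """genRD: the single pair (x, l) for an assignment, nothing for a test."""
    x = ASSIGNS[l]
    return set() if x is None else {(x, l)}


def solve_rd():
    entry = {l: set() for l in LABELS}  # start from the bottom element {}
    exit_ = {l: set() for l in LABELS}
    changed = True
    while changed:
        changed = False
        for l in LABELS:
            if l == INIT:
                new_entry = {(x, "?") for x in VARS}
            else:
                # union of the exits of all predecessors of l
                new_entry = set().union(*(exit_[p] for (p, q) in FLOW if q == l))
            new_exit = (new_entry - kill_rd(l)) | gen_rd(l)
            if new_entry != entry[l] or new_exit != exit_[l]:
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_


if __name__ == "__main__":
    entry, exit_ = solve_rd()
    for l in LABELS:
        print(l, sorted(entry[l], key=str), sorted(exit_[l], key=str))
```

Here `entry[l]` and `exit_[l]` correspond to RD(2l-1) and RD(2l) in the numbering used above.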

*** Very Busy Expressions Analysis (2.1.3)

**** idea and goals
    This can be useful in hoisting expressions:
    i.e., evaluating an expression
          and storing it for later use (e.g., in a register)
    Doing this can save space in the generated code
         (making it use the cache more efficiently)

------------------------------------------
   VERY BUSY (VB) EXPRESSIONS ANALYSIS

def (p. 44): A non-trivial expression e
     is *very busy*
     at exit from a block labeled l if,
     no matter which path is taken from l,
     e must always be used before any
     x \in FV(e) is assigned.

Examples:
At what points is a+b very busy in:

  [x := a+b]^1;
  [y := a+b]^2

  if [a-b > a+b]^3
  then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7;
  [z := a+b]^8;

  if [a>b]^9
  then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12
  then [x := a+b]^13 else [y := 5021]^14

------------------------------------------

        ... exits of 1, 3, 7, 9
            (not 12 or any of the others)

      Note: "an expression is very busy at exit from a block if it is very
             busy at the entry to every block that follows."

       "However, no expressions are very busy
        at the exit from any final block"


------------------------------------------
      VERY BUSY EXPRESSIONS ANALYSIS

Analysis question (p. 44):

 "For each program point,
  which [non-trivial] expressions
  must be very busy
  at the exit from that point."

------------------------------------------

  Q: Which expressions are very busy at the entry to a block?
       At least the (non-trivial) expressions used in it.

  Q: Would it make sense to have the analysis go forwards or backwards?
     Backwards: want to accumulate information about uses and pass
        that backwards along control flows to where expressions are computed.

  Q: Is this a "may" or a "must" analysis? What does that mean?
     It's a must analysis,
     because of the "always" in the definition of "very busy,"
     so we intersect information when control flows join

  Q: Where do control flows join in this analysis?
     At exits: since we are working backwards, information joins at the
      labels where control flow splits in a forward analysis (e.g., a test)

**** formal definition
------------------------------------------
         FORMAL DEFINITION

VBexit(l) =
  if l \in final(S*) then {}
  else 


VBentry(l) =
 (VBexit(l) - killVB(B^l)) \cup genVB(B^l)
   where B^l \in blocks(S*)


killVB: Blocks* -> Powerset(Aexp*)

killVB([x:= a]^l) =

killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(Aexp*)

genVB([x:= a]^l) = Aexp(a)
genVB([skip]^l) = {}
genVB([b]^l) = Aexp(b)

------------------------------------------
   ... VBexit(l) =
        if l \in final(S*) then {}
        else \bigcap { VBentry(l') |
                       (l', l) \in flow^R(S*) }
   ... VBentry(l) =
        (VBexit(l) - killVB(B^l)) \cup genVB(B^l)
        where B^l \in blocks(S*)

   ... killVB([x:= a]^l) =
        {a' \in Aexp* | x \in FV(a')}

   ... genVB([x:= a]^l) = Aexp(a)

   ... genVB([b]^l) = Aexp(b)

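   Q: Have we seen these kill and gen functions before?
      Almost, in the AE analysis: killVB is the same as killAE, but
      genVB([x:= a]^l) keeps all of Aexp(a), while genAE drops the
      expressions containing x (for VB, the use comes before the assignment)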

   Q: Does this analysis need isolated exits? If so, why?
        Yes, because of the initial condition for VBexit(l)

   Q: Do we want the largest or the smallest solution?
      largest, because smaller sets are safe but less precise

**** example
------------------------------------------
              EXAMPLE

  if [a-b > a+b]^1
  then [x := a+b]^2
  else [y := a+b]^3;
  [z := a]^4

Aexp* is {

l  killVB(l)   genVB(l)
=======================
1
2
3
4

VBentry(1) =
VBexit(1) =
VBentry(2) =
VBexit(2) =
VBentry(3) =
VBexit(3) =
VBentry(4) =
VBexit(4) =
------------------------------------------
      Q: What would change if the condition at label 1 did not use a+b?
         Nothing for a+b: it would still be very busy at exit from 1,
         since it is used at both 2 and 3

... Aexp* is {a-b, a+b}

...
    l  killVB(l)   genVB(l)
    =======================
    1  {}          {a-b,a+b}
    2  {}          {a+b}
    3  {}          {a+b}
    4  {}          {}

------------------------------------------
  FUNCTIONAL REPRESENTING THE EQUATIONS

Recall Aexp* = {a-b, a+b}
F(v) = (F1(v), F2(v), ..., F8(v))

F1(VB1,...,VB8)       // entry(1)
 = (VB2 - killVB([a-b > a+b]^1))
   \cup genVB([a-b > a+b]^1)
 = (VB2 - {}) \cup {a-b,a+b}
 = VB2 \cup {a-b,a+b}
F2(VB1,...,VB8)       // exit(1)
 = VB3 \cap VB5
 
F3(VB1,...,VB8)       // entry(2)
 = (VB4 - killVB([x:=a+b]^2))
   \cup genVB([x:=a+b]^2)
 = (VB4 - {}) \cup {a+b}
 = VB4 \cup {a+b}
F4(VB1,...,VB8)       // exit(2)
 = VB7

F5(VB1,...,VB8)       // entry(3)
 = (VB6 - killVB([y:=a+b]^3))
   \cup genVB([y:=a+b]^3)
 = (VB6 - {}) \cup {a+b}
 = VB6 \cup {a+b}
F6(VB1,...,VB8)       // exit(3)
 = VB7

F7(VB1,...,VB8)       // entry(4)
 = (VB8 - killVB([z:=a]^4))
   \cup genVB([z:=a]^4)
 = (VB8 - {}) \cup {}
 = VB8
F8(VB1,...,VB8) = {}  // exit(4)
------------------------------------------

    Q: Are larger or smaller sets more precise?
       larger sets
    Q: What is the bad outcome to avoid?
       A set that is too large: we might then hoist an expression that
       is not very busy, evaluating it on a path that never uses it

    Q: So what sets do we start with to find a solution?
       Aexp*, because this is a must (intersection) analysis
        and we want the greatest (most precise) solution


------------------------------------------
     SOLVING USING CHAOTIC ITERATION

Recall Aexp* = {a-b, a+b}

Let VB1 = Aexp*, ..., VB8 = Aexp*

VB8 \neq F8(VB1,...,VB8)     // exit(4)
  = {}
so VB8 := {}

VB7 \neq F7(VB1,...,VB8)     // entry(4)
  = VB8
  = {}
so VB7 := {}

VB6 \neq F6(VB1,...,VB8)     // exit(3)
  = VB7
  = {}
so VB6 := {}

VB5 \neq F5(VB1,...,VB8)     // entry(3)
  = VB6 \cup {a+b}
  = {} \cup {a+b}
  = {a+b}
so VB5 := {a+b}

VB4 \neq F4(VB1,...,VB8)     // exit(2)
  = VB7
  = {}
so VB4 := {}

VB3 \neq F3(VB1,...,VB8)     // entry(2)
  = VB4 \cup {a+b}
  = {} \cup {a+b}
  = {a+b}
so VB3 := {a+b}

VB2 \neq F2(VB1,...,VB8)     // exit(1)
  = VB3 \cap VB5
  = {a+b} \cap {a+b}
  = {a+b}
so VB2 := {a+b}

------------------------------------------
   Q: Now, does VB1 change?
      No, it's
        VB2 \cup {a-b,a+b}
        = {a+b} \cup {a-b,a+b}
        = {a-b,a+b}
        = Aexp*, so it's the same!
   Q: Can anything else change?
      no, so we have computed the fixed point of this example.
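The same fixed point can be computed by a short Python sketch, assuming the flow (1,2), (1,3), (2,4), (3,4), final(S*) = {4}, and the kill/gen table above (expressions written as plain strings). As a must analysis it starts every VB set at Aexp*; as a backward analysis it computes each exit from the entries of the successors, intersecting where flows join. The names `KILL`, `GEN` and `solve_vb` are illustrative.

```python
# Very Busy Expressions for:
#   if [a-b > a+b]^1 then [x := a+b]^2 else [y := a+b]^3; [z := a]^4
AEXP_STAR = frozenset({"a-b", "a+b"})
FLOW = {(1, 2), (1, 3), (2, 4), (3, 4)}
FINAL = {4}
# killVB / genVB table from above
KILL = {1: set(), 2: set(), 3: set(), 4: set()}
GEN = {1: {"a-b", "a+b"}, 2: {"a+b"}, 3: {"a+b"}, 4: set()}
LABELS = sorted(KILL)


def solve_vb():
    # Must analysis: start every set at the top element Aexp*.
    entry = {l: set(AEXP_STAR) for l in LABELS}
    exit_ = {l: set(AEXP_STAR) for l in LABELS}
    changed = True
    while changed:
        changed = False
        for l in reversed(LABELS):  # backward analysis: later labels first
            if l in FINAL:
                new_exit = set()
            else:
                # intersect the entries of all successors of l
                succ_entries = [entry[s] for (p, s) in FLOW if p == l]
                new_exit = set(AEXP_STAR).intersection(*succ_entries)
            new_entry = (new_exit - KILL[l]) | GEN[l]
            if new_entry != entry[l] or new_exit != exit_[l]:
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_


if __name__ == "__main__":
    entry, exit_ = solve_vb()
    for l in LABELS:
        print(l, sorted(entry[l]), sorted(exit_[l]))
```

Starting from Aexp* rather than {} matters: because joins are intersections, starting from {} would only ever produce {} at every exit that has successors, which is safe but useless.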

*** Live Variables Analysis (2.1.4)

**** idea and goals
------------------------------------------
           LIVE VARIABLES

def (p. 47): A variable x is *live*
     at exit from label l
     iff there is a path from l
     to a possible use of x
     (in an expression)
     along which x is not
     reassigned before that use.

     This assumes that no variables are
     live at the end of the program.
     
Which variables are live at exit from 1?

(A)
   [x := 3]^1;
   if [z > 0]^2
   then [y := x+2]^3
   else [q := q+1]^4

(B)
   [x := 3]^1;
   [y := x+2]^2;
   [y := y+1]^3

(C)
   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3;
   while [z > 0]^4
   do ([y := x+2]^5;
       [z := z-1]^6)

------------------------------------------
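      {x, z, q} in (A): z is used at 2, x at 3, and q at 4
      {x} in (B): x is used at 2, while y is reassigned at 2
         before its use at 3
      {} in (C): x is reassigned at 3 and z is assigned at 2
         before any use, while x is live at exit from 3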

      The live variables at exit of l
       are the variables that may be used in expressions after l
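A Python sketch that checks the three answers by solving the LV equations with round-robin iteration from {}. Each block is given by hand as a pair (killLV, genLV); `live_variables` is a hypothetical helper. One assumption: it takes LVexit(l) to be the union of LVentry over the successors of l, which is {} when l has no successors. For (C) this matters, because its final label 4 (the loop test) also flows into the loop body, so (C) does not have isolated exits, and the "if l \in final(S*) then {}" form would wrongly make LVexit(4) empty.

```python
def live_variables(blocks, flow):
    """Round-robin LV solver.

    blocks maps each label to (killLV, genLV); flow is a set of edges (l, l').
    LVexit(l) is the union of LVentry over l's successors, which is {}
    when l has no successors (no variables live at the end).
    """
    entry = {l: set() for l in blocks}  # may analysis: start from {}
    exit_ = {l: set() for l in blocks}
    changed = True
    while changed:
        changed = False
        for l in sorted(blocks, reverse=True):  # backward: later labels first
            new_exit = set().union(*(entry[s] for (p, s) in flow if p == l))
            kill, gen = blocks[l]
            new_entry = (new_exit - kill) | gen
            if new_entry != entry[l] or new_exit != exit_[l]:
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_


# (A) [x := 3]^1; if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4
A = ({1: ({"x"}, set()), 2: (set(), {"z"}),
      3: ({"y"}, {"x"}), 4: ({"q"}, {"q"})},
     {(1, 2), (2, 3), (2, 4)})
# (B) [x := 3]^1; [y := x+2]^2; [y := y+1]^3
B = ({1: ({"x"}, set()), 2: ({"y"}, {"x"}), 3: ({"y"}, {"y"})},
     {(1, 2), (2, 3)})
# (C) [x := 3]^1; [z := 4]^2; [x := z+2]^3;
#     while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6)
C = ({1: ({"x"}, set()), 2: ({"z"}, set()), 3: ({"x"}, {"z"}),
      4: (set(), {"z"}), 5: ({"y"}, {"x"}), 6: ({"z"}, {"z"})},
     {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 4)})

if __name__ == "__main__":
    for name, (blocks, flow) in [("A", A), ("B", B), ("C", C)]:
        _, exit_ = live_variables(blocks, flow)
        print(name, "LVexit(1) =", sorted(exit_[1]))
```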

------------------------------------------
       LIVE VARIABLES (LV) ANALYSIS

Analysis question (p. 47):
 "For each program point,
  which variables may be live
  at the exit from the point."

Example:

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

------------------------------------------
   Q: Which variables are live at exit of label 1? Label 2?
      {} and {z}

   The authors assume "that no variables are live at the
   end of the program"; however, they remark that "for some
   applications it might be better to assume that all variables are
   live at the end of the program."

   Q:  What can we use this for?
   dead code elimination:
     if a variable is not live at exit from a block that assigns to it,
      then the assignment statement is "dead"
        and can be deleted from the program.

   Q: What is the bad outcome to avoid?
      deleting some assignment statement that might be needed.

   Q: What kinds of analysis information should we track in the analysis?
      sets of variable names (Powerset(Var*)) that are live

   Q: Should this analysis go backwards or forwards?
      Backwards, so use information can propagate back to earlier blocks

**** definitions and formalization

------------------------------------------
         FORMAL DEFINITION

LVexit(l) =
  if l \in final(S*) then {}
  else

LVentry(l) =


killLV: Blocks* -> Powerset(Var*)

killLV([x:= a]^l) = 
killLV([skip]^l) = {}
killLV([b]^l) = {}

genLV: Blocks* -> Powerset(Var*)

genLV([x:= a]^l) = 
genLV([skip]^l) = {}
genLV([b]^l) =
------------------------------------------

  ... LVexit(l) =
       if l \in final(S*) then {}
       else \bigcup { LVentry(l') |
                       (l', l) \in flow^R(S*) }
  ... LVentry(l) =
       (LVexit(l) - killLV(B^l)) \cup genLV(B^l)
       where B^l \in blocks(S*)

  ... killLV([x:= a]^l) = {x}

  ... genLV([x:= a]^l) = FV(a)

  ... genLV([b]^l) = FV(b)

   Q: Does this analysis need isolated exits? If so, why?
        Yes, because of the initial condition on final(S*)

   Q:  Why is there a union for LVexit?
        Because this is a may analysis

------------------------------------------
            EXAMPLE

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

l  killLV(l)    genLV(l)
==========================
1
2
3

LVentry(1) =
LVexit(1) =
LVentry(2) =
LVexit(2) =
LVentry(3) =
LVexit(3) =
------------------------------------------

...
    l  killLV(l)    genLV(l)
    ==========================
    1  {x}          {}
    2  {z}          {}
    3  {x}          {z}

------------------------------------------
         THE FUNCTIONAL

F1(LV1,...,LV6)        // entry(1)
 = (LV2 - killLV([x := 3]^1))
   \cup genLV([x := 3]^1)
 = (LV2 - {x}) \cup {}
 = LV2 - {x}
F2(LV1,...,LV6)        // exit(1)
  = LV3

F3(LV1,...,LV6)        // entry(2)
 = (LV4 - killLV([z := 4]^2))
   \cup genLV([z := 4]^2)
 = (LV4 - {z}) \cup {}
 = LV4 - {z}
F4(LV1,...,LV6)        // exit(2)
  = LV5

F5(LV1,...,LV6)        // entry(3)
 = (LV6 - killLV([x := z+2]^3))
   \cup genLV([x := z+2]^3)
 = (LV6 - {x}) \cup {z}
F6(LV1,...,LV6) = {}   // exit(3)
------------------------------------------

   Q: What makes the analysis information unsafe?
      a set that is too small
   Q: What makes the analysis imprecise?
      bigger sets

   Q: Do we want the largest or the smallest solution?
      smallest, as that will be the most precise solution

    Q: So, what would be the initial values for LV1,...,LV6?
      {}, the empty set, since we want the smallest solution
      as that is the most precise
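A Python sketch that iterates the functional F above directly, starting from the bottom element ({}, ..., {}) as the answer suggests, and then applies the dead-code test described earlier (an assignment whose target is not live at exit from its block is dead). The tuple layout (LV1, ..., LV6) follows the functional; `least_fixed_point`, `ASSIGNED` and `EXIT_POS` are illustrative names.

```python
# The LV functional for [x := 3]^1; [z := 4]^2; [x := z+2]^3,
# over the tuple (LV1, ..., LV6) = (entry(1), exit(1), ..., entry(3), exit(3)).
def F(lv):
    _, lv2, lv3, lv4, lv5, lv6 = lv
    return (
        lv2 - {"x"},            # F1: entry(1) = (LV2 - {x}) U {}
        lv3,                    # F2: exit(1)  = LV3
        lv4 - {"z"},            # F3: entry(2) = (LV4 - {z}) U {}
        lv5,                    # F4: exit(2)  = LV5
        (lv6 - {"x"}) | {"z"},  # F5: entry(3) = (LV6 - {x}) U {z}
        frozenset(),            # F6: exit(3)  = {}
    )


def least_fixed_point(f, bottom):
    """Apply f repeatedly from bottom until the value stops changing."""
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


BOTTOM = (frozenset(),) * 6
ASSIGNED = {1: "x", 2: "z", 3: "x"}  # variable assigned at each label
EXIT_POS = {1: 1, 2: 3, 3: 5}        # index of LVexit(l) in the tuple

lv = least_fixed_point(F, BOTTOM)
# Dead assignment: its target is not live at exit from its block.
dead = {l for l, x in ASSIGNED.items() if x not in lv[EXIT_POS[l]]}

if __name__ == "__main__":
    print("LV =", [sorted(s) for s in lv])
    print("dead assignments at labels", sorted(dead))
```

Both [x := 3]^1 and [x := z+2]^3 come out dead, the latter only because of the assumption that no variables are live at the end of the program. Deleting [x := z+2]^3 would in turn make [z := 4]^2 dead, so one may rerun the analysis after deleting dead assignments.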
      
*** Derived Data Flow Information (2.1.5)

------------------------------------------
  LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links a use of a variable (in an expression)
   to each assignment that may be
   the last one to define it

Definition-use (du) chain:
   links an assignment to a variable
   to each use (in an expression) it may reach
------------------------------------------

    Q:  What might this be useful for?
        - dead code elimination (if the set of du chains is empty)
        - code motion (reordering)
          (can reorder statements S1 and S2
           if there is no du or ud chain between S1 and S2)

**** formal definitions
------------------------------------------
           DEFINITIONS AND USES

def (p. 50):
  (l1, ..., ln) is a
  *definition clear path for x* iff
   1. each (li, l(i+1)) is in flow(S*),
   2. no block labeled l1, ..., l(n-1)
      assigns a value to x, and
   3. the block labeled ln uses x
      (in an expression)

  clear(x, l, l') =
    (\exists l1, ..., ln ::
         l = l1 & l' = ln & n > 0
       & (\forall i : 1 <= i < n :
          (li, l(i+1)) \in flow(S*))
       & (\forall i : 1 <= i < n :
          not(def(x, li)))
       & use(x, ln))

  def(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in killLV([B]^l))

  use(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in genLV([B]^l))

------------------------------------------

     Q: How do you interpret the notion of a definition clear path for x?
     Q: Why are the def and use functions correct?
     Q: Does clear(y, 3, 7) tell you anything about the use of y?
        yes, it's used at label 7

------------------------------------------
           UD AND DU ANALYSIS

ud: Var* x Lab* -> Powerset(Lab*?)
ud(x, l') =
   {l | def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
  \cup
   {? | clear(x, init(S*), l')}

du: Var* x Lab*? -> Powerset(Lab*)
du(x, l) =
 if l != ?
 then
   {l'| def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
 else
   {l'| clear(x, init(S*), l')}

------------------------------------------
     Q: What does ud(x, l') = {l1, l2, l3} mean?
         that a use of x at the block labeled l'
          might have been defined at either l1, l2, or l3

     Q: What does du(x, l) = {l1, l2, l3} mean?
         that x is defined (assigned a value) at the block labeled l
          and that assignment to x might be used at either l1, l2, or l3

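     Q: Can ud(x,l') be empty?
         yes, if x is not used at l' (e.g., ud(y, 1) in the example below);
         if x is used at l', no: it holds at least one label or ?,
         and it could be just {?}
         What would that mean? that x is used uninitialized at l'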

     Q: What would it mean if du(x,l) is empty?
         If l assigns to x: that assignment is never used,
         so it is dead code

     Q: What is the analysis domain?
         It's sets -- not sequences -- of labels
         that are on definition-clear paths
         
     Q: Do these require isolated entries?
        yes

     Q: Are these must or may analyses?
        may

     Q: So what would a bad outcome be for a UD or DU analysis?
           a set that is too small
     Q: Would we want the largest or smallest solution for a UD or DU analysis?
           the smallest, as it would be more precise

     Q: Can we define du in terms of ud?
        yes (this is an exercise in the book)

**** example
------------------------------------------
        EXAMPLE

   [z := 3]^1;
   if [y > 0]^2
   then [y := z+2]^3
   else [y := y+1]^4

    ud(x, l)
l \ x |   y        z
========================
1
2
3
4

    du(x, l)
 l \ x |   y    z
=====================
   1
   2
   3
   4
------------------------------------------
 ...
        ud(x, l)
     l \ x |   y        z
    ========================
       1       {}       {}
       2       {?}      {}
       3       {}       {1}
       4       {?}      {}
    
        du(x, l)
     l \ x |    y       z
    ========================
     1          {}      {3}
     2          {}      {}
     3          {}      {}
     4          {}      {}
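The tables can be checked by coding the definitions of clear, ud and du directly. In this Python sketch, assumed for illustration, each block is described by the variables it defines (killLV) and uses (genLV); clear(x, l, l') is a depth-first search that never passes through a block assigning x before reaching l', and "?" stands for the uninitialized definition. The names `DEFS`, `USES` and `successors` are hypothetical.

```python
# [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4
DEFS = {1: {"z"}, 2: set(), 3: {"y"}, 4: {"y"}}  # killLV: variables assigned
USES = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}  # genLV: variables used
FLOW = {(1, 2), (2, 3), (2, 4)}
INIT = 1
LABELS = sorted(DEFS)


def successors(l):
    return [s for (p, s) in FLOW if p == l]


def clear(x, l, l_end):
    """Is there a definition clear path for x from l to a use of x at l_end?"""
    if x not in USES[l_end]:
        return False
    seen, stack = set(), [l]
    while stack:
        cur = stack.pop()
        if cur == l_end:
            return True  # l_end itself may assign x (e.g. y := y+1)
        if cur in seen or x in DEFS[cur]:
            continue     # never pass through a block that assigns x
        seen.add(cur)
        stack.extend(successors(cur))
    return False


def ud(x, l_use):
    labels = {l for l in LABELS if x in DEFS[l]
              and any(clear(x, s, l_use) for s in successors(l))}
    if clear(x, INIT, l_use):
        labels.add("?")
    return labels


def du(x, l):
    if l == "?":
        return {u for u in LABELS if clear(x, INIT, u)}
    if x not in DEFS[l]:
        return set()
    return {u for u in LABELS if any(clear(x, s, u) for s in successors(l))}


if __name__ == "__main__":
    for l in LABELS:
        print(l, "ud:", ud("y", l), ud("z", l), " du:", du("y", l), du("z", l))
```

The sketch also evaluates the ? row, which the du table leaves out: du(y, ?) = {2, 4}, since the uninitialized y reaches the uses at 2 and 4, and du(z, ?) = {}.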
    

**** computation

    Q: How could we use RD and LV to compute ud chains?
    see p. 54

    UD : Var* x Lab* -> Powerset(Lab*?)
    UD(x,l) = if x \in genLV(B^l) then {l' | (x,l') \in RDentry(l)}
              else {}

    computing DU is an exercise.
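A Python sketch of this RD-based characterization for the ud/du example program: it computes RDentry by round-robin iteration (entry(init) holds (x, ?) for every variable) and then reads UD off it. The encodings `ASSIGNS` and `USES` (genLV of each block) are assumptions made for the sketch; on this program the result agrees with the ud table.

```python
# [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4
VARS = {"y", "z"}
ASSIGNS = {1: "z", 2: None, 3: "y", 4: "y"}     # variable assigned at each label
USES = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}  # genLV of each block
FLOW = {(1, 2), (2, 3), (2, 4)}
INIT = 1
LABELS = sorted(ASSIGNS)


def rd_entry():
    """Reaching Definitions: returns RDentry for every label."""
    entry = {l: set() for l in LABELS}
    exit_ = {l: set() for l in LABELS}
    changed = True
    while changed:
        changed = False
        for l in LABELS:
            if l == INIT:
                new_entry = {(x, "?") for x in VARS}
            else:
                new_entry = set().union(*(exit_[p] for (p, q) in FLOW if q == l))
            x = ASSIGNS[l]
            if x is None:
                new_exit = set(new_entry)
            else:  # kill {x} x Lab*?, then gen (x, l)
                new_exit = {d for d in new_entry if d[0] != x} | {(x, l)}
            if new_entry != entry[l] or new_exit != exit_[l]:
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry


def UD(x, l, rd):
    """UD(x,l) = {l' | (x,l') in RDentry(l)} if x in genLV(B^l), else {}."""
    if x not in USES[l]:
        return set()
    return {lab for (v, lab) in rd[l] if v == x}


if __name__ == "__main__":
    rd = rd_entry()
    for l in LABELS:
        print(l, UD("y", l, rd), UD("z", l, rd))
```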