Com S 641 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

  Several examples of data flow analysis for the While language.

  Analyses defined by pairs of entry/exit functions,
   that map labels to analysis information (sets)

** definitions and notation

   these are all defined by structural induction

*** initial and final labels

------------------------------------------
       INITIAL LABEL

init: Stmt -> Lab

  init([x := a]^l) = l
  init([skip]^l) = l
  init(S1; S2) =
  init(if [b]^l then S1 else S2) = 
  init (while [b]^l do S) = 

------------------------------------------

        ... init(S1)
            l
            l

     Q:  What would the initial label of a nondeterministic choice
         statement be?  Of a parallel composition statement?

     Q:  How would we generalize the formalism to handle such statements?
         perhaps return a set of labels.

------------------------------------------
      FINAL LABELS

final: Stmt -> Powerset(Lab)

  final([x := a]^l) = {l}
  final([skip]^l) = {l}
  final(S1; S2) =
  final(if [b]^l then S1 else S2) = 
  final (while [b]^l do S) = 

------------------------------------------

        ... final(S2)
            final(S1) \cup final(S2)
            {l}

     Q:  What would the final label set of a nondeterministic choice
         statement be?  Of a parallel composition statement?

*** blocks and labels

------------------------------------------
      ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)

  
------------------------------------------
     Q:  How would you define the set of elementary blocks in a statement?

            blocks([x := a]^l) = {[x := a]^l}
            blocks([skip]^l) = {[skip]^l}
            blocks(S1; S2) = blocks(S1) \cup blocks(S2)
            blocks(if [b]^l then S1 else S2)
                  = {[b]^l} \cup blocks(S1) \cup blocks(S2)
            blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

     Q:  What would the set of blocks in an assert statement be?
     Q:  What would the set of blocks in a nondeterministic choice
          statement be?

     Q:  How would you define the set of labels in a statement?

         labels: Stmt -> Powerset(Lab)
         labels(S) = {l | [B]^l \in blocks(S)}
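
The structural definitions of init, final, blocks and labels can be transcribed almost directly into code. The following is a minimal sketch, assuming a hypothetical tuple encoding of labelled While statements (the encoding and the sample program are illustrative and not part of the notes).

```python
# Hypothetical tuple encoding of labelled While statements:
#   ("assign", x, a, l)    [x := a]^l
#   ("skip", l)            [skip]^l
#   ("seq", S1, S2)        S1; S2
#   ("if", b, l, S1, S2)   if [b]^l then S1 else S2
#   ("while", b, l, S)     while [b]^l do S

def init(s):
    """Initial label of a statement."""
    tag = s[0]
    if tag == "assign":
        return s[3]
    if tag == "skip":
        return s[1]
    if tag == "seq":
        return init(s[1])
    return s[2]  # "if" and "while": the label of the test

def final(s):
    """Set of final labels of a statement."""
    tag = s[0]
    if tag == "assign":
        return {s[3]}
    if tag == "skip":
        return {s[1]}
    if tag == "seq":
        return final(s[2])
    if tag == "if":
        return final(s[3]) | final(s[4])
    return {s[2]}  # "while": only the test can end the loop

def blocks(s):
    """Elementary blocks; a test [b]^l is encoded as ("test", b, l)."""
    tag = s[0]
    if tag in ("assign", "skip"):
        return {s}
    if tag == "seq":
        return blocks(s[1]) | blocks(s[2])
    if tag == "if":
        return {("test", s[1], s[2])} | blocks(s[3]) | blocks(s[4])
    return {("test", s[1], s[2])} | blocks(s[3])  # "while"

def labels(s):
    # Every block encoding keeps its label in the last position.
    return {b[-1] for b in blocks(s)}

# Illustrative program: [x := 1]^1; while [x < 5]^2 do [x := x+1]^3
prog = ("seq", ("assign", "x", "1", 1),
               ("while", "x < 5", 2, ("assign", "x", "x+1", 3)))

print(init(prog), final(prog), sorted(labels(prog)))
```

For this program the initial label is 1, the only final label is 2 (the loop test), and the labels are 1, 2 and 3.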

*** flows and reverse flows

------------------------------------------
      FLOWS

flow: Stmt -> Powerset(Lab x Lab)

  flow([x := a]^l) = {}
  flow([skip]^l) = {}
  flow(S1; S2) = 

  flow(if [b]^l then S1 else S2) =

  flow(while [b]^l do S) =
------------------------------------------
        ... flow(S1) \cup flow(S2)
            \cup {(l, init(S2))| l \in final(S1)}

            flow(S1) \cup flow(S2)
            \cup {(l, init(S1))} \cup {(l, init(S2))}

            flow(S) \cup {(l, init(S))}
            \cup {(l', l)| l' \in final(S)}

   Q:  How would we use these functions to represent the nodes and
   edges of a dataflow graph?

   Q: What are the labels and edges of 
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?             

   Q:  How would you formulate a set of reverse flows?

       flow^R: Stmt -> Powerset(Lab x Lab)
       flow^R(S) = {(l', l) | (l, l') \in flow(S)}

   Q: What is flow^R of
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?             

   Q: What are the initial nodes of flow^R?
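
The flow equations and the reverse flow can be checked mechanically on the if-statement used in the questions above. This is a sketch under the assumption of a hypothetical tuple encoding of statements; the function names analyze and reverse are illustrative. It computes the initial label, the final labels and the flow in a single traversal.

```python
# Hypothetical encoding: ("assign", x, a, l), ("skip", l), ("seq", S1, S2),
# ("if", b, l, S1, S2), ("while", b, l, S).

def analyze(s):
    """Return (init, final, flow) of a labelled statement in one traversal."""
    tag = s[0]
    if tag == "assign":
        return s[3], {s[3]}, set()
    if tag == "skip":
        return s[1], {s[1]}, set()
    if tag == "seq":
        i1, f1, fl1 = analyze(s[1])
        i2, f2, fl2 = analyze(s[2])
        # every final label of S1 flows to the initial label of S2
        return i1, f2, fl1 | fl2 | {(l, i2) for l in f1}
    if tag == "if":
        l = s[2]
        i1, f1, fl1 = analyze(s[3])
        i2, f2, fl2 = analyze(s[4])
        return l, f1 | f2, fl1 | fl2 | {(l, i1), (l, i2)}
    if tag == "while":
        l = s[2]
        i, f, fl = analyze(s[3])
        # the test enters the body; the body's final labels return to the test
        return l, {l}, fl | {(l, i)} | {(lp, l) for lp in f}
    raise ValueError(f"unknown statement tag: {tag!r}")

def reverse(flow):
    """flow^R: every edge turned around."""
    return {(l2, l1) for (l1, l2) in flow}

# if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
cond = ("if", "x > 3", 1, ("assign", "y", "2", 2), ("assign", "z", "3", 3))
entry, exits, flow = analyze(cond)
flow_r = reverse(flow)
print(sorted(flow), sorted(flow_r), sorted(exits))
```

Here the edges are (1,2) and (1,3), the reverse flow is (2,1) and (3,1), and the nodes where a backward analysis starts are the final labels 2 and 3.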

*** program of interest

------------------------------------------
           PROGRAM OF INTEREST

 S*      = the top level statement

 Lab*    = labels(S*)

 Var*    = FV(S*)

 Blocks* = blocks(S*)

 AExp*   = nontrivial arithmetic expressions
           in S*
------------------------------------------

      Q:  What's a trivial expression?
          a single variable or a constant, in other words, a base case

------------------------------------------
       SUBEXPRESSIONS

 AExp(a) = non-trivial arithmetic
           subexpressions of a

 AExp(b) = non-trivial arithmetic
           subexpressions of b
------------------------------------------

------------------------------------------
      ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff


def: S* has isolated exits iff


------------------------------------------
        ... (\forall l \in Lab :: (l, init(S*)) \not\in flow(S*))

        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab ::
                   (l1, l2) \not\in flow(S*)))

   Q: What kind of programs would not have isolated entries?
      Those that start with a while loop, since the end of the loop
      body flows back to the initial label.

   Q:  What kind of programs would not have isolated exits?
       Those that can end with a while loop: the loop test is a final
       label, but it also flows into the loop body.

------------------------------------------
           LABEL CONSISTENT

def: S is label consistent if and only if
     no two blocks in S
     have the same label.

------------------------------------------

   Q:  How would you formalize that?

       [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2

   Q:  Is there any reason not to have label consistent programs?
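
The three properties above are simple checks on the flow relation and the block set. A minimal sketch, assuming flows are given as sets of label pairs and blocks as hypothetical (body, label) pairs:

```python
def has_isolated_entries(init, flow):
    # no edge may lead into the initial label
    return all(target != init for (_, target) in flow)

def has_isolated_exits(final, flow):
    # no edge may leave a final label
    return all(source not in final for (source, _) in flow)

def is_label_consistent(blocks):
    # [B1]^l, [B2]^l in blocks  ==>  B1 == B2
    seen = {}
    for body, label in blocks:
        if seen.setdefault(label, body) != body:
            return False
    return True

# while [x > 0]^1 do [x := x-1]^2
loop_flow = {(1, 2), (2, 1)}
print(has_isolated_entries(1, loop_flow), has_isolated_exits({1}, loop_flow))

# if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
if_flow = {(1, 2), (1, 3)}
print(has_isolated_entries(1, if_flow), has_isolated_exits({2, 3}, if_flow))
```

A program that is just a while loop fails both isolation checks, while the if-statement passes both. Note that, under the formal definition, two occurrences of the same block with the same label still count as label consistent.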

** available expressions analysis (2.1.1)

*** idea, goal
------------------------------------------
      AVAILABLE EXPRESSIONS ANALYSIS

 "For each program point,
  which expressions must have already been
  computed, and not later modified,
  on all paths to the program point."

Example:

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5;)
------------------------------------------

    Q: What expressions are available at entry to block 2?

*** formalization

    The basic idea is to define functions on each elementary block,
    using two sub-functions: gen and kill...

------------------------------------------
        FORMAL DEFINITION

AEentry(l) = 
   if l = init(S*) then {}
   else \bigcap {AEexit(l')
                   | (l',l) \in flow(S*)}

AEexit(l) = 
 (AEentry(l) \ killAE(B^l)) \cup genAE(B^l)
    where B^l \in blocks(S*)

killAE: Blocks* -> Powerset(AExp*)

killAE([x:= a]^l) =
      {a' \in AExp* | x \in FV(a')}
killAE([skip]^l) = {}
killAE([b]^l) = {}

genAE: Blocks* -> Powerset(AExp*)

genAE([x:= a]^l) =
      {a' \in AExp(a) | x \not\in FV(a')}
genAE([skip]^l) = {}
genAE([b]^l) = AExp(b)
------------------------------------------

    Q: What's the role of the dataflow graph here?
       it's an implicit parameter,
       so these equations generate particular sets for every dataflow
       graph (program).

    Q: What does the kill function mean?
       It says which expressions are no longer available, because
       the block assigns to a variable that occurs in them
       (what to take out of the analysis).

    Q: What does the gen function mean?
       It says what expressions become available.

    Q: Why don't we have to define the analysis for while loops and if
       statements? 
       it's implicit in the DFG

    Q: What are we assuming with this formalism?
       That the program is label consistent.  Why?
            because of the use of blocks(S*)
       That the program has isolated entries.  Why?
            because of the use of l == init(S*)
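
The kill and gen definitions can be evaluated directly for the five blocks of the example program. This is a sketch only: the tables FV and BLOCKS are written out by hand here (no parser is assumed), and the names kill_ae and gen_ae are illustrative.

```python
# Non-trivial arithmetic expressions of the example (AExp*), each with its
# free variables. i*j-1 is read as (i*j)-1, so i*j is a subexpression of it.
FV = {"i*j": {"i", "j"}, "i*j-1": {"i", "j"}, "a+k": {"a", "k"}, "j+1": {"j"}}
AEXP_STAR = set(FV)

# label -> (assigned variable, or None for a test; AExp of the block's expression)
BLOCKS = {
    1: ("k", {"i*j", "i*j-1"}),   # [k := i*j-1]^1
    2: (None, {"i*j", "i*j-1"}),  # [i*j-1 < n]^2
    3: ("t", {"a+k"}),            # [t := a+k]^3
    4: ("j", {"j+1"}),            # [j := j+1]^4
    5: ("k", {"i*j", "i*j-1"}),   # [k := i*j-1]^5
}

def kill_ae(label):
    x, _ = BLOCKS[label]
    if x is None:                 # tests kill nothing
        return set()
    return {e for e in AEXP_STAR if x in FV[e]}

def gen_ae(label):
    x, sub = BLOCKS[label]
    if x is None:                 # tests generate all their subexpressions
        return set(sub)
    return {e for e in sub if x not in FV[e]}

for l in sorted(BLOCKS):
    print(l, sorted(kill_ae(l)), sorted(gen_ae(l)))
```

Block 4 is the instructive case: it kills every expression that mentions j, and it generates nothing, because j+1 itself mentions the assigned variable.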

*** observations
    This is a forward analysis. Why?
       because we don't use flow^R
       
    What makes a "solution" unsafe?
       too much in the set
    What makes it imprecise?
       too little in the set

    We want the largest (safe) sets. Why?
       because we want more information to use in optimizations

    Note the use of the word "must", what impact does that have on the
    analysis?
       intersections of entry information, makes us want the largest set

*** example revisited
------------------------------------------
        EXAMPLE

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5;)

What's the data flow graph?


------------------------------------------

------------------------------------------
           KILL AND GEN

What are killAE and genAE for this?

l  killAE(l)      genAE(l)
============================
1
2
3
4
5
------------------------------------------


------------------------------------------
     EXAMPLE EQUATIONS

AEentry(1) =
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =
AEexit(2) =
AEexit(3) =
AEexit(4) =
AEexit(5) =

------------------------------------------

    Q: So what sets do we start with to find a solution?
       AExp*

    Q:  So what would the solution be?
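
One way to find the largest solution is to start every set at AExp* and re-evaluate the equations until nothing changes. The sketch below does this with a plain round-robin loop (not a tuned worklist algorithm); the kill and gen tables are assumed to have been worked out by hand from the definitions, and the name solve_ae is illustrative.

```python
def solve_ae(labels, flow, init, kill, gen, universe):
    """Largest solution of the AE equations by iteration from AExp*."""
    entry = {l: set(universe) for l in labels}
    exit_ = {l: set(universe) for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            if l == init:
                new_entry = set()
            else:
                preds = [exit_[p] for (p, q) in flow if q == l]
                # "must" analysis: intersect over all predecessors
                new_entry = set.intersection(*preds) if preds else set(universe)
            new_exit = (new_entry - kill[l]) | gen[l]
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_

IJ, IJ1, AK, J1 = "i*j", "i*j-1", "a+k", "j+1"
UNIVERSE = {IJ, IJ1, AK, J1}                      # AExp* of the example
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)}
KILL = {1: {AK}, 2: set(), 3: set(), 4: {IJ, IJ1, J1}, 5: {AK}}
GEN = {1: {IJ, IJ1}, 2: {IJ, IJ1}, 3: {AK}, 4: set(), 5: {IJ, IJ1}}

ae_entry, ae_exit = solve_ae([1, 2, 3, 4, 5], FLOW, 1, KILL, GEN, UNIVERSE)
for l in sorted(ae_entry):
    print(l, sorted(ae_entry[l]), sorted(ae_exit[l]))
```

In the result, i*j and i*j-1 are available at entry to block 2 on both incoming edges, which is what would justify reusing k in the loop test.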

** Reaching Definitions Analysis (2.1.2)

   See the book

   Q: Is this a forward or backward analysis?
      forward, uses flow not flow^R
   Q: What makes the analysis imprecise?
      larger sets
   Q: So what solution do we want?
      the smallest set

   Q: Note the use of the word "may" in the analysis statement,
      what impact does that have on the analysis?
       unions of entry information, makes us want the smallest set


** Very Busy Expressions Analysis (2.1.3)

*** idea and goals
------------------------------------------
      VERY BUSY EXPRESSIONS

def: An expression e is *very busy*
     at exit from block l if,
     on every path from l, e must be
     used before any x \in FV(e)
     is assigned.

At what points is a+b very busy in:

  [x := a+b]^1;
  [y := a+b]^2

  if [a-b > a+b]^3
  then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7;
  [z := a+b]^8;

  if [a>b]^9
  then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12
  then [x := a+b]^13 else [y := 641]^14

------------------------------------------

        ... exits of 1, 3, 7, 9
            (not 12 or any of the others)

       "an expression is very busy at exit from a block if it is very
       busy at the entry to every block that follows."

       "However, no expressions are very busy at the exit from any
       final block"


------------------------------------------
      VERY BUSY EXPRESSIONS ANALYSIS

 "For each program point,
  which expressions must be very busy
  at the exit from the point."

------------------------------------------

  Q:  What could we use this for?
      Hoisting: evaluate the expression once, before the branch,
      and store its value for later use.

*** formal definition
------------------------------------------
         FORMAL DEFINITION

VBexit(l) = 
  if l \in final(S*) then {}
  else \bigcap { VBentry(l') |
                 (l', l) \in flow^R(S*) }
VBentry(l) =
 (VBexit(l) \ killVB(B^l)) \cup genVB(B^l)
   where B^l \in blocks(S*)

killVB: Blocks* -> Powerset(AExp*)

killVB([x:= a]^l) =
      {a' \in AExp* | x \in FV(a')}
killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(AExp*)

genVB([x:= a]^l) = AExp(a)
genVB([skip]^l) = {}
genVB([b]^l) = AExp(b)

------------------------------------------

   Q:  Is this a forward or backward analysis?
       backwards, because we use flow^R.

   Q: Does this analysis need isolated exits?
      the authors say so, why?

   Q:  Why is there an intersection for VBexit?

   Q:  Do we want the largest or the smallest solution?
       largest, because smaller is imprecise

*** example
------------------------------------------
              EXAMPLE

  if [a-b > a+b]^1
  then [x := a+b]^2
  else [y := a+b]^3;
  [z := a]^4;

l  killVB(l)   genVB(l)
=======================
1
2
3
4

VBentry(1) =
VBexit(1) =
VBentry(2) =
VBexit(2) =
VBentry(3) =
VBexit(3) =
VBentry(4) =
VBexit(4) =
------------------------------------------
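
The VB equations for the example above can be solved the same way as a forward "must" analysis, except that information travels against the flow. A sketch, assuming hand-derived kill and gen tables and starting every set at AExp* to reach the largest solution; solve_vb is an illustrative name.

```python
def solve_vb(labels, flow, final, kill, gen, universe):
    """Largest solution of the VB equations by iteration from AExp*."""
    entry = {l: set(universe) for l in labels}
    exit_ = {l: set(universe) for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            if l in final:
                new_exit = set()
            else:
                # (l', l) in flow^R means l' is a successor of l in flow
                succs = [entry[q] for (p, q) in flow if p == l]
                new_exit = set.intersection(*succs) if succs else set(universe)
            new_entry = (new_exit - kill[l]) | gen[l]
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_

UNIVERSE = {"a-b", "a+b"}                 # AExp* of the example
FLOW = {(1, 2), (1, 3), (2, 4), (3, 4)}
FINAL = {4}
KILL = {1: set(), 2: set(), 3: set(), 4: set()}   # x, y, z occur in no expression
GEN = {1: {"a-b", "a+b"}, 2: {"a+b"}, 3: {"a+b"}, 4: set()}

vb_entry, vb_exit = solve_vb([1, 2, 3, 4], FLOW, FINAL, KILL, GEN, UNIVERSE)
for l in sorted(vb_entry):
    print(l, sorted(vb_entry[l]), sorted(vb_exit[l]))
```

The expression a+b comes out very busy at the exit of block 1, since both branches evaluate it, so it is a candidate for hoisting in front of the conditional.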

** Live Variables Analysis (2.1.4)

*** idea and goals
------------------------------------------
           LIVE VARIABLES

def: A variable x is *live*
     at exit from label l
     if there is a path from l to a use
     of x that does not redefine x.

Which variables are live at exit from 1?

   [x := 3]^1;
   if [z > 0]^2
   then [y := x+2]^3
   else [q := q+1]^4

   [x := 3]^1;
   [y := x+2]^2;
   [y := y+1]^3

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3;
   while [z > 0]^4
   do ([y := x+2]^5;
       [z := z-1]^6)

------------------------------------------

------------------------------------------
         LIVE VARIABLES ANALYSIS

 "For each program point,
  which variables may be live
  at the exit from the point."

Example:

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

------------------------------------------

   The authors take the view that "no variables are live at the
   end of the program"; however, they remark that "for some
   applications it might be better to assume that all variables are
   live at the end of the program."

   Q:  What can we use this for?
   dead code elimination: if the variable is not live at exit from a
   block that assigns to it, the assignment can be eliminated.

*** definitions and formalization

------------------------------------------
         FORMAL DEFINITION

LVexit(l) = 
  if l \in final(S*) then {}
  else \bigcup { LVentry(l') |
                 (l', l) \in flow^R(S*) }
LVentry(l) =
 (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

killLV: Blocks* -> Powerset(Var*)

killLV([x:= a]^l) = {x}
killLV([skip]^l) = {}
killLV([b]^l) = {}

genLV: Blocks* -> Powerset(Var*)

genLV([x:= a]^l) = FV(a)
genLV([skip]^l) = {}
genLV([b]^l) = FV(b)
------------------------------------------

   Q:  Is this a forward or backward analysis?
       backwards, because we use flow^R.

   Q: Does this analysis need isolated exits?
      the authors say so, why?

   Q:  Why is there a union for LVexit?

   Q:  Do we want the largest or the smallest solution?
       smallest, because larger is imprecise
       We want to know exactly what the dead variables are.

------------------------------------------
            EXAMPLE

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

l  killLV(l)    genLV(l)
==========================
1
2
3

LVentry(1) =
LVexit(1) =
LVentry(2) =
LVexit(2) =
LVentry(3) =
LVexit(3) =
------------------------------------------
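
Since LV is a "may" analysis, the smallest solution can be found by starting from empty sets and letting them grow. The sketch below uses a simple worklist and then applies the dead-code criterion mentioned earlier to the three-assignment example; the tables and the names solve_lv and ASSIGNED are illustrative assumptions.

```python
from collections import deque

def solve_lv(labels, flow, final, kill, gen):
    """Smallest solution of the LV equations, computed with a worklist."""
    succs = {l: [q for (p, q) in flow if p == l] for l in labels}
    preds = {l: [p for (p, q) in flow if q == l] for l in labels}
    entry = {l: set() for l in labels}
    exit_ = {l: set() for l in labels}
    work = deque(labels)
    while work:
        l = work.popleft()
        if l in final:
            new_exit = set()   # assumption: nothing is live at the end
        else:
            new_exit = set().union(*(entry[s] for s in succs[l]))
        exit_[l] = new_exit
        new_entry = (new_exit - kill[l]) | gen[l]
        if new_entry != entry[l]:
            entry[l] = new_entry
            work.extend(preds[l])  # predecessors depend on LVentry(l)
    return entry, exit_

# [x := 3]^1; [z := 4]^2; [x := z+2]^3
FLOW = {(1, 2), (2, 3)}
KILL = {1: {"x"}, 2: {"z"}, 3: {"x"}}
GEN = {1: set(), 2: set(), 3: {"z"}}
ASSIGNED = {1: "x", 2: "z", 3: "x"}

lv_entry, lv_exit = solve_lv([1, 2, 3], FLOW, {3}, KILL, GEN)

# An assignment is dead if its variable is not live at the block's exit.
dead = [l for l in sorted(ASSIGNED) if ASSIGNED[l] not in lv_exit[l]]
print(dead)
```

Block 1 is reported dead because x is overwritten at 3 before any use. Block 3 is reported dead only because of the assumption that no variables are live at the end of the program.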

** Derived Data Flow Information (2.1.5)

------------------------------------------
  LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links each use of a var to the
   assignments that may reach it

Definition-use (du) chain:
   links each assignment of a var to
   the uses it may reach
------------------------------------------

    Q:  What might this be useful for?
    dead code elimination
    code motion (reordering)

*** formal definitions
------------------------------------------
           DEFINITIONS AND USES

definition clear path for x

  clear(x, l, l') =
    (\exists l1, ..., ln ::
         l = l1 & ln = l' & n > 0
       & (\forall i : 1 <= i < n :
          (li, li+1) \in flow(S*))
       & (\forall i : 1 <= i < n :
          not(def(x, li)))
       & use(x, ln))

  def(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in killLV([B]^l))

  use(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in genLV([B]^l))

------------------------------------------

     Q:  Why are the def and use functions correct?
     Q:  How do you interpret the notion of a clear path?

------------------------------------------
           UD and DU

ud: Var* x Lab* -> Powerset(Lab* \cup {?})
ud(x, l') =
   {l | def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
  \cup
   {? | clear(x, init(S*), l')}

du: Var* x (Lab* \cup {?}) -> Powerset(Lab*)
du(x, l) =
 if l != ?
 then
   {l'| def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
 else
   {l'| clear(x, init(S*), l')}

------------------------------------------

     Q: Do these require isolated entries?
        yes

     Q: Are these must or may analyses?
        may

     Q: Can we define du in terms of ud?

*** example
------------------------------------------
        EXAMPLE

   [z := 3]^1;
   if [y > 0]^2
   then [y := z+2]^3
   else [y := y+1]^4

    ud(x, l)

 l \ x |   y    z
===================== 
   1
   2
   3
   4

    du(x, l)

 l \ x |   y    z
===================== 
   1
   2
   3
   4
------------------------------------------
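
The definitions of clear, ud and du can be evaluated by a graph search over the flow of the example above. This is a sketch with hand-written def/use tables (killLV and genLV per block). As one possible answer to the earlier question, du is computed here from ud, as du(x, l) = {l' | l \in ud(x, l')}.

```python
FLOW = {(1, 2), (2, 3), (2, 4)}
INIT = 1
LABELS = [1, 2, 3, 4]
DEFS = {1: {"z"}, 2: set(), 3: {"y"}, 4: {"y"}}   # killLV of each block
USES = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}   # genLV of each block
UNINIT = "?"                                       # "no definition yet"

def clear(x, start, target):
    """Is there a path start..target whose nodes before target do not
    define x, and does target use x?"""
    reached, frontier = {start}, [start]
    while frontier:
        m = frontier.pop()
        if x in DEFS[m]:
            continue  # a path may end at a definition but not pass through it
        for (p, q) in FLOW:
            if p == m and q not in reached:
                reached.add(q)
                frontier.append(q)
    return target in reached and x in USES[target]

def ud(x, use_label):
    result = {l for l in LABELS
              if x in DEFS[l]
              and any(clear(x, l2, use_label) for (p, l2) in FLOW if p == l)}
    if clear(x, INIT, use_label):
        result.add(UNINIT)
    return result

def du(x, def_label):
    # du derived from ud: the uses whose ud set contains def_label
    return {l for l in LABELS if def_label in ud(x, l)}

for l in LABELS:
    print(l, ud("y", l), ud("z", l))
```

For instance, the use of z at label 3 is linked to the assignment at label 1, while both uses of y (labels 2 and 4) are linked to ?, since y is read before the program assigns it.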

*** computation

    Q: How could we use RD and LV to compute ud chains?
    see p. 54