CIS 4615 meeting -*- Outline -*-

* overview of static analysis

** background
------------------------------------------
          STATIC VS. DYNAMIC

def: *analysis* is a process of discovering


def: *static analysis* is an analysis
     that is done


def: *dynamic analysis* is an analysis
     that is done


------------------------------------------
        ... properties of a program's behavior
            ideally properties that are true of all possible executions

            e.g., does the program have a buffer overflow vulnerability?
                  does the program have a format string vulnerability?
                  could the program use memory after it has freed it?

            Code audits or code reviews are doing analysis

        ... before (or without) running the program
            from the text of the program itself

            e.g., virus scanning
                  type checking
                  gcc's format string warnings (show that!)

        ... while the program is running

            e.g., seeing what files a program reads
                  seeing what network traffic a program generates
                  testing of any sort, including fuzz testing

        Q: If you do a code review, is that static or dynamic analysis?
        static
        Q: If you do testing, is that static or dynamic analysis?
        dynamic

** motivation
   We have done code reviews, in homework 3...
------------------------------------------
          WHY DO PROGRAM ANALYSIS?

Because code reviews/audits are:


------------------------------------------
   ... - expensive
         Why? because they need expert programmers
       - time consuming
         Why? because reading code is hard (impossible, actually)
       - prone to missing vulnerabilities
         Why? because people get tired

      Q: What are some advantages of code reviews?
         They can be done without waiting for new theory and software tools
         They can leverage human creativity and insight

      But it may be cheaper and more rigorous to have a program do the analysis
*** static analysis

------------------------------------------
            STATIC ANALYSIS

Advantages:


Disadvantages:


------------------------------------------

   ... - can give guarantees about all program behaviors
       - for malware, can give warning *before* running program
       - techniques useful for compilers (optimization, critical systems)

   ... - will have false positives (says there is a problem when not)
         (think of virus scanning or type checking)
       - can be computationally expensive (e.g., O(n^3), where n is the size of the program)
         (there is a tradeoff between false positives and efficiency)

------------------------------------------
           PRECISION AND RECALL

def: the *precision* of an analysis is
     the fraction of


def: the *recall* of an analysis is
     the fraction of 


Example:
  Suppose a program has 10 vulnerabilities
  and a tool identifies 8 places
  but only 6 of those are actual ones

  The precision is

      recall is

------------------------------------------
        ... identified problems that are actually problems
        ... the actual problems that are found

        ... 6/8 = 0.75
               (true positives / number of problems reported)
        ... 6/10 = 0.6
              (true positives / number of actual problems)
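
        The arithmetic in the example can be checked with a small C
        sketch. The function names precision() and recall() are made
        up for illustration; they are not from any particular tool.

```c
/* Fraction of reported problems that are actual problems. */
double precision(int true_positives, int reported)
{
    return (double)true_positives / (double)reported;
}

/* Fraction of actual problems that were reported. */
double recall(int true_positives, int actual)
{
    return (double)true_positives / (double)actual;
}
```

        With the numbers from the slide, precision(6, 8) is 0.75
        and recall(6, 10) is 0.6.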

        Q: What is the goal for precision?  For recall?
        ideally would like both precision and recall to be 1

        Q: Can we do that?
        No, that is impossible in general for program analysis

        Q: Which is worse for analysis of security vulnerabilities:
           poor precision or poor recall?
        Poor recall is certainly bad, means that you miss vulnerabilities
        But poor precision is also bad, means that people won't pay attention

*** dynamic analysis
------------------------------------------
          DYNAMIC ANALYSIS

Advantages:


Disadvantages:


------------------------------------------
    ... - will work on obfuscated software
        - never has false positives (precision is 1)

    ... - puts the machine doing the analysis at risk
        - may have many false negatives (low recall, may miss a lot)
          (consider testing smartphone apps, if the app waits for 3 days, 
          then unlikely that testing will catch it)
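
    The smartphone example can be sketched in C as follows. Both
    functions are hypothetical: misbehaves() stands in for an app with
    a time trigger, and observed_misbehavior() stands in for a test
    run that watches the app for a limited time.

```c
#define THREE_DAYS_IN_SECONDS (3L * 24L * 60L * 60L)

/* Hypothetical app: misbehaves only once 3 days have passed. */
int misbehaves(long seconds_since_install)
{
    return seconds_since_install >= THREE_DAYS_IN_SECONDS;
}

/* Dynamic analysis: observe the app once per (simulated) second.
   Returns 1 if misbehavior was observed, 0 otherwise. */
int observed_misbehavior(long seconds_observed)
{
    for (long t = 0; t < seconds_observed; t++) {
        if (misbehaves(t)) {
            return 1;
        }
    }
    return 0;
}
```

    An hour of observation, observed_misbehavior(3600), returns 0,
    which is a false negative: the problem exists but was not seen.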

    Q: Do we have to decide between static and dynamic analysis?
    No, we can use both

** capabilities
*** taint checking
------------------------------------------
           TAINT CHECKING

Attacks on integrity often involve 


Example Attacks?


------------------------------------------
     ... trusting user input

     ... SQL injection
         cross-site scripting
         format string
         command injection

------------------------------------------
            TAINT CHECKING

Desired guarantee:


What is trust?


How could a program know about trust?


------------------------------------------
    ... program does not trust user input

    ... for SQL injection: passing user input to SQL interpreter
        for XSS: sending user input to web page
        for format string: passing user input as a format string argument
        for command injection: passing user input to command interpreter

**** format string analysis
     Q: What would be a very simple possible static analysis 
        for format string vulnerabilities? 
     Make sure all format strings are given literally in the code
        like printf("%s\n", input);

     Q: What would be the problem with that?
     Low precision (i.e., too many false positives)

     How could we do better?
       Need to know when the format argument could have user inputs
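
     A small C sketch of why the "literal formats only" rule is
     imprecise. The function names are made up for illustration, and
     snprintf is used instead of printf only so the results can be
     inspected.

```c
#include <stdio.h>
#include <stddef.h>

/* Safe: the format is the literal "%s", so any conversions inside
   `input` are copied as ordinary characters.
   The vulnerable form would be snprintf(out, n, input). */
int copy_verbatim(char *out, size_t n, const char *input)
{
    return snprintf(out, n, "%s", input);
}

/* Also safe, since fmt never contains user input. But the format is
   not a literal at the call site, so the simple rule would report
   this call: a false positive. */
int describe(char *out, size_t n, int verbose, const char *name, int count)
{
    const char *fmt = verbose ? "%s (%d)" : "%s";
    return snprintf(out, n, fmt, name, count);
}
```

     In describe() the format is chosen from two literals, so no user
     input can reach it. Telling this case apart from a user-controlled
     format requires tracking where the format argument comes from.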

------------------------------------------
 STATIC ANALYSIS FOR FORMAT STRING ATTACKS

Analysis information:


------------------------------------------
     ... The analysis has:
          1. a set of var-arg functions (printf, fprintf, ...)
          2. a set of user input functions
          3. track where the user-influenced information is
             fgets(linebuf, sizeof(linebuf), stdin);
             strncpy(buf, linebuf, sizeof(buf));
             sscanf(buf, "%d", &val);

          Give errors when a var-arg function is called with user
          input

          Example tool: gcc with -Wformat=2 (show that)

          The key to precision is to not give errors when such a
          function is called with input that doesn't come from the
          user.
          This is called taint analysis...
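
          A minimal sketch of the tracking step, assuming a toy
          representation of straight-line code (no branches, loops,
          or function calls). The statement kinds and the function
          count_tainted_sinks() are invented for this sketch and do
          not describe how gcc or any other tool works.

```c
#include <stddef.h>

#define NVARS 8   /* variables are numbered 0 .. NVARS-1 (assumed in range) */

enum kind {
    ST_INPUT,     /* dst = user input, e.g. fgets(dst, ...)      */
    ST_LITERAL,   /* dst = a constant from the program text      */
    ST_COPY,      /* dst = src, e.g. strncpy(dst, src, ...)      */
    ST_SINK       /* src is used as a format string, printf(src) */
};

struct stmt {
    enum kind kind;
    int dst;
    int src;
};

/* Walks the statements in order, tracking which variables may hold
   user-influenced data, and counts sinks that receive such data. */
int count_tainted_sinks(const struct stmt *prog, size_t n)
{
    int tainted[NVARS] = {0};
    int errors = 0;

    for (size_t i = 0; i < n; i++) {
        const struct stmt *s = &prog[i];
        switch (s->kind) {
        case ST_INPUT:   tainted[s->dst] = 1;               break;
        case ST_LITERAL: tainted[s->dst] = 0;               break;
        case ST_COPY:    tainted[s->dst] = tainted[s->src]; break;
        case ST_SINK:    if (tainted[s->src]) errors++;     break;
        }
    }
    return errors;
}
```

          The sequence fgets(linebuf); strncpy(buf, linebuf);
          printf(buf) becomes ST_INPUT, ST_COPY, ST_SINK and is
          reported once, while printing a variable that was only
          assigned a literal is not reported.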

**** SQL injection attacks
------------------------------------------
 STATIC ANALYSIS FOR SQL INJECTION ATTACKS

Analysis:


------------------------------------------

    ...   1. have a set of SQL primitives that can't be called with
             user input,
             What if some function calls them through another function?
               then that function should be flagged as a vulnerability
          2. have a set of user input functions (e.g., fgets, scanf, ...)
          3. track where the user-influenced information is

          Q: What's an error that the SQL injection analysis can find?
             Calling an SQL primitive with input (information) from the user

          Example tools: CheckLT, Jif

          Q: What could be done to improve precision?
          track sanitization routines or allow an annotation
          Could we do that for the format string analysis also? Yes!
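
          The effect of a sanitization routine can be sketched with
          labeled values in C. This is a run-time analogue of what a
          static tool would track at analysis time. All names here
          are hypothetical, sql_sink() merely stands in for an SQL
          primitive, and the digits-only check is just one possible
          sanitizer.

```c
/* A string together with a label saying whether it may be
   influenced by the user. */
struct labeled {
    const char *text;
    int tainted;
};

struct labeled from_user(const char *text)
{
    struct labeled v = { text, 1 };
    return v;
}

struct labeled from_literal(const char *text)
{
    struct labeled v = { text, 0 };
    return v;
}

/* Sanitizer: clears the label only if the text is a nonempty
   string of decimal digits; otherwise the label is unchanged. */
struct labeled sanitize_digits(struct labeled v)
{
    if (v.text[0] == '\0') {
        return v;
    }
    for (const char *p = v.text; *p != '\0'; p++) {
        if (*p < '0' || *p > '9') {
            return v;
        }
    }
    v.tainted = 0;
    return v;
}

/* Stand-in for an SQL primitive: returns 0 if the argument is
   accepted and -1 if it is rejected as user-influenced. */
int sql_sink(struct labeled v)
{
    return v.tainted ? -1 : 0;
}
```

          Without the sanitizer, every use of user input at the sink
          is rejected, even harmless input such as "42". Knowing about
          sanitize_digits() lets the check accept that case, which is
          the gain in precision.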

**** XSS attacks
------------------------------------------
        STATIC ANALYSIS FOR XSS

Analysis for type 1 (reflected XSS):


------------------------------------------
    ... for XSS:
          1. have a set of primitives that output to the web page
          2. have a set of user input functions
          3. track where the user-influenced information is

          Q: What's an error that the XSS analysis can find?
          User input is output to a web page.

** limitations
    There are some fundamental (theoretical) limitations to be aware of
*** perfectly precise static analysis is impossible
**** background
------------------------------------------
         BACKGROUND: PROBLEMS

A *problem* is a specification of 
a procedure's desired relation


Examples:


------------------------------------------
    between inputs and outputs (possibly with some resource
    constraints)

    given two integers, return their sum

    given a set of temperature, pressure, and humidity readings today,
       predict the weather for tomorrow

------------------------------------------
    BACKGROUND: EFFECTIVE PROCEDURE

An effective procedure is
  "a set of rules which tell us, 
  from moment to moment, 
  precisely how to behave."
  -- M. Minsky, Computation: 
     Finite and Infinite Machines (p. 106)

Practical definition:


------------------------------------------
        ... a program (if we accept the Church-Turing thesis)

------------------------------------------
            EXAMPLE PROGRAM

In Java:

public class hailstone {
    public static long steps(long x) {
        if (x <= 1) {
            return 0;
        } else {
            return 1+steps(h(x));
        }
    }

    public static long h(long x) {
        if (odd(x)) {
            return (3*x+1) / 2;
        } else {
            return x/2;
        }
    }

    public static boolean odd(long x) {
        return x%2 == 1;
    }
}
------------------------------------------
        Q: What is steps(1)?
        0
        Q: What is steps(2)?
        1
        Q: What is steps(3)?
        5
        Q: What is steps(7)?
        11
        Q: What is steps(27)?
        70

        Q: Is it easy to predict the value of steps(x) for a given x?
        No, seems very difficult, this is an infamous mathematical problem.
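
        One way to experiment with steps() is to give it a step
        budget. The following is a C sketch of the Java code above,
        not part of the original example; steps_bounded() and its
        budget parameter are introduced here for illustration.

```c
#include <limits.h>

/* One application of the function h from the Java code. */
static long h(long x)
{
    return (x % 2 == 1) ? (3 * x + 1) / 2 : x / 2;
}

/* Like steps(x), but written as a loop that gives up early.
   Returns the number of steps, or -1 if the budget ran out
   or the next step could overflow a long. */
long steps_bounded(long x, long budget)
{
    long n = 0;

    while (x > 1) {
        if (n >= budget || x > (LONG_MAX - 1) / 3) {
            return -1;
        }
        x = h(x);
        n++;
    }
    return n;
}
```

        With a large enough budget this agrees with the answers above,
        e.g., steps_bounded(27, 100) is 70. With too small a budget,
        e.g., steps_bounded(27, 10), the result is -1, which only says
        "don't know yet", not "does not terminate".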

------------------------------------------
        CAN PROGRAMS DO EVERYTHING?

program = effective procedure
          must always terminate

If some problems can't be solved 
   by programs, then
    set of problems 


------------------------------------------
        Q: Was the code for steps() a program?
        Maybe, but we don't know if it always terminates!

        ... is not equal to what programs can do!

        Q: Can you think of something that is impossible for a program
        to do?
        Make a perfectly precise analysis of some interesting
        property... such as solving the halting problem...

------------------------------------------
      WHAT DOES "IMPOSSIBLE" MEAN?

Mathematical meaning:
      inconsistent with accepted axioms

E.g., 
 - find fraction equal to square root of 2
 - solving the "halting problem"

What we DON'T mean:
 - very hard
 - socially unacceptable
 - illegal
 - against physical laws

------------------------------------------
        ... like whether steps() terminates
        ... like wearing something weird
        ... like murdering someone
        ... like traveling faster than light

**** the halting problem
------------------------------------------
       THE HALTING PROBLEM

Write a program, halts(P,I), 
   that takes code P, and input string I,
   and:
    halts(P, I) = true,
              if P halts on input I,
    halts(P, I) = false, 
             if P does not halt on input I,
    halts always returns true or false
      in a finite amount of time.


------------------------------------------
       This has to work for *all* programs and all inputs

       Q: Are we asking for precision or recall?
       Both: 100% precision and 100% recall
       (the answer must be correct for every program and input)

       Q: What do we mean by "halts"?
           finishes, stops running

       Q: Would this be useful?
       Sure!

       Q: How could you solve it? What would be an idea for an algorithm?

**** impossibility result
   The work of Alan Turing, around 1937.
   Alan Turing, On computable numbers, with an application to the
   Entscheidungsproblem, Proceedings of the London Mathematical
   Society, Series 2, Volume 42 (1937), pp 230--265,
   doi:10.1112/plms/s2-42.1.230.
   Alan Turing, On Computable Numbers, with an Application to the
   Entscheidungsproblem. A Correction, Proceedings of the London
   Mathematical Society, Series 2, Volume 43 (1938), pp 544--546,
   doi:10.1112/plms/s2-43.6.544.
   
------------------------------------------
SOLVING THE HALTING PROBLEM IS IMPOSSIBLE!

Proof by contradiction:
 - Suppose program F solves 
    the halting problem.
 - Let T be the program:
    T(P) = if F(P,P) 
           then return loop() 
           else return false
           end

 - Consider T(T)
   By definition, 
    T must either loop forever 
           or return false.

   If it loops forever, 


   If it returns false,


------------------------------------------
        ... then F said T would halt on input T,
            but it didn't (since it is looping forever)!

        ... then F said T wouldn't halt on input T, but it did halt!

        So in both cases, F doesn't fulfill its specification,
        which contradicts its existence.

      Note it's important that we expect a solution (like F) to always
      halt and be correct
      It has to be 100% precise

**** other impossible problems
------------------------------------------
         OTHER IMPOSSIBLE PROBLEMS
         
Write AlwaysHalts(P)
   to decide if P halts on all inputs


Write HaltsOnEmpty(P)
   to decide if P halts on empty input


------------------------------------------
    AlwaysHalts is
      ... Unsolvable, since a solution could be used to solve the
          halting problem: the program N(E) = return P(I)
          halts on all inputs exactly when P halts on I

    HaltsOnEmpty is
      ... Unsolvable, since we can construct from HaltsOnEmpty
          a procedure F that would solve the halting problem!
          F(P,I) = define N(E) = return P(I); % ignore E and run P on I
                   return HaltsOnEmpty(N)

------------------------------------------
       RICE'S THEOREM

There is no program that can decide any
non-trivial property of program behavior.


------------------------------------------
       This follows from the unsolvability of the halting problem.
       Here non-trivial means that there is a program (Turing machine)
       that recognizes a language that has the property,
       and also that there exists one that recognizes a language
       that does not have that property.
       
       Q: What does this mean for static analysis?
       It can't be done with 100% precision for anything we care about

**** What can we do?
       Q: So can programs help programmers?
       Yes, but we have to cheat a bit...

***** an analogy
------------------------------------------
      ANALOGY: DO I NEED AN UMBRELLA?

Website tells if you need an umbrella

What should it do?

   Need umbrella? |   rains  | no rain
   ===============|==========|==========
   "YES"          |          |
                  |          |
   "NO"           |          |


------------------------------------------
        There used to be a website, umbrellatoday.com

        Q: What are the possible outcomes?
            ...  good :-)       ok :-|
                 bad :-(        good :-)

        Q: What should the web site say if there is only a 50% chance
           of rain in the forecast?
        "YES"!  Customers shouldn't be wet = unhappy

        So safety (conservatism) says to err on the side of telling
        people to bring an umbrella
        Want to avoid them getting wet

***** virus checking
------------------------------------------
             VIRUS CHECKING

Program tells if another program has a virus

What should it do?

   Has virus?     | malicious | safe
   ===============|===========|==========
   "YES"          |           |
                  |           |
   "NO"           |           |


------------------------------------------

        Q: What are the possible outcomes?
                   good :-)      ok  ?
                   bad :-(       good :-)

        Q: So what should the tool do if it's not sure?
        better tell the user that it might have a virus,
        that is the safe thing to do

        However, will people tolerate that?
        It's unclear. 
        Traditionally there has to be a very low rate of false positives

**** summary
------------------------------------------
    SUMMARY OF STATIC ANALYSIS LIMITS

Can't have 100% precise solutions
  to interesting analysis questions

But we can do something less precise

  extreme: "your program is wrong!"

  goal: get higher precision
        with 100% recall
------------------------------------------

*** perfect recall won't happen with dynamic analysis

------------------------------------------
          LIMITS OF TESTING

To have airtight assurance about safety
  need to

Problem: input space is practically


Consider testing this:

 #define BYTE_BITS 8
 int sign(long v) {
   /* return -1 if v <0, otherwise return +1 */
   return +1 | (v >> (sizeof(long) * BYTE_BITS - 1));
 }

"Testing can only
         reveal the presence of errors,
         not their absence"
      -- E. Dijkstra
------------------------------------------
        ... test all possible inputs

        ... infinite
            (so testing all inputs takes too long)

      The code is from Bit Twiddling Hacks By Sean Eron Anderson
      http://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign

      Q: Why does testing work in other areas of engineering?
      Because laws that govern how things work are continuous,
        so some kind of continuity argument can be used to extrapolate
        from a small number of test results.