I. overview of static analysis A. background ------------------------------------------ STATIC VS. DYNAMIC def: *analysis* is a process of discovering def: *static analysis* is an analysis that is done def: *dynamic analysis* is an analysis that is done ------------------------------------------ If you do a code review, is that static or dynamic analysis? If you do testing, is that static or dynamic analysis? B. motivation ------------------------------------------ WHY DO PROGRAM ANALYSIS? Because code reviews/audits are: ------------------------------------------ What are some advantages of code reviews? 1. static analysis ------------------------------------------ STATIC ANALYSIS Advantages: Disadvantages: ------------------------------------------ ------------------------------------------ PRECISION AND RECALL def: the *precision* of an analysis is the fraction of def: the *recall* of an analysis is the fraction of Example: Suppose a program has 10 vulnerabilities and a tool identifies 8 places but only 6 of those are actual ones The precision is recall is ------------------------------------------ What is the goal for precision? For recall? Can we do that? Which is worse for analysis of security vulnerabilities: poor precision or poor recall? 2. dynamic analysis ------------------------------------------ DYNAMIC ANALYSIS Advantages: Disadvantages: ------------------------------------------ Do we have to decide between static and dynamic analysis? C. capabilities 1. taint checking ------------------------------------------ TAINT CHECKING Attacks on integrity often involve Example Attacks? ------------------------------------------ ------------------------------------------ TAINT CHECKING Desired guarantee: What is trust? How could a program know about trust? ------------------------------------------ a. format string analysis What would be a very simple possible static analysis for format string vulnerabilities? What would be the problem with that? ------------------------------------------ STATIC ANALYSIS FOR FORMAT STRING ATTACKS Analysis information: ------------------------------------------ b. SQL injection attacks ------------------------------------------ STATIC ANALYSIS FOR SQL INJECTION ATTACKS Analysis: ------------------------------------------ What's an error that the SQL injection analysis can find? What could be done to improve precision? c. XSS attacks ------------------------------------------ STATIC ANALYSIS FOR XSS Analysis for type 1 (reflected XSS): ------------------------------------------ What's an error that the XSS analysis can find? D. limitations 1. perfectly precise static analysis is impossible a. background ------------------------------------------ BACKGROUND: PROBLEMS A *problem* is a specification of a procedure's desired relation Examples: ------------------------------------------ ------------------------------------------ BACKGROUND: EFFECTIVE PROCEDURE An effective procedure is "a set of rules which tell us, from moment to moment, precisely how to behave." -- M. Minsky, Computation: Finite and Infinite Machines (p. 106) Practical definition: ------------------------------------------ ------------------------------------------ EXAMPLE PROGRAM In Java: public class hailstone { public static long steps(long x) { if (x <= 1) { return 0; } else { return 1+steps(h(x)); } } public static long h(long x) { if (odd(x)) { return (3*x+1) / 2; } else { return x/2; } } public static boolean odd(long x) { return x%2 == 1; } } ------------------------------------------ What is steps(1)? What is steps(2)? What is steps(3)? What is steps(7)? What is steps(27)? Is it easy to predict the value of steps(x) for a given x? ------------------------------------------ CAN PROGRAMS DO EVERYTHING? program = effective procedure must always terminate If some problems can't be solved by programs, then set of problems ------------------------------------------ Was the code for steps() a program? Can you think of something that is impossible for a program to do? ------------------------------------------ WHAT DOES "IMPOSSIBLE" MEAN? Mathematical meaning: inconsistent with accepted axioms E.g., - find fraction equal to square root of 2 - solving the "halting problem" What we DON'T mean: - very hard - socially unacceptable - illegal - against physical laws ------------------------------------------ b. the halting problem ------------------------------------------ THE HALTING PROBLEM Write a program, halts(P,I), that takes code P, and input string I, and: halts(P, I) = true, if P halts on input I, halts(P, I) = false, if P does not halt on input I, halts always returns true or false in a finite amount of time. ------------------------------------------ Are we asking for precision or recall? What do we mean by "halts"? Would this be useful? How could you solve it? What would be an idea for an algorithm? c. impossibility result ------------------------------------------ SOLVING THE HALTING PROBLEM IS IMPOSSIBLE! Proof by contradiction: - Suppose program F solves the halting problem. - Let T be the program: T(P) = if F(P,P) then return loop() else return false end - Consider T(T) By definition, T must either loop forever or return false. If it loops forever, If it returns false, ------------------------------------------ d. other impossible problems ------------------------------------------ OTHER IMPOSSIBLE PROBLEMS Write AlwaysHalts(P) to decide if P halts on all inputs Write HaltsOnEmpty(P) to decide if P halts on empty input ------------------------------------------ ------------------------------------------ RICE'S THEOREM There is no program that can decide any non-trivial property of programs. ------------------------------------------ What does this mean for static analysis? e. What can we do? So can programs help programmers? i. an analogy ------------------------------------------ ANALOGY: DO I NEED AN UMBRELLA? Website tells if you need an umbrella What should it do? Need umbrella? | rains | no rain ===============|==========|========== "YES" | | | | "NO" | | ------------------------------------------ What are the possible outcomes? What should the web site say if there is only a 50% chance of rain in the forecast? ii. virus checking ------------------------------------------ VIRUS CHECKING Program tells if another program has a virus What should it do? Has virus? | malicious | safe ===============|===========|========== "YES" | | | | "NO" | | ------------------------------------------ What are the possible outcomes? So what should the tool do if it's not sure? f. summary ------------------------------------------ SUMMARY OF STATIC ANALYSIS LIMITS Can't have 100% precise solutions to interesting analysis questions But we can do something less precise extreme: "your program is wrong!" goal: get higher precision with 100% recall ------------------------------------------ 2. prefect recall won't happen with dynamic analysis ------------------------------------------ LIMITS OF TESTING To have airtight assurance about safety need to Problem: input space is practically Consider testing this: #define BYTE_BITS 8 int sign(long v) { /* return -1 if v <0, otherwise return +1 */ return +1 | (v >> (sizeof(long) * BYTE_BITS - 1)); } "Testing can only reveal the presence of errors, not their absence" -- E. Dijkstra ------------------------------------------ Why does testing work in other areas of engineering?