I. Taint Analysis (or Taint Checking)
 A. What is the problem?
------------------------------------------
              THE PROBLEM

Code Injection is 3rd on OWASP Top Ten
(owasp.org/www-project-top-ten/,
 formerly in first place)

From OWASP A03 Injection
(owasp.org/Top10/A03_2021-Injection/):

"App vulnerable when:
  "User-supplied data is not validated,
   filtered, or sanitized"
  "Dynamic queries or non-parameterized calls
   without context-aware escaping are used
   directly in the interpreter."
  "Hostile data is directly used or concatenated."

  "Common injections are: SQL, NoSQL,
   OS command, Object Relational Mapping (ORM),
   LDAP, and Expression Language (EL) or
   Object Graph Navigation Library (OGNL)
   injection."
------------------------------------------
  1. examples
------------------------------------------
        EXAMPLE SCENARIO IN SQL

Scenario 1 from OWASP A03
(owasp.org/Top10/A03_2021-Injection/):

String query =
    "SELECT * FROM accounts WHERE custID='"
    + request.getParameter("id") + "'";
------------------------------------------
        What happens if the input starts with ' or '1'='1 ?
------------------------------------------
        EXAMPLE SCENARIO IN HQL

Scenario 2 from OWASP A03
(owasp.org/Top10/A03_2021-Injection/):

Query HQLQuery = session.createQuery(
    "FROM accounts WHERE custID='"
    + request.getParameter("id") + "'");
------------------------------------------
        What happens if the input starts with ' or '1'='1 ?
        So, is this kind of attack limited to SQL?
  2. other kinds of injection attacks
        Are there any other attacks where we should not trust user input?
------------------------------------------
     OTHER KINDS OF INJECTION ATTACKS

What other kinds of attacks might use inputs?

------------------------------------------
 B. Preventing Injection Attacks
------------------------------------------
      PREVENTING INJECTION ATTACKS

From OWASP A03 Injection
(owasp.org/Top10/A03_2021-Injection/)

"The Preferred option is to use a safe API,
 which avoids using the interpreter"

"Use positive server-side input validation.
 This is not a complete defense ..."

"For residual dynamic queries,
 escape special characters using
 the specific escape syntax for that interpreter."
------------------------------------------
        Why should using "any interpreter" be avoided?
 C. tool support for avoiding injection attacks and other attacks
------------------------------------------
         HOW COULD TOOLS HELP

What could a tool do to prevent
code injection attacks?

------------------------------------------
  1. taint analysis
------------------------------------------
            TAINT ANALYSIS

Def: *taint* is

Static taint analysis is

Dynamic taint analysis is

History:
  - a feature of Perl since 1989
------------------------------------------
        Why would either static or dynamic analysis be favored?
   a. DIFT: architectural support
------------------------------------------
 DYNAMIC INFORMATION FLOW TRACKING (DIFT)

See: G. Edward Suh, Jae W. Lee, David Zhang,
     and Srinivas Devadas.
     "Secure program execution via dynamic
     information flow tracking."
     In ASPLOS XI, pp. 85--96, ACM 2004.
     https://doi.org/10.1145/1024393.1024404

Idea:
  - Hard to stop memory overwrites
    (any bug can cause them)
  - Instead

Approach:
  - architectural support to

  - trap to handler if detects

------------------------------------------
------------------------------------------
            ATTACK MODEL

Of the Suh et al. 2004 paper:

  - Attacker can send "a malicious input
    that exploits a vulnerability
    in the program."
  - Programs may be buggy, but not malicious
  - Bugs may cause changes to memory
------------------------------------------
        How could a bug change memory?
   b. information flow security
------------------------------------------
       INFORMATION FLOW SECURITY

See: Dorothy E. Denning and Peter J. Denning.
     "Certification of programs for secure
     information flow."
     CACM 20(7):504-513, July 1977.
     https://doi.org/10.1145/359636.359712

Confinement problem:
  - "confidential results should not
     depend on non-confidential data"

Policy specified by:
  - set of security classes (e.g., secret)
  - bindings of objects to security classes
  - flow relation specifying permitted flows

Example:

   int Y = 0;
   if (X == 0) { Y = 0; } else { Y = 1; }

Approach: static analysis
------------------------------------------
        When does information flow from an object, X, to another object, Y?
        In the example, does information flow from X to Y?
------------------------------------------
   RULES FOR INFORMATION FLOW ANALYSIS

Taint checking plus:

  - implicit flows:
      from objects used in a condition
      to objects changed by code
         run conditionally
------------------------------------------
        Does taint checking check implicit flows?
   c. Does information flow matter?
------------------------------------------
      DOES INFORMATION FLOW MATTER?

Format String attacks:

   printf(str);

Stack could be:

   str: [ *--]--> "%*x%n\0"

------------------------------------------
        What does %n do in C as a format string?
        What does %* do in C as a format string?
        Would the changes to the stack be caught by taint analysis?
        Could this make an untainted value depend on a tainted value?
------------------------------------------
     CODE FROM VSPRINTF (EXCERPT)

   if (ch == '+')
      do_plus();
   else if (ch == '%')
      do_percent();
   /* ... */
   else if (ch == '*') {
      width = read_from_stack();
      do_width_asterics();
   }
------------------------------------------
        If ch is tainted, does taint checking give an alarm here?
   d. algorithm/rules for taint analysis
------------------------------------------
          RULES FOR TAINTING

From: Asia Slowinska and Herbert Bos.
   "Pointer tainting still pointless:
   (but we all see the point of tainting)".
   SIGOPS Oper. Syst. Rev.
   44(3):88-92, July 2010.
   https://doi.org/10.1145/1842733.1842748

Basic tainting:
  - taint data (bytes) originally

  - taint data that is

Alternative/complementary approach:
  - mark sensitive data
  - warn if marked data is sent to
    an untrusted sink

Can detect attacks on flow of control
------------------------------------------
   e. Implementation in valgrind
------------------------------------------
      TAINT CHECKING IN VALGRIND

Works on x86 binaries (no source)

Skins used:
  - TaintSeed: identifies sources
  - TaintTracker: policy for propagation
  - TaintAssert: traps dangerous uses

Implementation:

  TaintSeed:
   - each byte of memory has 4-byte pointer
     to taint structure if that byte
     is tainted (else NULL)

  TaintTracker:
   - sets 4-byte pointer if result of op
     should be tainted

  TaintAssert:
   - checks uses of potentially dangerous ops
     against policy

  Exploit Analyzer:
   - backtrace of taint structures
   - helps understand attack
   - transfer to sandbox for analysis
------------------------------------------
        Why would a user want to understand an attack?
   f. example of taint analysis
------------------------------------------
       EXAMPLE OF TAINT ANALYSIS

example0:
   strcpy(buff, argv[1]);
   /* ... */
   return;

example1:
   x = get_input(user);
   /* ... */
   fptr = (void (*)())(x+42);
   /* ... */
   (*fptr)();

example2:
   x = get_input(user);
   /* ... */
   fptr = load(x);
   /* ... */
   (*fptr)();
------------------------------------------
        In example0, could there be an exploit?
        What could happen in example0?
        Will a static analysis be able to detect the potential exploit?
        How is example1 different from example0?
        In example1, should fptr be tainted?
        How should TaintTracker deal with x+42?
        Should calling a function through a pointer raise an alarm?
        In example2: What does the analysis need to know about load?
   g. efficiency considerations
------------------------------------------
       EFFICIENCY CONSIDERATIONS

How much memory does taint tracking use?

How much time does taint tracking use?

------------------------------------------
   h. policy considerations
------------------------------------------
         POLICY CONSIDERATIONS

   x = get_input(user);
   /* ... */
   fptr = load(x);
   /* ... */
   (*fptr)();

Should the call be allowed?

------------------------------------------
  2. pointer injection, tainting with pointers
------------------------------------------
        POINTER TAINTING RULES

All the rules of basic tainting plus:

  - propagate taint to pointers,

------------------------------------------
   a. rules for pointer injection detection
------------------------------------------
  POINTER INJECTION (PI) DETECTION RULES

From Slowinska and Bos, 2010:

Track pointers:
  - at program start:
     - track all pointers to
       statically allocated memory
  - during runtime:
     - track all pointers returned by
       system/library calls that
       dynamically allocate memory

Taint check:
  - propagation of untrusted data (basic)
  - propagation of pointers

  alert when:
  - use untrusted data for control flow
  - dereference untrusted pointer
    not tracked as a pointer
------------------------------------------
   b. problems with pointer injection detection
------------------------------------------
    PROBLEMS WITH POINTER INJECTION

What could cause a false negative?

What could cause a false positive?

------------------------------------------
        Does the C programming language make it easy to find pointers?
------------------------------------------
     CONFLICTING USE CASES FOR PI

Slowinska and Bos (2010) observe that:

  - table lookups are frequent
    and confuse the analysis
  but
  - should not propagate taint
    for memory corruption analysis
    (to prevent false positives)
  - should propagate taint
    for malware analysis
    (to prevent false negatives)
------------------------------------------
 D. evaluation of taint checking
------------------------------------------
      EVALUATION OF TAINT CHECKING

Advantages:

  + dynamic checking prevents many
    false positives

Disadvantages:

  - doesn't check information flow

       int Y = 0;
       if (X == 0) { Y = 0; } else { Y = 1; }

  - misses changes to array indexes

  - PI has trouble balancing
    false positives vs. false negatives
------------------------------------------
II. Information Flow Security
 A. What is the problem?
  1. recap, Denning and Denning (1977)
------------------------------------------
       INFORMATION FLOW SECURITY

Original paper:
     Dorothy E. Denning and Peter J. Denning.
     "Certification of programs for secure
     information flow."
     CACM 20(7):504-513, July 1977.
     https://doi.org/10.1145/359636.359712

Confinement problem:
  - "confidential results should not
     depend on non-confidential data"

Policy specified by:

Example:

   int Y = 0;
   if (X == 0) { Y = 0; } else { Y = 1; }
------------------------------------------
------------------------------------------
   RULES FOR INFORMATION FLOW ANALYSIS

Taint checking plus:

  - implicit flows:
      from objects used in a condition
      to objects changed by code
         run conditionally
------------------------------------------
 B. programming languages perspective (Sabelfeld and Myers, 2003)
------------------------------------------
      ANOTHER LOOK AT THE PROBLEM

See: A. Sabelfeld and A. C. Myers,
     Language-based information-flow security.
     IEEE Journal on Selected Areas in
     Communications, vol. 21(1):5-19, Jan. 2003
     doi: 10.1109/JSAC.2002.806121

Standard for protecting confidential data:
  - access control

Does that keep data confidential?

Possible problems:

------------------------------------------
        What does access control say about access to confidential data (e.g., a file)?
        Does encryption solve these problems?
------------------------------------------
              MOTIVATION
   (from Sabelfeld and Myers, 2003)

To ensure that info remains confidential:
   must analyze how information flows
   within program using it.
Further:
   modern computing systems are complex,
   so manual analysis is infeasible

Goal:
   Show information does NOT flow to
   where policy could be violated.
------------------------------------------
 C. security-typed programming language
------------------------------------------
       APPROACH: SECURITY TYPING

Idea:
  - annotate types with
    policy specifications on use of data
  - enforce policies statically

Advantages:

------------------------------------------
        When does static enforcement happen?
        What is the runtime overhead of static enforcement?
 D. challenges
------------------------------------------
               CHANNELS

Def: a *channel* is a mechanism for

Def: a *covert channel* is a channel that

Examples:

------------------------------------------
        What does existence of covert channels tell us about security research?
------------------------------------------
   INTEGRITY: DUAL OF CONFIDENTIALITY

Confidentiality stops information flows
    TO improper SINKS

Def: *integrity* stops information flows
    FROM improper SOURCES
------------------------------------------
        In the military, what role does integrity play?
 E. Example of Implicit Flows
------------------------------------------
       EXAMPLE OF IMPLICIT FLOWS
 (from Fig. 1 of Sabelfeld and Myers, 2003)

   secret int h;  /* high security */
   public int l;  /* low security */
   /* ... */
   h = h % 2;
   l = 0;
   if (h == 1) { l = 1; }

Problem with Mandatory Access Control:
  - rest of program must be secret level!
    called "label creep"
------------------------------------------
        How does the value of l relate to the value of h at the end?
        Is this demonstrating an explicit or implicit flow?
 F. type checking for information flow
------------------------------------------
    TYPE CHECKING INFORMATION FLOW

security types:
  - normal type (e.g., int),
  - security label

type checks
  - usual type checks and
  - check that flow permitted

For implicit flows:
  - label program counter (pc)
  - track dependencies on conditions
    but only within scope of condition
------------------------------------------
        Does this prevent label creep?
  1. goal: noninterference
------------------------------------------
         GOAL: NONINTERFERENCE

Def: The *noninterference policy* says that

Test:

------------------------------------------
  2. type system
   a. programming language
------------------------------------------
    PROGRAMMING LANGUAGE FOR PROOFS

Syntax:

   C ::= skip
      | var := exp
      | C1 ; C2
      | if exp then C1 else C2
      | while exp do C

   exp ::= var | constant | exp op exp

   var ::= h | l

Semantics:

   v in Value = {..., -2, -1, 0, 1, 2, ...}
   s in State = {(sh, sl) | sh: var -fin-> Value
                        and sl: var -fin-> Value }

   M: C -> (State -> (State union {\bot}))
------------------------------------------
        When M is applied to a command C and a state s, what can the result be?
   b. formal statement of noninterference
------------------------------------------
  FORMAL STATEMENT OF NONINTERFERENCE

Equivalence relation on states:

Def: (sh,sl) ==L (sh',sl') iff sl == sl'

Def: A command C has the
     *noninterference property* iff
        forall s1, s2 in State
          s1 ==L s2 implies
             (M(C)(s1)) ==L (M(C)(s2))
------------------------------------------
        What does that formula mean?
        What does that mean for confidentiality?
        In ==L, why is the high part of the state ignored?
------------------------------------------
               EXAMPLES

Which has the noninterference property?

   h := l+4

   l := h

   if (l == 5) then h:=h+1 else l:=l+1

   if (h == 3) then l:=5 else skip
------------------------------------------
   c. type system
------------------------------------------
           TYPING JUDGMENTS

   |- exp : t
      means exp has type t

   [pc] |- C
      means C type checks in context [pc]
------------------------------------------
------------------------------------------
          TYPE RULE NOTATION

            hyp1, hyp2
   [name]  -----------
              concl

   means

axiom schemes:

   [low]  |- l : public

Side conditions:

             e |- x : t, e |- y : t2
   [lookup] -----------------------
             e |- x : t3
        if e(x)=t, e(y)=t2, and t3 = t+t2

Free variables are universally quantified
over ENTIRE rule:

            |- h : secret, l : public
   [plus]  ---------------------------
            h+l : secret

            |- exp1 <= exp2, |- exp2 <= exp3
   [trans] -----------------------------
            |- exp1 <= exp3
------------------------------------------
        Can we take h to mean one thing in the hypothesis and another in the conclusion?
        What does the trans rule mean?
------------------------------------------
              TYPE RULES

See: (Sabelfeld and Myers, 2003)
 and D. Volpano, G. Smith, and C. Irvine.
     A sound type system for
     secure flow analysis.
     J. Comput. Security,
     vol. 4(3):167--187, 1996.

assume h is the secret (high) variable
       l is the public (low) variable

   [TE1]  |- exp : secret

   [TE2]  |- exp : public
             if h not in Vars(exp)

   [TC1]  [pc] |- skip

   [TC2]  [pc] |- h := exp

           |- exp: public
   [TC3] --------------------
          [public] |- l := exp

          [pc] |- C1, [pc] |- C2
   [TC4] ----------------------
          [pc] |- C1 ; C2

          |- exp : pc, [pc] |- C
   [TC5] -----------------------
          [pc] |- while exp do C

          |- exp : pc, [pc] |- C1, [pc] |- C2
   [TC6] -----------------------------
          [pc] |- if exp then C1 else C2

          [secret] |- C
   [TC7] -------------------
          [public] |- C
------------------------------------------
        What bad outcome are we trying to avoid?
        What does TE1 mean?
        When can TE2 be used?
        Why is TC1 sound?
        Why is TC2 sound?
        How does TC3 prevent explicit flows?
        Why is the security context for TC3 [public]?
        What is TC4 saying?
        Where does [pc] come from in TC5 and TC6? Why?
        What is TC7 saying? Why is it sound?
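The rules TE1-TE2 and TC1-TC7 can be read as a recursive check over the
syntax of commands. The following is a minimal sketch in Python, not the
formulation from either cited paper: the class names (Var, Const, BinOp,
Skip, Assign, Seq, If, While) and the functions mentions_h and checks are
hypothetical names chosen for this illustration. It assumes, as the slide
does, that h is the only secret variable and l the only public one, and it
folds TC7 into the cases for if and while instead of treating it as a
separate rule.

```python
from dataclasses import dataclass
from typing import Union

SECRET, PUBLIC = "secret", "public"

# exp ::= var | constant | exp op exp
@dataclass(frozen=True)
class Var:
    name: str            # "h" (secret) or "l" (public)

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Exp"
    right: "Exp"

Exp = Union[Var, Const, BinOp]

# C ::= skip | var := exp | C1 ; C2 | if exp then C1 else C2 | while exp do C
@dataclass(frozen=True)
class Skip:
    pass

@dataclass(frozen=True)
class Assign:
    var: str
    exp: Exp

@dataclass(frozen=True)
class Seq:
    first: "Cmd"
    second: "Cmd"

@dataclass(frozen=True)
class If:
    cond: Exp
    then: "Cmd"
    orelse: "Cmd"

@dataclass(frozen=True)
class While:
    cond: Exp
    body: "Cmd"

Cmd = Union[Skip, Assign, Seq, If, While]

def mentions_h(e: Exp) -> bool:
    """True iff h is in Vars(e), so TE2 cannot give e the type public."""
    if isinstance(e, Var):
        return e.name == "h"
    if isinstance(e, BinOp):
        return mentions_h(e.left) or mentions_h(e.right)
    return False  # constants mention no variables

def checks(pc: str, c: Cmd) -> bool:
    """True iff the judgment [pc] |- c is derivable (sketch, see lead-in)."""
    if isinstance(c, Skip):                              # TC1
        return True
    if isinstance(c, Assign):
        if c.var == "h":                                 # TC2
            return True
        return pc == PUBLIC and not mentions_h(c.exp)    # TC3
    if isinstance(c, Seq):                               # TC4
        return checks(pc, c.first) and checks(pc, c.second)
    if isinstance(c, (If, While)):                       # TC5, TC6
        # A guard that mentions h only has type secret (TE1), so the
        # bodies are checked under [secret]; TC7 then lowers the
        # conclusion to [public] when that is the context asked for.
        inner = PUBLIC if pc == PUBLIC and not mentions_h(c.cond) else SECRET
        bodies = [c.body] if isinstance(c, While) else [c.then, c.orelse]
        return all(checks(inner, b) for b in bodies)
    raise TypeError(f"not a command: {c!r}")

# The four commands from the EXAMPLES slide, checked in context [public].
h, l = Var("h"), Var("l")
examples = {
    "h := l+4": Assign("h", BinOp("+", l, Const(4))),
    "l := h": Assign("l", h),
    "if (l == 5) then h:=h+1 else l:=l+1":
        If(BinOp("==", l, Const(5)),
           Assign("h", BinOp("+", h, Const(1))),
           Assign("l", BinOp("+", l, Const(1)))),
    "if (h == 3) then l:=5 else skip":
        If(BinOp("==", h, Const(3)), Assign("l", Const(5)), Skip()),
}
results = {src: checks(PUBLIC, cmd) for src, cmd in examples.items()}
```

In this sketch the first and third commands are accepted and the second and
fourth are rejected. A rejection only means that no derivation exists; since
the type system is conservative, it does not by itself show that a command
leaks.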
------------------------------------------
        EXAMPLE OF TYPE CHECKING



   ---------------------------------

------------------------------------------
        Note: need to use TC6 due to the form of the conclusion.
        Show how to get the three hypotheses:
            ...
            [TE1]  |- h == 1 : secret,
            [ ]    [secret] |- l := 1,
            [TC1]  [secret] |- skip
        But there is no proof for the second hypothesis,
        so this fails!
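
A failed derivation shows only that the type system rejects the command.
For the conditional above, a direct test of the noninterference property
also fails. The following is a minimal sketch in Python, under stated
assumptions: commands are modelled as functions from a pair (h, l) to a
pair (h, l), all commands terminate (so \bot is not modelled), and only a
small range of values is sampled. The names implicit_flow, safe, and
find_interference are hypothetical, chosen for this illustration.

```python
from itertools import product

def implicit_flow(h, l):
    """Models: if (h == 1) then l := 1 else skip"""
    if h == 1:
        l = 1
    return h, l

def safe(h, l):
    """Models: h := l + 4"""
    return l + 4, l

def find_interference(cmd, values=range(-2, 3)):
    """Search for two states that agree on l but whose results differ on l.

    Returns a witness (l, h1, h2), or None if the sample contains none.
    A witness refutes noninterference; None proves nothing beyond the
    sampled values.
    """
    for l, h1, h2 in product(values, repeat=3):
        _, l_after_1 = cmd(h1, l)
        _, l_after_2 = cmd(h2, l)
        if l_after_1 != l_after_2:
            return (l, h1, h2)
    return None

witness = find_interference(implicit_flow)
no_witness = find_interference(safe)
```

Here witness is a triple of starting values for which the final l depends
on h, so an observer of l learns something about h; no_witness is None
because h := l + 4 never changes l.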