I. Taint Analysis (or Taint Checking)
 A. What is the problem?
------------------------------------------
          THE PROBLEM

Injection is
   3rd on the 2021 OWASP Top Ten
   (owasp.org/www-project-top-ten/,
   formerly in first place)

From OWASP A03 Injection
  (owasp.org/Top10/A03_2021-Injection/):

  "App vulnerable when:

  "User-supplied data is not validated,
   filtered, or sanitized"

  "Dynamic queries or non-parameterized
  calls without context-aware escaping
  are used directly in the interpreter."

  "Hostile data is directly used
  or concatenated."

  "Common injections are:
     SQL, NoSQL, OS command,
     Object Relational Mapping (ORM),
     LDAP, and Expression Language (EL)
     or Object Graph Navigation Library
     (OGNL) injection."
     
------------------------------------------
  1. examples
------------------------------------------
          EXAMPLE SCENARIO IN SQL

Scenario 1 from OWASP A03
 (owasp.org/Top10/A03_2021-Injection/):

 String query =
  "SELECT * FROM accounts WHERE custID='"
     + request.getParameter("id")
     + "'";

------------------------------------------
        What happens if the input starts with ' or '1'='1 ?
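A minimal sketch in C of what the concatenation in Scenario 1 produces. The helper name build_query and the buffer handling are assumptions made for illustration; only the query text comes from the OWASP scenario.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the query by concatenation, as in Scenario 1.
   Returns 0 on success, -1 if the result does not fit in out. */
int build_query(char *out, size_t outsz, const char *id)
{
    int n = snprintf(out, outsz,
                     "SELECT * FROM accounts WHERE custID='%s'", id);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

With the input ' or '1'='1 the result is SELECT * FROM accounts WHERE custID='' or '1'='1' and the WHERE clause now holds for every row, so the input has changed the structure of the query, not just a value in it.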
------------------------------------------
          EXAMPLE SCENARIO in HQL

Scenario 2 from OWASP A03
 (owasp.org/Top10/A03_2021-Injection/):

 Query HQLQuery =
    session.createQuery(
    "FROM accounts WHERE custID='"
    + request.getParameter("id") + "'");

------------------------------------------
        What happens if the input starts with
            ' or '1'='1
           ?
        So, is this kind of attack limited to SQL?
  2. other kinds of injection attacks
        Are there any other attacks where we should not trust user input?
------------------------------------------
     OTHER KINDS OF INJECTION ATTACKS

What other kinds of attacks
 might use inputs?


------------------------------------------
 B. Preventing Injection Attacks
------------------------------------------
      PREVENTING INJECTION ATTACKS

From OWASP A03 Injection
 (owasp.org/Top10/A03_2021-Injection/)
 
 "The Preferred option is to
  use a safe API,
  which avoids using the interpreter"

 "Use positive server-side
 input validation.
 This is not a complete defense ..."

 "For residual dynamic queries,
 escape special characters using
 the specific escape syntax
 for that interpreter."

------------------------------------------
        Why should using "any interpreter" be avoided?
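A sketch of the second and third options in C, under stated assumptions: customer IDs are taken to be short digit strings (the source does not say so), and the escape rule shown is only the standard SQL convention of doubling single quotes. A real interpreter may need more than this, which is one reason a safe API is preferred. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Positive (allow-list) validation: accept only 1..max_len ASCII digits.
   The digit-only ID format is an assumption for this sketch. */
bool is_valid_cust_id(const char *id, size_t max_len)
{
    size_t len = strlen(id);
    if (len == 0 || len > max_len)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (!isdigit((unsigned char)id[i]))
            return false;
    }
    return true;
}

/* Escape text for use inside a SQL string literal by doubling each
   single quote. Returns 0 on success, -1 if out is too small. */
int escape_sql_literal(char *out, size_t outsz, const char *in)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        size_t need = (in[i] == '\'') ? 2 : 1;
        if (j + need >= outsz)       /* keep room for the final NUL */
            return -1;
        if (in[i] == '\'')
            out[j++] = '\'';
        out[j++] = in[i];
    }
    if (j >= outsz)
        return -1;
    out[j] = '\0';
    return 0;
}
```

The validator rejects ' or '1'='1 outright; the escaper turns O'Brien into O''Brien so the quote can no longer close the literal.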
 C. tool support for avoiding injection attacks and other attacks
------------------------------------------
         HOW COULD TOOLS HELP

What could a tool do to prevent
   code injection attacks?


------------------------------------------
  1. taint analysis
------------------------------------------
          TAINT ANALYSIS

Def: *taint* is


Static taint analysis is


Dynamic taint analysis is


History:
 - a feature of Perl since 1989
------------------------------------------
        Why would either static or dynamic analysis be favored?
   a. DIFT: architectural support
------------------------------------------
 DYNAMIC INFORMATION FLOW TRACKING (DIFT)

See:
G. Edward Suh, Jae W. Lee, David Zhang,
and Srinivas Devadas.
"Secure program execution via
dynamic information flow tracking."
In ASPLOS XI, pp. 85--96, ACM 2004.
https://doi.org/10.1145/1024393.1024404

Idea:

  - Hard to stop memory overwrites
    (any bug can cause them)
  - Instead


Approach:
  - architectural support to


  - trap to handler if detects


------------------------------------------
------------------------------------------
         ATTACK MODEL

Of the Suh et al. 2004 paper:

  - Attacker can send
    "a malicious input that exploits
     a vulnerability in the program."

  - Programs may be buggy,
    but not malicious

  - Bugs may cause changes to memory


------------------------------------------
    How could a bug change memory?
   b. information flow security
------------------------------------------
     INFORMATION FLOW SECURITY

See:
Dorothy E. Denning and Peter J. Denning.
"Certification of programs for
 secure information flow."
CACM 20(7):504-513, July 1977.
https://doi.org/10.1145/359636.359712

Confinement problem:

  - non-confidential results
    should not depend on
    confidential data

Policy specified by:
 - set of security classes (e.g., secret)
 - bindings of objects to security classes
 - flow relation
   specifying permitted flows

Example:
   int Y = 0;
   if (X == 0) {
      Y = 0;
   } else {
      Y = 1;
   }
   

Approach: static analysis

------------------------------------------
        When does information flow from an object, X,
           to another object, Y?
        In the example, does information flow from X to Y?
------------------------------------------
    RULES FOR INFORMATION FLOW ANALYSIS

Taint checking plus:

   - implicit flows:
       from objects used in a condition
       to objects changed
       by code run conditionally
 
------------------------------------------
        Does taint checking check implicit flows?
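A small C sketch of why explicit-flow tainting alone misses the X/Y example. The tval type and the function names are illustrative assumptions; the branch structure is the one from the slide.

```c
#include <stdbool.h>

/* A value paired with a taint bit (illustrative representation). */
typedef struct {
    int  value;
    bool tainted;
} tval;

/* Constants come from the program text, so they start untainted. */
tval constant(int v)
{
    tval t = { v, false };
    return t;
}

/* The slide's example with explicit-flow propagation only:
   y is only ever assigned constants, so its taint bit is never set,
   even though its final value is determined by x. */
tval copy_via_branch(tval x)
{
    tval y = constant(0);
    if (x.value == 0) {
        y = constant(0);
    } else {
        y = constant(1);
    }
    return y;
}
```

Passing a tainted x yields an untainted y whose value still reveals whether x was zero. An information flow analysis would taint y because it is assigned under a condition that depends on x.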
   c. Does information flow matter?
------------------------------------------
      DOES INFORMATION FLOW MATTER?

Format String attacks:

    printf(str);

  Stack could be:

  str: [  *--]--> "%*x%n\0"


------------------------------------------
        What does %n do in a C format string?
        What does * do as a field width in a C format string?
      Would the changes to the stack be caught by taint analysis?
      Could this make an untainted value depend on a tainted value?
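For contrast, a sketch of the usual fix in C: pass the untrusted string as an argument to a constant format, so it is treated as data. The wrapper name format_untrusted is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Formats untrusted text as data. The format string is the constant "%s",
   so directives inside str (such as %n or %*x) are never interpreted.
   Returns what snprintf returns: the length that str needs. */
int format_untrusted(char *out, size_t outsz, const char *str)
{
    return snprintf(out, outsz, "%s", str);
}
```

Given the string from the slide, the output is just the five characters of that string; nothing is read from or written to the stack on its behalf.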
------------------------------------------
      CODE FROM VSPRINTF (EXCERPT)

  if (ch == '+')
     do_plus();
  else if (ch == '%')
     do_percent();
  /* ... */

  else if (ch == '*') {
    width = read_from_stack();
    do_width_asterics ();
  }

------------------------------------------
    If ch is tainted, does taint checking give an alarm here?
   d. algorithm/rules for taint analysis
------------------------------------------
        RULES FOR TAINTING

From: Asia Slowinska and Herbert Bos.
 "Pointer tainting still pointless:
  (but we all see the point of tainting)".
  SIGOPS Oper. Syst. Rev. 44(3):88-92,
  July 2010.
  https://doi.org/10.1145/1842733.1842748


Basic tainting:
  - taint data (bytes) originally


  - taint data that is 


Alternative/complementary approach:
  - mark sensitive data
  - warn if marked data is sent to
    an untrusted sink


Can detect attacks on flow of control

------------------------------------------
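A sketch of basic tainting at the level of single values, in C. It assumes the usual reading of the rules: data from an untrusted source starts tainted, taint propagates through copies and arithmetic, and an alert is raised when tainted data would decide control flow. The type and function names are hypothetical, not taken from Slowinska and Bos.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t value;
    bool      tainted;
} tword;

/* Source rule: data that arrives from an untrusted source starts tainted. */
tword from_untrusted(uintptr_t v) { tword w = { v, true }; return w; }

/* Trusted data, e.g. constants in the program text. */
tword from_trusted(uintptr_t v) { tword w = { v, false }; return w; }

/* Propagation rule: the result of an arithmetic operation is tainted
   if either operand is tainted. */
tword tadd(tword a, tword b)
{
    tword w = { a.value + b.value, a.tainted || b.tainted };
    return w;
}

/* Sink rule: may w be used as a jump or call target? */
bool may_use_as_call_target(tword w)
{
    return !w.tainted;
}
```

For example, an untrusted x plus the constant 42 is still tainted, so using the sum as a function pointer would raise an alarm under this policy.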
   e. Implementation in valgrind
------------------------------------------
     TAINT CHECKING IN VALGRIND

Works on x86 binaries (no source)

Skins used:
  - TaintSeed: identifies sources
  - TaintTracker: policy for propagation
  - TaintAssert: traps dangerous uses

Implementation:

TaintSeed:
  - each byte of memory has
    4-byte pointer to taint structure
    if that byte is tainted (else NULL)

TaintTracker:
  - sets 4-byte pointer if result of op
    should be tainted

TaintAssert:
  - checks uses of
    potentially dangerous ops
    against policy

Exploit Analyzer:
  - backtrace of taint structures
  - helps understand attack
  - transfer to sandbox for analysis
------------------------------------------
        Why would a user want to understand an attack?
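A toy model in C of the shadow memory described above: one pointer per byte, NULL when the byte is untainted. The memory size, the fields of taint_info, and all function names are assumptions for illustration and do not reflect the tool's actual data layout.

```c
#include <stddef.h>

/* Illustrative taint record (hypothetical fields). */
typedef struct taint_info {
    const char              *source;  /* where the tainted data came from */
    const struct taint_info *parent;  /* earlier record, for a backtrace */
} taint_info;

#define MEM_SIZE 64

static unsigned char     memory[MEM_SIZE];  /* simulated program memory */
static const taint_info *shadow[MEM_SIZE];  /* per byte; NULL = untainted */

/* TaintSeed: bytes written from an untrusted source get a taint pointer. */
void seed(size_t addr, size_t len, const unsigned char *data,
          const taint_info *info)
{
    for (size_t i = 0; i < len && addr + i < MEM_SIZE; i++) {
        memory[addr + i] = data[i];
        shadow[addr + i] = info;
    }
}

/* TaintTracker: a byte copy carries the source byte's taint pointer along. */
void copy_byte(size_t dst, size_t src)
{
    if (dst < MEM_SIZE && src < MEM_SIZE) {
        memory[dst] = memory[src];
        shadow[dst] = shadow[src];
    }
}

/* TaintAssert: nonzero if any byte of a value about to be used
   in a dangerous way (e.g. as a jump target) is tainted. */
int is_tainted(size_t addr, size_t len)
{
    for (size_t i = 0; i < len && addr + i < MEM_SIZE; i++) {
        if (shadow[addr + i] != NULL)
            return 1;
    }
    return 0;
}
```

Following the pointer stored in shadow for an offending byte leads back to the record of where the data entered, which is the kind of backtrace the Exploit Analyzer relies on.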
   f. example of taint analysis
------------------------------------------
       EXAMPLE OF TAINT ANALYSIS

 example0:
  strcpy(buff, argv[1]);
  /* ... */
  return;
  

 example1:
  x = get_input(user);
  /* ... */
  fptr = (void (*)())(x+42);
  /* ... */
  (*fptr)();


 example2:
  x = get_input(user);
  /* ... */
  fptr = load(x);
  /* ... */
  (*fptr)();


------------------------------------------
        In example0, could there be an exploit?
        What could happen in example0?
        Will a static analysis be able to detect the potential exploit?
        How is example1 different from example0?
        In example1, Should fptr be tainted?
        How should TaintTracker deal with x+42?
        Should calling a function through a pointer raise an alarm?
        In example2: What does the analysis need to know about load?
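For example0, a sketch in C of a copy that cannot write past the destination buffer. The name bounded_copy and its return convention are assumptions; the point is only that the length is checked before any byte is written.

```c
#include <string.h>

/* Copies src into dst without writing more than dstsz bytes.
   Always NUL-terminates when dstsz > 0.
   Returns 0 if the whole string fit, -1 if it was truncated
   or if dstsz is 0. */
int bounded_copy(char *dst, size_t dstsz, const char *src)
{
    if (dstsz == 0)
        return -1;
    size_t len = strlen(src);
    size_t n = (len < dstsz) ? len : dstsz - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (len < dstsz) ? 0 : -1;
}
```

With strcpy, an argv[1] longer than buff can overwrite whatever follows it on the stack, including the return address used by the return statement; here the extra bytes are dropped and the caller is told.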
   g. efficiency considerations
------------------------------------------
    EFFICIENCY CONSIDERATIONS

How much memory does taint tracking use?


How much time does taint tracking use?


------------------------------------------
   h. policy considerations
------------------------------------------
         POLICY CONSIDERATIONS

  x = get_input(user);
  /* ... */
  fptr = load(x);
  /* ... */
  (*fptr)();

Should the call be allowed?


------------------------------------------
  2. pointer injection, tainting with pointers
------------------------------------------
          POINTER TAINTING RULES

All the rules of basic tainting plus:
  - propagate taint to pointers, 


------------------------------------------
   a. rules for pointer injection detection
------------------------------------------
  POINTER INJECTION (PI) DETECTION RULES

From Slowinska and Bos, 2010:

 Track pointers:
  - at program start:
     - track all pointers
       to statically allocated memory
  - during runtime:
     - track all pointers returned
       by system/library calls
       that dynamically allocate memory

 Taint check:
   - propagation of untrusted data (basic)
   - propagation of pointers

   alert when:
     - use untrusted data for control flow
     - dereference untrusted pointer
       not tracked as a pointer
       
------------------------------------------
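A sketch of the two-tag idea behind these rules, in C. It assumes each word carries an "untrusted" tag and a "tracked as pointer" tag, that both tags propagate through arithmetic, and that a dereference alerts only when the word is untrusted and not tracked as a pointer. The names are hypothetical, and real PI detectors differ in their propagation details.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t value;
    bool      untrusted;   /* basic taint: derived from untrusted input */
    bool      is_pointer;  /* derived from a tracked pointer */
} pword;

/* A pointer to statically allocated memory, or one returned by an
   allocating system/library call. */
pword tracked_pointer(uintptr_t addr)
{
    pword w = { addr, false, true };
    return w;
}

/* Data read from an untrusted source. */
pword untrusted_data(uintptr_t v)
{
    pword w = { v, true, false };
    return w;
}

/* Both tags propagate through arithmetic: base + index is still
   tracked as a pointer, and is untrusted if the index was. */
pword padd(pword a, pword b)
{
    pword w = { a.value + b.value,
                a.untrusted || b.untrusted,
                a.is_pointer || b.is_pointer };
    return w;
}

/* Alert rule for dereferences. */
bool deref_raises_alert(pword w)
{
    return w.untrusted && !w.is_pointer;
}
```

Under these assumptions, table + user_index can be dereferenced without an alert because it derives from a tracked pointer, while an address supplied wholesale by the attacker cannot.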
   b. problems with pointer injection detection
------------------------------------------
      PROBLEMS WITH POINTER INJECTION

What could cause a false negative?


What could cause a false positive?


------------------------------------------
        Does the C programming language make it easy to find pointers?
------------------------------------------
     CONFLICTING USE CASES FOR PI

Slowinska and Bos (2010) observe that:
 - table lookups are frequent
   and confuse the analysis

 but 
   - should not propagate taint
     for memory corruption analysis
     (to prevent false positives)
   - should propagate taint
     for malware analysis
     (to prevent false negatives)

------------------------------------------
 D. evaluation of taint checking
------------------------------------------
       EVALUATION OF TAINT CHECKING

Advantages:
  + dynamic checking
    prevents many false positives
    
Disadvantages:

  - doesn't check implicit information flows

   int Y = 0;
   if (X == 0) {
      Y = 0;
   } else {
      Y = 1;
   }

  - misses changes to array indexes

  - PI has trouble balancing
    false positives vs. false negatives


------------------------------------------
II. Information Flow Security
 A. What is the problem?
  1. recap, Denning and Denning (1977)
------------------------------------------
     INFORMATION FLOW SECURITY

Original paper:

Dorothy E. Denning and Peter J. Denning.
"Certification of programs for
 secure information flow."
CACM 20(7):504-513, July 1977.
https://doi.org/10.1145/359636.359712

Confinement problem:

  - non-confidential results
    should not depend on
    confidential data

Policy specified by:

Example:
   int Y = 0;
   if (X == 0) {
      Y = 0;
   } else {
      Y = 1;
   }
   
------------------------------------------
------------------------------------------
    RULES FOR INFORMATION FLOW ANALYSIS

Taint checking plus:

   - implicit flows:
       from objects used in a condition
       to objects changed
       by code run conditionally
 
------------------------------------------
 B. programming languages perspective (Sabelfeld and Myers, 2003)
------------------------------------------
      ANOTHER LOOK AT THE PROBLEM

See:

A. Sabelfeld and A. C. Myers,
Language-based information-flow security.
IEEE Journal on Selected Areas in
Communications, vol. 21(1):5-19, Jan. 2003
  doi: 10.1109/JSAC.2002.806121

Standard for protecting confidential data:
  - access control


Does that keep data confidential?


Possible problems:


------------------------------------------
   What does access control say about
      access to confidential data (e.g., a file)?
   Does encryption solve these problems?
------------------------------------------
          MOTIVATION

(from Sabelfeld and Myers, 2003)

To ensure that info remains confidential:
   must analyze how information flows
   within program using it.

Further:
   modern computing systems are complex,
   so manual analysis is infeasible
   

Goal:
   Show information does NOT flow
   to where policy could be violated.
   
------------------------------------------
 C. security-typed programming language
------------------------------------------
        APPROACH: SECURITY TYPING

Idea:
 - annotate types with
   policy specifications
   on use of data

 - enforce policies statically


Advantages:


------------------------------------------
        When does static enforcement happen?
        What is the runtime overhead of static enforcement?
 D. challenges
------------------------------------------
             CHANNELS

Def: a *channel* is a mechanism for


Def: a *covert channel* is a channel that


Examples:


------------------------------------------
       What does existence of covert channels tell us about
          security research?
------------------------------------------
    INTEGRITY: DUAL OF CONFIDENTIALITY

Confidentiality stops information flows
     TO   improper SINKS

Def: *integrity* stops information flows
     FROM improper SOURCES


------------------------------------------
        In the military, what role does integrity play?
 E. Example of Implicit Flows
------------------------------------------
           EXAMPLE OF IMPLICIT FLOWS

(from Fig. 1 of Sabelfeld and Myers, 2003)

   secret int h;  /* high security */
   public int l;  /* low security */

   /* ... */

   h = h % 2;
   l = 0;
   if (h == 1) {
      l = 1;
   }


Problem with Mandatory Access Control:
  - rest of program must be at secret level!
  - this is called "label creep"

------------------------------------------
      How does the value of l relate to the value of h at the end?
      Is this demonstrating an explicit or implicit flow?
 F. type checking for information flow
------------------------------------------
       TYPE CHECKING INFORMATION FLOW

security types:
   - normal type (e.g., int),
   - security label

type checks
   - usual type checks and
   - check that flow permitted

For implicit flows:

   - label program counter (pc)
   - track dependencies on conditions
      but only within scope of condition


------------------------------------------
        Does this prevent label creep?
  1. goal: noninterference
------------------------------------------
     GOAL: NONINTERFERENCE

Def: The *noninterference policy*
     says that


Test:


------------------------------------------
  2. type system
   a. programming language
------------------------------------------
      PROGRAMMING LANGUAGE FOR PROOFS

Syntax:

C ::= skip | var := exp
    | C1 ; C2
    | if exp then C1 else C2
    | while exp do C

exp ::= var | constant
    | exp op exp

var ::= h | l


Semantics:
  v in Value = {..., -2, -1, 0, 1, 2, ...}

  s in State =
       {(sh, sl) | 
         sh: var -fin-> Value and
         sl: var -fin-> Value }

  M: C -> (State -> (State union {\bot}))


------------------------------------------
        When M is applied to a command C and a state s,
           what can the result be?
   b. formal statement of noninterference
------------------------------------------
    FORMAL STATEMENT OF NONINTERFERENCE

Equivalence relation on states:

Def: (sh,sl) ==L (sh',sl') iff sl == sl'


Def: A command C has
   the *noninterference property* iff
   forall s1, s2 in State

      s1 ==L s2
        implies  (M(C)(s1)) ==L (M(C)(s2))

------------------------------------------
        What does that formula mean?
        What does that mean for confidentiality?
        In ==L, why is the high part of the state ignored?
------------------------------------------
         EXAMPLES

Which has the noninterference property?

   h := l+4

   l := h

   if (l == 5) then h:=h+1 else l:=l+1

   if (h == 3) then l:=5 else skip


------------------------------------------
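One way to experiment with the four examples is to run each on two states that agree on l but differ on h, and compare the low parts afterwards. The C sketch below does that; the state struct, the command encodings, and the function names are assumptions for illustration. All four commands terminate, so the nonterminating result does not arise here.

```c
#include <stdbool.h>

typedef struct { int h; int l; } state;
typedef state (*command)(state);

/* The four example commands, written as state transformers. */
state cmd1(state s) { s.h = s.l + 4; return s; }           /* h := l+4 */
state cmd2(state s) { s.l = s.h; return s; }               /* l := h   */
state cmd3(state s)            /* if (l == 5) then h:=h+1 else l:=l+1 */
{
    if (s.l == 5) s.h = s.h + 1; else s.l = s.l + 1;
    return s;
}
state cmd4(state s)            /* if (h == 3) then l:=5 else skip */
{
    if (s.h == 3) s.l = 5;
    return s;
}

/* Runs c on two states that agree on l but differ on h, and reports
   whether the final states still agree on l. A false result refutes
   noninterference; a true result for one pair of states proves nothing. */
bool low_equal_after(command c, int l, int h1, int h2)
{
    state s1 = { h1, l };
    state s2 = { h2, l };
    return c(s1).l == c(s2).l;
}
```

With h1 = 3 and h2 = 4, the second and fourth commands end with different values of l, so they lack the property. Testing finds no counterexample for the first and third, which is consistent with (but not a proof of) noninterference.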
   c. type system
------------------------------------------
         TYPING JUDGMENTS


    |- exp : t

    means

      exp has type t


   [pc] |- C

   means

     C type checks in context [pc]
------------------------------------------
------------------------------------------
      TYPE RULE NOTATION

          hyp1, hyp2
 [name]   -----------
            concl

 means


axiom schemes:

  [low]  |- l : public
            
Side conditions:

          e |- x : t, e |- y : t2 
 [lookup] -----------------------   
                e |- x : t3
                            if e(x)=t,
                               e(y)=t2,
                            and t3 = t+t2

Free variables are universally quantified
   over ENTIRE rule:

         |- h : secret,  |- l : public
 [plus]  -------------------------------
               |- h+l : secret

          |- exp1 <= exp2,
          |- exp2 <= exp3
 [trans]  -----------------------------
              |- exp1 <= exp3
------------------------------------------
     Can we take h to mean one thing in the hypothesis and another
     in the conclusion?
     What does the trans rule mean?
------------------------------------------
           TYPE RULES

See: (Sabelfeld and Myers, 2003)
and
D. Volpano, G. Smith, and C. Irvine.
A sound type system for secure flow
analysis. J. Comput. Security,
vol. 4(3):167--187, 1996.

assume h is the secret (high) variable
       l is the public (low) variable
       
[TE1] |- exp : secret

  
[TE2] |- exp : public
                     if h not in Vars(exp)

[TC1] [pc] |- skip

[TC2] [pc] |- h := exp

          |- exp: public
[TC3] --------------------
      [public] |- l := exp

      [pc] |- C1, [pc] |- C2    
[TC4] ----------------------
         [pc] |- C1 ; C2

      |- exp : pc,  [pc] |- C
[TC5] -----------------------
      [pc] |- while exp do C

      |- exp : pc, [pc] |- C1,
                   [pc] |- C2
[TC6] -----------------------------
      [pc] |- if exp then C1 else C2

         [secret] |- C
[TC7] -------------------
         [public] |- C
------------------------------------------
        What bad outcome are we trying to avoid?
        What does TE1 mean?
        When can TE2 be used?
        Why is TC1 sound?
        Why is TC2 sound?
        How does TC3 prevent explicit flows?
        Why is the security context for TC3 [public]?
        What is TC4 saying?
        Where does [pc] come from in TC5 and TC6? Why?
        What is TC7 saying?  Why is it sound?
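A sketch of a checker for these rules in C, to make the questions concrete. The AST encoding and all names are assumptions; the logic is one reading of TE1, TE2, and TC1 through TC7, not code from either cited paper. It answers whether some derivation of [pc] |- C exists.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { PUBLIC, SECRET } level;
typedef enum { E_H, E_L, E_CONST, E_OP } exp_kind;
typedef enum { C_SKIP, C_ASSIGN_H, C_ASSIGN_L,
               C_SEQ, C_IF, C_WHILE } cmd_kind;

typedef struct exp {
    exp_kind kind;
    const struct exp *left, *right;   /* used only by E_OP */
} exp;

typedef struct cmd {
    cmd_kind kind;
    const exp *e;                     /* right-hand side or condition */
    const struct cmd *c1, *c2;        /* sub-commands (c2 unused by while) */
} cmd;

/* TE2's side condition: does h occur in the expression? */
static bool mentions_h(const exp *e)
{
    switch (e->kind) {
    case E_H:  return true;
    case E_OP: return mentions_h(e->left) || mentions_h(e->right);
    default:   return false;
    }
}

/* Is there a derivation of [pc] |- c ? */
bool type_checks(level pc, const cmd *c)
{
    switch (c->kind) {
    case C_SKIP:                                   /* TC1 */
    case C_ASSIGN_H:                               /* TC2 */
        return true;
    case C_ASSIGN_L:                               /* TC3 */
        return pc == PUBLIC && !mentions_h(c->e);
    case C_SEQ:                                    /* TC4 */
        return type_checks(pc, c->c1) && type_checks(pc, c->c2);
    case C_IF:                                     /* TC6 */
        if (pc == PUBLIC && !mentions_h(c->e) &&
            type_checks(PUBLIC, c->c1) && type_checks(PUBLIC, c->c2))
            return true;
        /* TE1 types any condition as secret; if pc is public,
           TC7 then lowers the context from secret to public. */
        return type_checks(SECRET, c->c1) && type_checks(SECRET, c->c2);
    case C_WHILE:                                  /* TC5 */
        if (pc == PUBLIC && !mentions_h(c->e) &&
            type_checks(PUBLIC, c->c1))
            return true;
        return type_checks(SECRET, c->c1);         /* TE1, then TC7 */
    }
    return false;
}
```

Under this reading, l := exp is accepted only in a public context with h absent from exp, so an assignment to l inside a branch on h is rejected in every context.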
------------------------------------------
          EXAMPLE OF TYPE CHECKING


         ---------------------------------
------------------------------------------
        Note: we need to use TC6, due to the form of the conclusion.
        Show how to get the three hypotheses:

        ... [TE1]     |- h == 1 : secret,
            [   ]     [secret] |- l := 1,
            [TC1]     [secret] |- skip

        But there is no proof of the second hypothesis, so this fails!