I. Memory Corruption Attacks
 A. attacker goals
------------------------------------------
     MEMORY ATTACK GOALS

Goals of attacker:


Attack examples:
   - Buffer overflows
   - Integer overflows
   - Format string attacks
   
------------------------------------------
 B. Buffer overflow attacks
  1. background
------------------------------------------
  BACKGROUND FOR BUFFER OVERFLOW ATTACKS

- C calling conventions
   with stack/heap layout

- How addressing in C works

------------------------------------------
     Why do we focus on C for this?
     Is there a way for an attacker to find out what machine/OS is
     running a program like a web server?
------------------------------------------
         LINUX PROCESS MEMORY LAYOUT


 0xC0000000 |------------------|
            |   user stack     |
            |vvvvvvvvvvvvvvvvvv| <- %esp
            |                  |
            |                  |
            |^^^^^^^^^^^^^^^^^^|
            | shared libraries |
 0x40000000 |------------------|
            |                  |
            |^^^^^^^^^^^^^^^^^^| <- brk
            |    runtime       |
            |     heap         |
            |------------------| loaded
            |                  | by exec()
 0x08048000 |------------------| 
            |                  | unused
          0 |------------------|
          
------------------------------------------
  2. how attacks work in C
------------------------------------------
          BUFFER OVERFLOW ATTACK

Example C (or C++) function

void f(char *strarg) {
  char buf[100];
  strcpy(buf, strarg);
  /* ... */
}

Stack frame of f when called:

     |------------------|
     |   strarg         |
     |------------------|
     |   return address |
     |------------------|
     |   old esp        |
     |------------------|
     |  buf[99]         |
     |                  |
     |   ... buf ...    |
     |                  |
     |  buf[1]          |
     |  buf[0]          |
     |vvvvvvvvvvvvvvvvvv| <- %esp
     
------------------------------------------
        What happens if strarg has 108 characters before the null char?
------------------------------------------
       BUFFER OVERFLOW AFTER STRCPY

Example C (or C++) function

void f(char *strarg) {
  char buf[100];
  strcpy(buf, strarg);
  /* ... */
}

Stack frame of f after strcpy

     |------------------|
     |   strarg         |
     |------------------|
     |                  |
     |------------------|
     |                  |
     |------------------|
     |  buf[99]         |
     |                  |
     |   ... buf ...    |
     |                  |
     |  buf[1]          |
     |  buf[0]          |
     |vvvvvvvvvvvvvvvvvv| <- %esp
     
------------------------------------------
        What is now in the place where the return address was?
        How could an attacker exploit that?
        Is there anything special about the size of 100?
        Will the attack work if the function f crashes?
        Why doesn't the OS stop this attack?
------------------------------------------
     BUFFER OVERFLOW ATTACK STEPS

 a. overwrite the stack so that
    code to execute is at location P
 b. overwrite the return address
    to be 
 c. when call returns, calls P


------------------------------------------
        Is the new code to execute in the heap or the stack?
        What does the attacker need to know?
------------------------------------------
        THE NOP SLIDE

Where should attacker's code start?

     |      ...         |
     |  attacker's code |
     |   start of P     |
     |------------------|
     |      ...         |
     | nop instructions |
     |==================|
     |                  |
     |------------------|
     |   return address |
     |------------------|
     |                  |
     |------------------|
     |                  |
     |   ... buf ...    |
     |                  |
     |vvvvvvvvvvvvvvvvvv| <- %esp
     
------------------------------------------
        How does the attacker the nop instructions and P on the stack?
        Will the attacker's code work if the return address points
        into the list of nop instructions?
        Will the attacker's code work if it contains a byte of zero (0)?
  3. background, why C?
------------------------------------------
    WHY NOT USE A MEMORY-SAFE LANGUAGE?

Why not use a memory-safe language
   like Java, Python, C#, Rust, ...


------------------------------------------
    Why not use a memory-safe language?
  4. background, unsafe libc functions
------------------------------------------
      UNSAFE FUNCTIONS IN LIBC

strcpy(char *dest, char *src)
strcat(char *dest, char *src)
gets(char *s)
scanf(char *format, ...)
  ...


------------------------------------------
        Why can't these functions in libc check sizes?
        How could such C functions be made safer?
------------------------------------------
        SAFER FUNCTIONS IN LIBC

strncpy(char *dest, char *src, size_t n)

strncat(char *dest, char *src, size_t n)
fgets(char *s, size_t n, FILE *stream)

with scanf,
   use width specifier to read strings
   e.g., %100s
------------------------------------------
  5. background: other possibilities for buffer overflows
   a. SEH overwrite attack
------------------------------------------
  STRUCTURED EXCEPTION HANDLER OVERWRITES

Structured Exception Handler (SEH)
   - exception dispatch in Windows
   - registration record on stack
       contains:
         - pointer to handler
         - pointer to next record
         
   
------------------------------------------
        When are SEHs used in Windows?
        How could such an attack be prevented?
   b. Function pointer overwriting
      What would happen if there is a data structure
          with a function pointer above an array in C or C++?
------------------------------------------
        FUNCTION POINTERS IN DATA

Is this safe?

     |------------------|
     | function pointer |
     |------------------|
     |                  |
     |   ... buf ...    |
     |                  |
     |------------------|

------------------------------------------
   c. Setjmp buffer overwriting
------------------------------------------
    BACKGROUND: SETJMP and LONGJMP in C

setjmp() saves
    - stack pointer (sp),
    - frame pointer (fp), and
    - program counter (pc).

in an array

longjmp() restores these registers

So, what does the following do?

    #include <setjmp.h>

    jmp_buf env;
    
    setjmp(env);
    label1:  ;

    /* ... */
    longjmp(env, 2);
------------------------------------------
------------------------------------------
     SETJMP and LONGJMP in C

/* file setjmp.h ... */
#define _JBLEN  9
typedef struct{
   int _jb[_JBLEN + 1];
} jmp_buf[1];

Consider:

#include <setjmp.h>
jmp_buf handler;
char msg[100];

void set_handler(char *s, fptr rest) {
  int i = setjmp(saved);
  /* ... */
  strcpy(msg, s);
  *rest();
}

void call_handler() {
  longjmp(handler, 2);
}

Eventually program does:

  read_from_user(s);
  set_handler(s);

  /* ... */
  call_handler();

------------------------------------------
        What could go wrong in that code?
II. Memory Corruption Defense
 A. Problem and approaches
------------------------------------------
  HOW TO PREVENT BUFFER OVERFLOW ATTACKS?


------------------------------------------
        How could buffer overflows be prevented in C/C++?
 B. Stack Canaries
------------------------------------------
       STACK CANARIES

Goal: catch overwrite
      before 

     |------------------|
     |   strarg         |
     |------------------|
     |   return address |
     |------------------|
     |   old esp        |
     |------------------|
     |  canary value    |
     |------------------|
     |  buf[99]         |
     |                  |
     |   ... buf ...    |
     |                  |
     |  buf[1]          |
     |  buf[0]          |
     |vvvvvvvvvvvvvvvvvv| <- %esp

------------------------------------------
     What would be a good value for the canary?
------------------------------------------
    WHEN DO STACK CANARIES FAIL?

When attacker:

    
------------------------------------------
 C. Electric Fences
------------------------------------------
        ELECTRIC FENCES

Every heap allocation between pages

             [ Guard Page ]
             [ Heap Alloc.]
             [ Guard Page ]

Guard pages are protected by hardware

Write (or read) from guard pages
   causes an OS trap
------------------------------------------
        What's the cost of this?
 D. Bounds Checking
  1. goals and costs
------------------------------------------
         BOUNDS CHECKING

Goal:


What does it mean in C?


------------------------------------------
    What does it mean for a pointer in C to be "in bounds"?
    Can you enforce bounds in C without help from the compiler?
------------------------------------------
       WHAT HAS TO BE DONE

Interject code whenever program does:

   - pointer arithmetic
       p + 1
       
   - pointer dereference
------------------------------------------
        In C, why can't we give an error when forming a pointer
           past the end of a bound?
        In C, why can't we just give an error when dereferencing a pointer?
------------------------------------------
     PRACTICAL CONSIDERATIONS

How expensive are defenses?

Would people pay for an OS that is:


------------------------------------------
  2. fat pointers
------------------------------------------
         FAT POINTERS

Instead of just an address, each pointer
stores:


------------------------------------------
       What's the overhead of this?
       Will this interoperate with old code?
       Does using fat pointers need compiler support?
       Can fat pointers be updated atomically?
  3. overview
------------------------------------------
     BOUNDS CHECKING TECHNIQUES

Splay trees:
   - record sizes for each object
   - grow larger as objects allocated

Shadow table:
   - memory divided into
     constant-sized slots
   - pointers converted to indexes
     into shadow table
   - can locate metadata
     in


------------------------------------------
        What resources will limit shadow table precision?
  4. Baggy Bounds checking
   a. for 32 bit systems
------------------------------------------
          BAGGY BOUNDS CHECKING

Goals:
  - detect and stop out of bounds accesses
  - ensure derived pointers
    point to same object
  - don't allow reading/writing other
    allocations
    
Constraints
  - use same size pointers
  - permit pointers one past either end

Ideas:
  - round up each allocation to nearest
    power of 2 (bytes, say)
       e.g., 9 -> 16, 28 -> 32, ...
  - enforce bounds of the allocation
  - store binary log of allocation
    in bounds table, bt
  - allocate memory with granularity
    of a slot (fixed size, say 16 bytes)

Let size be the allocation size (in bytes)
    e = log_2(size)
  so 1 << e = 2^e = size

So, in bounds table, bt, store e value

  recover the size by look up in bt
    
------------------------------------------
        How do we know the index into bt to look up the size?
------------------------------------------
  RECOVERING LENGTH AND BASE FROM POINTER

Find the allocated size from pointer p:
  len(p) = 1 << bt[p >> log_2(slot_size)]

Example: allocate 32 bytes
          char *p = malloc(32);
          
      for slot_size = 16
          log_2(slot_size) = 4
          
    len(p) = 1 << bt[p>>4]
    would put 5 (i.e., log_2(32))
      in bt entries for 16 byte slots:
        bt[p>>4] = 5
        bt[(p>>4)+1] = 5

Find starting address from pointer p:
   base(p) = p & ~(len(p)-1)

   Example:
        int a[100];
        int *pa = &a[0];

    allocate sizeof(int)*100 rounded
      up to power of 2 bytes = 512 bytes
      so need 512/16 == 32 slots
       each bt entry stores log_2(512) = 9

    suppose address of pa == 0x400
    so use 32 bt entries:
       pa >> 4 = 0x40 (== 64)

       bt[pa>>4] = 9
       bt[(pa>>4)+1] = 9
       ...
       bt[(pa>>4)+31] = 9

    suppose:
        int *pb = &a[75];
                == 0x400 + 4*0x4B 
                == 0x400 + 0x12C
                == 0x52C (== 1324)

        len(pb) = 1 << bt[pb>>4]
                = 1 << bt[0x52C>>4]
                = 1 << bt[0x52]
                = 1 << 9
                = 0x200 (== 512)
        base(pb) = pb & ~(len(pb)-1)
                 = 0x52C & ~(1FF)
                 = 0x400
------------------------------------------
    What is 1 << s in normal mathematical terms?
    Do we need to align the allocations?
    Why do we allocate 512 bytes for the array a (of 100 ints)?
    Why do we need 32 entries in bt for pa (or a)?
------------------------------------------
    BAGGY BOUNDS CHECKING (32 bit Arch.)      

inBounds(p) = check(p,base(p),len(p))
check(p,begin,end) = begin <= p
                     and p < begin+end

Result of pointer arithmetic:
 for computation of
    p2 = p + n;

 add the code:
    p2 = result(p2,p);

 where
 
 result(p2,p) =
  let begin = base(p)
      end = len(p)
  in
   if check(p2,begin,end)
   then p2
   else if begin-p2 < 8
           and p2 < begin+end+8
        then set_high_bit(p2)
        else error("illegal pointer")

------------------------------------------
   Why set high-order bit of out-of-bounds pointers to 1?
   Why allow 1/2 a slot under or over?
   How do we find the correct base and length for illegal pointers?
   Why not allow larger than 1/2 a slot under or over bounds?
   Do we need any other special treatment of illegal pointers?
   Does code need to check bounds of dereferences?
------------------------------------------
      WORKING WITH UNINSTRUMENTED CODE

Code in libraries that was not compiled
  with baggy bounds checks

  bt entries set to maximum possible (31)

   
------------------------------------------
       Will there ever be dereference errors for pointers to space
          allocated by library code?
       Can an out-of-bounds pointer be passed to uninstumented code?
   b. for 64 bit systems
------------------------------------------
    BAGGY BOUNDS WITH 64 BIT POINTERS

Idea:
   Store more information in the pointers

 Valid pointer:
 [ zeros | log2(size) | address     ]
  21 bits    5 bits      38 bits

 Invalid pointer:
 [ offset | log2(size) | zeros | address ]
   13 bits   5 bits      8 bits  38 bits
------------------------------------------
        What are the advantages of putting this information in the pointers?
   c. evaluation
------------------------------------------
      EVALUATING BAGGY BOUNDS SYSTEM

Disadvantages:


Advantages:


------------------------------------------
        Can still have memory attacks with this system?
 E. Non-executable Memory
------------------------------------------
         NON-EXECUTABLE MEMORY

3 bits for permission on a page:
   read (R), write (W), execute (X)

Execute means that


Policy:  W xor X
   so can't both write and execute
    in a page


------------------------------------------
        What kind of program would need to both write and execute code?
 F. Randomized Address Space (ASLR)
------------------------------------------
        RANDOMIZED ADDRESS SPACE

Idea:
   - make it harder for attacker
     to


Often called ASLR
  = Address Space Layout Randomization
------------------------------------------
------------------------------------------
       DEFEATING ASLR

Approaches:

   - extract the randomness


   - heap attack:
     . spread shell code all over the heap
     . jump to a random address

   - nop slide
     . put lots of nops in heap
     . put shell code at end
     . jump to random address
------------------------------------------
 G. practice
------------------------------------------
    WHAT IS USED IN PRACTICE?

Both gcc and Visual Studio:
  - use stack canaries by default

Linux and Windows both have
   W xor X memory
 and
   ASLR

    WHAT IS NOT USED IN PRACTICE?

Baggy bounds


------------------------------------------
    Why would systems not use baggy bounds?
III. Return-oriented Programming
------------------------------------------
 HOW ATTACKERS CAN DEFEAT ASLR AND W xor X         


ASLR
   - hard to know address
     of attacker's code

W xor X protection
   - can't execute shell code put on stack


Both are widely used in Linux and Windows

But there is a kind of attack that
   could defeat both...


------------------------------------------
        The combination of ASLR and W xor X protection is powerful,
           but could there still be a way to:
              (a) do arbitrary computations, and
              (b) not need to know addresses ahead of time?
------------------------------------------
          RETURN-ORIENTED PROGRAMMING

Characteristics:
  - only needs to use return addresses
    on stack, so no code executed in stack


  - doesn't need to know fixed addresses


Idea:


------------------------------------------
 A. examples to motivate the idea
  1. simplest motivating example
------------------------------------------
     MOTIVATING EXAMPLE

C program:

void run_shell() {
   system("/bin/sh");
}
void get_msg() {
   char buf[100];
   gets(buf);
}


------------------------------------------
        How could attacker call run_shell?
------------------------------------------
    EXPLAINING THE MOTIVATING EXAMPLE

          |                     |
          |---------------------|
          | return address      |
          |---------------------|
          |                     |
          |---------------------|
          | ... buf[98], buf[99]|
          |        ...          |
  %esp -> | buf[0], buf[1], ... |
          |vvvvvvvvvvvvvvvvvvvvv|
          
------------------------------------------
  2. slightly more difficult example
------------------------------------------
       SLIGHTLY MORE DIFFICULT EXAMPLE

C program:

char sh_path = "/bin/sh";
void list_dir() {
   system("/bin/ls");
}
void get_msg() {
   char buf[100];
   gets(buf);
}

------------------------------------------
        How could this program be attacked to run the shell?
------------------------------------------
         FAKING A CALL TO SYSTEM

Want to make a stack frame like a call:


          |                     |
          |---------------------|
          |  argument value     |
          |---------------------|
  %esp -> | return address      |
          |vvvvvvvvvvvvvvvvvvvvv|


------------------------------------------
------------------------------------------
          HOW TO CREATE A CALL FRAME

char sh_path = "/bin/sh";
void list_dir() {
   system("/bin/ls");
}
void get_msg() {
   char buf[100];
   gets(buf);
}


          |                     |
          |---------------------|
          |                     |
          |---------------------|
          |                     |
          |---------------------|
          | return address      |
          |---------------------|
          |                     |
          |---------------------|
          | ... buf[98], buf[99]|
          |        ...          |
  %esp -> | buf[0], buf[1], ... |
          |vvvvvvvvvvvvvvvvvvvvv|
          

------------------------------------------
        So how does system get called?
------------------------------------------
     AFTER THE CALL TO GET_MSG RETURNS

          |                     |
          |---------------------|
          |  argument value     |
          |---------------------|
  %esp -> | return address      |
          |vvvvvvvvvvvvvvvvvvvvv|

------------------------------------------
       What if sh_path isn't already in the program?
       What should the attack set the return address to be?
 B. return-oriented programming gadgets
------------------------------------------
     RETURN-ORIENTED PROGRAMMING

See
Ryan Roemer, Erik Buchanan, Hovav Shacham,
and Stefan Savage.
Return-Oriented Programming:
Systems, Languages, and Applications.
ACM Trans. Inf. Syst. Secur.
Vol. 15, num. 1, Article 2 (March 2012).
https://doi.org/10.1145/2133375.2133377


"can induce arbitrary computation
without injecting any code" (p. 2:3)

Shows that "preventing the introduction
of malicious code is" not "sufficient
to prevent the introduction of malicious
computation"


------------------------------------------
  1. pop-ret gadget
------------------------------------------
     POP-RET GADGET

Def: *a gadget* is


Key gadget:

    pop %eax
    ret


Call this the "pop-ret gadget"
------------------------------------------
        What does the key gadget do?
  2. using the stack pointer as an instruction pointer
------------------------------------------
   USING THE STACK POINTER AS THE IP

How would we repeat a system call N times?

Set up the stack like:

          |       ...           |
          |---------------------|
          |  address of sh_path |
          |---------------------|
          |  address of pop-ret |
          |---------------------|
          | address of system() |
          |---------------------|
          |  address of sh_path |
          |---------------------|
          |  address of pop-ret |
          |---------------------|
          | address of system() |
          |---------------------|
          |  address of sh_path |
          |---------------------|
          |  address of pop-ret |
          |---------------------|
          | address of system() |
          |---------------------|
          |                     |
          |---------------------|
          | ... buf[98], buf[99]|
          |        ...          |
  %esp -> | buf[0], buf[1], ... |
          |vvvvvvvvvvvvvvvvvvvvv|
          
------------------------------------------
        How did we make the stack look like that?
        What happends when the current function returns?
        Does this execute any code in the stack?
        In what sense is the stack pointer being used as the
        instruction pointer?
 C. guessing random stack canaries
------------------------------------------
  DEFEATING RANDOM STACK CANARIES

Some assumptions needed:
 - server has a buffer overflow
    vulnerability
 - server will crash and restart
    if the canary value is bad
 - after restart, the canary and ASLR
    is NOT rerandomized

Attack approach:
 - Probe the bytes of the canary 1 by 1
 - Use crash to tell if guess was wrong
 
------------------------------------------
        Why would the server restart without rerandomization?
  1. Probing to find the canary
------------------------------------------
        PROBING TO FIND THE CANARY

Stack (bytes shown):

          
          |          |
          |----------|
          |canary[3] |
          |----------|
          |canary[2] |
          |----------|
          |canary[1] |
          |----------|
          |canary[0] |
          |----------|
          | buf[99]  |
          |----------|
          | buf[98]  |
          |----------|
          |  ...     |
          | buf[1]   |
          |----------|
  %esp -> | buf[0]   |
          |vvvvvvvvvv|


------------------------------------------
 D. Blind Return-oriented Programming
  1. 64 bit background
------------------------------------------
    64-BIT X86 ARCHITECTURE

Arguments passed in registers
   not on the stack


E.g., write() system call takes 3 args:
     socket (in rdi)
     buffer (in rsi)
     length (in rdx)

So need the following gadgets:
 1) pop rdi; ret (socket)
 2) pop rsi; ret (buffer)
 3) pop rdx; ret (length)
 4) pop rax; ret (write syscall number)
 5) syscall

------------------------------------------
        What does that mean for the attacker?
  2. Reference for BROP
------------------------------------------
 BLIND RETURN-ORIENTED PROGRAMMING (BROP)

Blind?
   to start attacker does NOT have copy
   of the binary to disassemble

See
  A. Bittau, A. Belay, A. Mashtizadeh,
  D. Mazieres and D. Boneh,
  "Hacking Blind,"
  In 2014 IEEE Symposium on
  Security and Privacy, 2014, pp. 227-242.
  doi: 10.1109/SP.2014.22.

and https://youtu.be/xSQxaie_h1o

Assumptions:
  - server has
    a buffer overflow vulnerability
  - server restarts after crash
     without re-randomizing
     
Goal:


Approach:
  1. Find stop gadget
  2. Find gadgets that pop stack entries
  3. Determine what registers gadgets
     pop gadgets pop into
  4. Invoke write() system call
     to send program's code to attacker

------------------------------------------
     Why does the server not re-randomize after restarting?
  3. stop gadget
------------------------------------------
            STOP GADGET

A gadget that when called


How to find a stop gadget?

------------------------------------------
  4. pop gadgets
------------------------------------------
         FINDING POP GADGETS

Put on the stack
 - a return address to test (probe)
 - address of stop gadget
 - address of trap (causes crash)
   (and more traps)

Looks like:

          |         ...         |
          |---------------------|
          | addr of trap (0x0)  |
          |---------------------|
          | addr of trap (0x0)  |
          |---------------------|
          | addr of trap (0x0)  |
          |---------------------|
          | addr of trap (0x0)  |
          |---------------------|          
          | addr of stop gadget |
          |---------------------|
          | addr of trap (0x0)  |
          |---------------------|
          | addr to probe (A)   |
          |---------------------|
          |                     |
          |---------------------|

Notation:
   (probe(A), stop, trap, trap,...)

Find address P that does NOT pop stack:
   (probe(P), stop, trap, trap, trap, ...)

Find address A that pops 1 stack entry:
   (probe(A), trap, stop, trap, trap,...)
  
------------------------------------------
        What kind of address would cause a crash?
        How does the first layout (pictured) work?
        What kind of stack layout would pop exactly N stack entries?
  5. Determine what registers pop gadgets use
------------------------------------------
  FINDING WHICH GADGET POPS INTO RAX

Set up stack like:

          |                      |
          |----------------------|
          | addr of syscall      |
          |----------------------|
          |         ...          |
          |----------------------|
          | syscall no. of pause |
          |----------------------|
          | addr of pop gadget 3 |
          |----------------------|          
          | syscall no. of pause |
          |----------------------|
          | addr of pop gadget 2 |
          |----------------------|
          | syscall no. of pause |
          |----------------------|
          | addr of pop gadget 1 |
          |----------------------|
          |                      |
          |----------------------|


Note: to make a syscall:
  - put number of call desired in rax
  - then invoke syscall() from libc
------------------------------------------
       What does this do?
       How would you find which pop gadget pops into rax?
       How would you find which gadgets pop into other registers?
  6. conclusion of attack
------------------------------------------
        CONCLUDING THE ATTACK

Now can call write using the
following gadgets:

 1) pop rdi; ret (socket)
 2) pop rsi; ret (buffer address)
 3) pop rdx; ret (length)
 4) pop rax; ret (write syscall number)
 5) syscall

------------------------------------------
        How to find the socket number for the attacker's connection?
        What buffer address to use?
  7. How to defend against BROP?
------------------------------------------
        HOW TO DEFEND AGAINST BROP?

Basic idea:
 - re-randomize
      On Linux: use exec instead of fork
      Run on Windows: always uses exec

 - do bounds checking
 
------------------------------------------
        What if keep connections open instead of re-randomizing?