I. Memory Corruption Attacks A. attacker goals ------------------------------------------ MEMORY ATTACK GOALS Goals of attacker: Attack examples: - Buffer overflows - Integer overflows - Format string attacks ------------------------------------------ B. Buffer overflow attacks 1. background ------------------------------------------ BACKGROUND FOR BUFFER OVERFLOW ATTACKS - C calling conventions with stack/heap layout - How addressing in C works ------------------------------------------ Why do we focus on C for this? Is there a way for an attacker to find out what machine/OS is running a program like a web server? ------------------------------------------ LINUX PROCESS MEMORY LAYOUT 0xC0000000 |------------------| | user stack | |vvvvvvvvvvvvvvvvvv| <- %esp | | | | |^^^^^^^^^^^^^^^^^^| | shared libraries | 0x40000000 |------------------| | | |^^^^^^^^^^^^^^^^^^| <- brk | runtime | | heap | |------------------| loaded | | by exec() 0x08048000 |------------------| | | unused 0 |------------------| ------------------------------------------ 2. how attacks work in C ------------------------------------------ BUFFER OVERFLOW ATTACK Example C (or C++) function void f(char *strarg) { char buf[100]; strcpy(buf, strarg); /* ... */ } Stack frame of f when called: |------------------| | strarg | |------------------| | return address | |------------------| | old esp | |------------------| | buf[99] | | | | ... buf ... | | | | buf[1] | | buf[0] | |vvvvvvvvvvvvvvvvvv| <- %esp ------------------------------------------ What happens if strarg has 108 characters before the null char? ------------------------------------------ BUFFER OVERFLOW AFTER STRCPY Example C (or C++) function void f(char *strarg) { char buf[100]; strcpy(buf, strarg); /* ... */ } Stack frame of f after strcpy |------------------| | strarg | |------------------| | | |------------------| | | |------------------| | buf[99] | | | | ... buf ... | | | | buf[1] | | buf[0] | |vvvvvvvvvvvvvvvvvv| <- %esp ------------------------------------------ What is now in the place where the return address was? How could an attacker exploit that? Is there anything special about the size of 100? Will the attack work if the function f crashes? Why doesn't the OS stop this attack? ------------------------------------------ BUFFER OVERFLOW ATTACK STEPS a. overwrite the stack so that code to execute is at location P b. overwrite the return address to be c. when call returns, calls P ------------------------------------------ Is the new code to execute in the heap or the stack? What does the attacker need to know? ------------------------------------------ THE NOP SLIDE Where should attacker's code start? | ... | | attacker's code | | start of P | |------------------| | ... | | nop instructions | |==================| | | |------------------| | return address | |------------------| | | |------------------| | | | ... buf ... | | | |vvvvvvvvvvvvvvvvvv| <- %esp ------------------------------------------ How does the attacker the nop instructions and P on the stack? Will the attacker's code work if the return address points into the list of nop instructions? Will the attacker's code work if it contains a byte of zero (0)? 3. background, why C? ------------------------------------------ WHY NOT USE A MEMORY-SAFE LANGUAGE? Why not use a memory-safe language like Java, Python, C#, Rust, ... ------------------------------------------ Why not use a memory-safe language? 4. background, unsafe libc functions ------------------------------------------ UNSAFE FUNCTIONS IN LIBC strcpy(char *dest, char *src) strcat(char *dest, char *src) gets(char *s) scanf(char *format, ...) ... ------------------------------------------ Why can't these functions in libc check sizes? How could such C functions be made safer? ------------------------------------------ SAFER FUNCTIONS IN LIBC strncpy(char *dest, char *src, size_t n) strncat(char *dest, char *src, size_t n) fgets(char *s, size_t n, FILE *stream) with scanf, use width specifier to read strings e.g., %100s ------------------------------------------ 5. background: other possibilities for buffer overflows a. SEH overwrite attack ------------------------------------------ STRUCTURED EXCEPTION HANDLER OVERWRITES Structured Exception Handler (SEH) - exception dispatch in Windows - registration record on stack contains: - pointer to handler - pointer to next record ------------------------------------------ When are SEHs used in Windows? How could such an attack be prevented? b. Function pointer overwriting What would happen if there is a data structure with a function pointer above an array in C or C++? ------------------------------------------ FUNCTION POINTERS IN DATA Is this safe? |------------------| | function pointer | |------------------| | | | ... buf ... | | | |------------------| ------------------------------------------ c. Setjmp buffer overwriting ------------------------------------------ BACKGROUND: SETJMP and LONGJMP in C setjmp() saves - stack pointer (sp), - frame pointer (fp), and - program counter (pc). in an array longjmp() restores these registers So, what does the following do? #include jmp_buf env; setjmp(env); label1: ; /* ... */ longjmp(env, 2); ------------------------------------------ ------------------------------------------ SETJMP and LONGJMP in C /* file setjmp.h ... */ #define _JBLEN 9 typedef struct{ int _jb[_JBLEN + 1]; } jmp_buf[1]; Consider: #include jmp_buf handler; char msg[100]; void set_handler(char *s, fptr rest) { int i = setjmp(saved); /* ... */ strcpy(msg, s); *rest(); } void call_handler() { longjmp(handler, 2); } Eventually program does: read_from_user(s); set_handler(s); /* ... */ call_handler(); ------------------------------------------ What could go wrong in that code? II. Memory Corruption Defense A. Problem and approaches ------------------------------------------ HOW TO PREVENT BUFFER OVERFLOW ATTACKS? ------------------------------------------ How could buffer overflows be prevented in C/C++? B. Stack Canaries ------------------------------------------ STACK CANARIES Goal: catch overwrite before |------------------| | strarg | |------------------| | return address | |------------------| | old esp | |------------------| | canary value | |------------------| | buf[99] | | | | ... buf ... | | | | buf[1] | | buf[0] | |vvvvvvvvvvvvvvvvvv| <- %esp ------------------------------------------ What would be a good value for the canary? ------------------------------------------ WHEN DO STACK CANARIES FAIL? When attacker: ------------------------------------------ C. Electric Fences ------------------------------------------ ELECTRIC FENCES Every heap allocation between pages [ Guard Page ] [ Heap Alloc.] [ Guard Page ] Guard pages are protected by hardware Write (or read) from guard pages causes an OS trap ------------------------------------------ What's the cost of this? D. Bounds Checking 1. goals and costs ------------------------------------------ BOUNDS CHECKING Goal: What does it mean in C? ------------------------------------------ What does it mean for a pointer in C to be "in bounds"? Can you enforce bounds in C without help from the compiler? ------------------------------------------ WHAT HAS TO BE DONE Interject code whenever program does: - pointer arithmetic p + 1 - pointer dereference ------------------------------------------ In C, why can't we give an error when forming a pointer past the end of a bound? In C, why can't we just give an error when dereferencing a pointer? ------------------------------------------ PRACTICAL CONSIDERATIONS How expensive are defenses? Would people pay for an OS that is: ------------------------------------------ 2. fat pointers ------------------------------------------ FAT POINTERS Instead of just an address, each pointer stores: ------------------------------------------ What's the overhead of this? Will this interoperate with old code? Does using fat pointers need compiler support? Can fat pointers be updated atomically? 3. overview ------------------------------------------ BOUNDS CHECKING TECHNIQUES Splay trees: - record sizes for each object - grow larger as objects allocated Shadow table: - memory divided into constant-sized slots - pointers converted to indexes into shadow table - can locate metadata in ------------------------------------------ What resources will limit shadow table precision? 4. Baggy Bounds checking a. for 32 bit systems ------------------------------------------ BAGGY BOUNDS CHECKING Goals: - detect and stop out of bounds accesses - ensure derived pointers point to same object - don't allow reading/writing other allocations Constraints - use same size pointers - permit pointers one past either end Ideas: - round up each allocation to nearest power of 2 (bytes, say) e.g., 9 -> 16, 28 -> 32, ... - enforce bounds of the allocation - store binary log of allocation in bounds table, bt - allocate memory with granularity of a slot (fixed size, say 16 bytes) Let size be the allocation size (in bytes) e = log_2(size) so 1 << e = 2^e = size So, in bounds table, bt, store e value recover the size by look up in bt ------------------------------------------ How do we know the index into bt to look up the size? ------------------------------------------ RECOVERING LENGTH AND BASE FROM POINTER Find the allocated size from pointer p: len(p) = 1 << bt[p >> log_2(slot_size)] Example: allocate 32 bytes char *p = malloc(32); for slot_size = 16 log_2(slot_size) = 4 len(p) = 1 << bt[p>>4] would put 5 (i.e., log_2(32)) in bt entries for 16 byte slots: bt[p>>4] = 5 bt[(p>>4)+1] = 5 Find starting address from pointer p: base(p) = p & ~(len(p)-1) Example: int a[100]; int *pa = &a[0]; allocate sizeof(int)*100 rounded up to power of 2 bytes = 512 bytes so need 512/16 == 32 slots each bt entry stores log_2(512) = 9 suppose address of pa == 0x400 so use 32 bt entries: pa >> 4 = 0x40 (== 64) bt[pa>>4] = 9 bt[(pa>>4)+1] = 9 ... bt[(pa>>4)+31] = 9 suppose: int *pb = &a[75]; == 0x400 + 4*0x4B == 0x400 + 0x12C == 0x52C (== 1324) len(pb) = 1 << bt[pb>>4] = 1 << bt[0x52C>>4] = 1 << bt[0x52] = 1 << 9 = 0x200 (== 512) base(pb) = pb & ~(len(pb)-1) = 0x52C & ~(1FF) = 0x400 ------------------------------------------ What is 1 << s in normal mathematical terms? Do we need to align the allocations? Why do we allocate 512 bytes for the array a (of 100 ints)? Why do we need 32 entries in bt for pa (or a)? ------------------------------------------ BAGGY BOUNDS CHECKING (32 bit Arch.) inBounds(p) = check(p,base(p),len(p)) check(p,begin,end) = begin <= p and p < begin+end Result of pointer arithmetic: for computation of p2 = p + n; add the code: p2 = result(p2,p); where result(p2,p) = let begin = base(p) end = len(p) in if check(p2,begin,end) then p2 else if begin-p2 < 8 and p2 < begin+end+8 then set_high_bit(p2) else error("illegal pointer") ------------------------------------------ Why set high-order bit of out-of-bounds pointers to 1? Why allow 1/2 a slot under or over? How do we find the correct base and length for illegal pointers? Why not allow larger than 1/2 a slot under or over bounds? Do we need any other special treatment of illegal pointers? Does code need to check bounds of dereferences? ------------------------------------------ WORKING WITH UNINSTRUMENTED CODE Code in libraries that was not compiled with baggy bounds checks bt entries set to maximum possible (31) ------------------------------------------ Will there ever be dereference errors for pointers to space allocated by library code? Can an out-of-bounds pointer be passed to uninstumented code? b. for 64 bit systems ------------------------------------------ BAGGY BOUNDS WITH 64 BIT POINTERS Idea: Store more information in the pointers Valid pointer: [ zeros | log2(size) | address ] 21 bits 5 bits 38 bits Invalid pointer: [ offset | log2(size) | zeros | address ] 13 bits 5 bits 8 bits 38 bits ------------------------------------------ What are the advantages of putting this information in the pointers? c. evaluation ------------------------------------------ EVALUATING BAGGY BOUNDS SYSTEM Disadvantages: Advantages: ------------------------------------------ Can still have memory attacks with this system? E. Non-executable Memory ------------------------------------------ NON-EXECUTABLE MEMORY 3 bits for permission on a page: read (R), write (W), execute (X) Execute means that Policy: W xor X so can't both write and execute in a page ------------------------------------------ What kind of program would need to both write and execute code? F. Randomized Address Space (ASLR) ------------------------------------------ RANDOMIZED ADDRESS SPACE Idea: - make it harder for attacker to Often called ASLR = Address Space Layout Randomization ------------------------------------------ ------------------------------------------ DEFEATING ASLR Approaches: - extract the randomness - heap attack: . spread shell code all over the heap . jump to a random address - nop slide . put lots of nops in heap . put shell code at end . jump to random address ------------------------------------------ G. practice ------------------------------------------ WHAT IS USED IN PRACTICE? Both gcc and Visual Studio: - use stack canaries by default Linux and Windows both have W xor X memory and ASLR WHAT IS NOT USED IN PRACTICE? Baggy bounds ------------------------------------------ Why would systems not use baggy bounds? III. Return-oriented Programming ------------------------------------------ HOW ATTACKERS CAN DEFEAT ASLR AND W xor X ASLR - hard to know address of attacker's code W xor X protection - can't execute shell code put on stack Both are widely used in Linux and Windows But there is a kind of attack that could defeat both... ------------------------------------------ The combination of ASLR and W xor X protection is powerful, but could there still be a way to: (a) do arbitrary computations, and (b) not need to know addresses ahead of time? ------------------------------------------ RETURN-ORIENTED PROGRAMMING Characteristics: - only needs to use return addresses on stack, so no code executed in stack - doesn't need to know fixed addresses Idea: ------------------------------------------ A. examples to motivate the idea 1. simplest motivating example ------------------------------------------ MOTIVATING EXAMPLE C program: void run_shell() { system("/bin/sh"); } void get_msg() { char buf[100]; gets(buf); } ------------------------------------------ How could attacker call run_shell? ------------------------------------------ EXPLAINING THE MOTIVATING EXAMPLE | | |---------------------| | return address | |---------------------| | | |---------------------| | ... buf[98], buf[99]| | ... | %esp -> | buf[0], buf[1], ... | |vvvvvvvvvvvvvvvvvvvvv| ------------------------------------------ 2. slightly more difficult example ------------------------------------------ SLIGHTLY MORE DIFFICULT EXAMPLE C program: char sh_path = "/bin/sh"; void list_dir() { system("/bin/ls"); } void get_msg() { char buf[100]; gets(buf); } ------------------------------------------ How could this program be attacked to run the shell? ------------------------------------------ FAKING A CALL TO SYSTEM Want to make a stack frame like a call: | | |---------------------| | argument value | |---------------------| %esp -> | return address | |vvvvvvvvvvvvvvvvvvvvv| ------------------------------------------ ------------------------------------------ HOW TO CREATE A CALL FRAME char sh_path = "/bin/sh"; void list_dir() { system("/bin/ls"); } void get_msg() { char buf[100]; gets(buf); } | | |---------------------| | | |---------------------| | | |---------------------| | return address | |---------------------| | | |---------------------| | ... buf[98], buf[99]| | ... | %esp -> | buf[0], buf[1], ... | |vvvvvvvvvvvvvvvvvvvvv| ------------------------------------------ So how does system get called? ------------------------------------------ AFTER THE CALL TO GET_MSG RETURNS | | |---------------------| | argument value | |---------------------| %esp -> | return address | |vvvvvvvvvvvvvvvvvvvvv| ------------------------------------------ What if sh_path isn't already in the program? What should the attack set the return address to be? B. return-oriented programming gadgets ------------------------------------------ RETURN-ORIENTED PROGRAMMING See Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. Return-Oriented Programming: Systems, Languages, and Applications. ACM Trans. Inf. Syst. Secur. Vol. 15, num. 1, Article 2 (March 2012). https://doi.org/10.1145/2133375.2133377 "can induce arbitrary computation without injecting any code" (p. 2:3) Shows that "preventing the introduction of malicious code is" not "sufficient to prevent the introduction of malicious computation" ------------------------------------------ 1. pop-ret gadget ------------------------------------------ POP-RET GADGET Def: *a gadget* is Key gadget: pop %eax ret Call this the "pop-ret gadget" ------------------------------------------ What does the key gadget do? 2. using the stack pointer as an instruction pointer ------------------------------------------ USING THE STACK POINTER AS THE IP How would we repeat a system call N times? Set up the stack like: | ... | |---------------------| | address of sh_path | |---------------------| | address of pop-ret | |---------------------| | address of system() | |---------------------| | address of sh_path | |---------------------| | address of pop-ret | |---------------------| | address of system() | |---------------------| | address of sh_path | |---------------------| | address of pop-ret | |---------------------| | address of system() | |---------------------| | | |---------------------| | ... buf[98], buf[99]| | ... | %esp -> | buf[0], buf[1], ... | |vvvvvvvvvvvvvvvvvvvvv| ------------------------------------------ How did we make the stack look like that? What happends when the current function returns? Does this execute any code in the stack? In what sense is the stack pointer being used as the instruction pointer? C. guessing random stack canaries ------------------------------------------ DEFEATING RANDOM STACK CANARIES Some assumptions needed: - server has a buffer overflow vulnerability - server will crash and restart if the canary value is bad - after restart, the canary and ASLR is NOT rerandomized Attack approach: - Probe the bytes of the canary 1 by 1 - Use crash to tell if guess was wrong ------------------------------------------ Why would the server restart without rerandomization? 1. Probing to find the canary ------------------------------------------ PROBING TO FIND THE CANARY Stack (bytes shown): | | |----------| |canary[3] | |----------| |canary[2] | |----------| |canary[1] | |----------| |canary[0] | |----------| | buf[99] | |----------| | buf[98] | |----------| | ... | | buf[1] | |----------| %esp -> | buf[0] | |vvvvvvvvvv| ------------------------------------------ D. Blind Return-oriented Programming 1. 64 bit background ------------------------------------------ 64-BIT X86 ARCHITECTURE Arguments passed in registers not on the stack E.g., write() system call takes 3 args: socket (in rdi) buffer (in rsi) length (in rdx) So need the following gadgets: 1) pop rdi; ret (socket) 2) pop rsi; ret (buffer) 3) pop rdx; ret (length) 4) pop rax; ret (write syscall number) 5) syscall ------------------------------------------ What does that mean for the attacker? 2. Reference for BROP ------------------------------------------ BLIND RETURN-ORIENTED PROGRAMMING (BROP) Blind? to start attacker does NOT have copy of the binary to disassemble See A. Bittau, A. Belay, A. Mashtizadeh, D. Mazieres and D. Boneh, "Hacking Blind," In 2014 IEEE Symposium on Security and Privacy, 2014, pp. 227-242. doi: 10.1109/SP.2014.22. and https://youtu.be/xSQxaie_h1o Assumptions: - server has a buffer overflow vulnerability - server restarts after crash without re-randomizing Goal: Approach: 1. Find stop gadget 2. Find gadgets that pop stack entries 3. Determine what registers gadgets pop gadgets pop into 4. Invoke write() system call to send program's code to attacker ------------------------------------------ Why does the server not re-randomize after restarting? 3. stop gadget ------------------------------------------ STOP GADGET A gadget that when called How to find a stop gadget? ------------------------------------------ 4. pop gadgets ------------------------------------------ FINDING POP GADGETS Put on the stack - a return address to test (probe) - address of stop gadget - address of trap (causes crash) (and more traps) Looks like: | ... | |---------------------| | addr of trap (0x0) | |---------------------| | addr of trap (0x0) | |---------------------| | addr of trap (0x0) | |---------------------| | addr of trap (0x0) | |---------------------| | addr of stop gadget | |---------------------| | addr of trap (0x0) | |---------------------| | addr to probe (A) | |---------------------| | | |---------------------| Notation: (probe(A), stop, trap, trap,...) Find address P that does NOT pop stack: (probe(P), stop, trap, trap, trap, ...) Find address A that pops 1 stack entry: (probe(A), trap, stop, trap, trap,...) ------------------------------------------ What kind of address would cause a crash? How does the first layout (pictured) work? What kind of stack layout would pop exactly N stack entries? 5. Determine what registers pop gadgets use ------------------------------------------ FINDING WHICH GADGET POPS INTO RAX Set up stack like: | | |----------------------| | addr of syscall | |----------------------| | ... | |----------------------| | syscall no. of pause | |----------------------| | addr of pop gadget 3 | |----------------------| | syscall no. of pause | |----------------------| | addr of pop gadget 2 | |----------------------| | syscall no. of pause | |----------------------| | addr of pop gadget 1 | |----------------------| | | |----------------------| Note: to make a syscall: - put number of call desired in rax - then invoke syscall() from libc ------------------------------------------ What does this do? How would you find which pop gadget pops into rax? How would you find which gadgets pop into other registers? 6. conclusion of attack ------------------------------------------ CONCLUDING THE ATTACK Now can call write using the following gadgets: 1) pop rdi; ret (socket) 2) pop rsi; ret (buffer address) 3) pop rdx; ret (length) 4) pop rax; ret (write syscall number) 5) syscall ------------------------------------------ How to find the socket number for the attacker's connection? What buffer address to use? 7. How to defend against BROP? ------------------------------------------ HOW TO DEFEND AGAINST BROP? Basic idea: - re-randomize On Linux: use exec instead of fork Run on Windows: always uses exec - do bounds checking ------------------------------------------ What if keep connections open instead of re-randomizing?