CIS 4615 meeting -*- Outline -*-

* x86 disassembly

Based on chapter 4 of the book
Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software
by Michael Sikorski and Andrew Honig, No Starch Press, 2012.

Also course notes from Golden Richard

Also the Intel architecture manuals, see
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

** x86 architecture

Each assembly language is specific to a particular processor
and is in a 1-1 correspondence with the machine code of that processor

We concentrate on the Intel x86 architecture, in particular its 32 bit
form, IA-32, which was introduced with the Intel 80386 (earlier members
of the family, such as the 8086 and 80286, were 16 bit processors);
this is the architecture used by 32 bit Windows XP

We will say some things about the more modern x64 architecture

------------------------------------------
ARCHITECTURAL OVERVIEW

+--------------------------+   +---------+
| CPU                      |   |         |
|  +--------------------+  |   |         |
|  |     Registers      |  |   |         |
|  +----^---------------+  |   |         |
|       |                  |   |         |
|  +----v--+   +---------+ |   |         |
|  |  ALU  |<->| Control |<---->   RAM   |
|  +----^--+   +---------+ |   |         |
|       |                  |   |         |
+-------|------------------+   |         |
        |                      |         |
+-------v------------------+   |         |
|       I/O Devices        |   |         |
|                          |   +---------+
+--------------------------+
------------------------------------------

Q: What does the RAM do?

stores all data and code

Q: What does the ALU do?

arithmetic and logical operations on data

Q: What does the control unit do?

fetches instructions from RAM, decodes them, and directs their execution;
it keeps track of the next instruction with the instruction pointer (IP)

*** memory

**** real mode memory addressing

------------------------------------------
REAL MODE MEMORY ADDRESSING

For the 16 bit 8086 and 8088 (from 1978!);
also supported by the 80286 and above, for compatibility with older programs

Address = (segment register value x 16) + offset

 FFFFFH +----------------+
        |                |
        |                |
 1FFFFH +----------------+ <- end of the 64K segment
 1F000H | addressed byte |<-+
        |                |  | offset = F000H
        |  64K segment   |  |
        |                |  |       Seg. Reg.
 10000H +----------------+<-+      +---------+
        |                |  +------|  1000H  |
        |                |         +---------+
 00000H +----------------+
------------------------------------------

Note that the value in the segment register is only 1000H;
it is shifted left by 4 bits (multiplied by 16) to give the base 10000H,
so that the whole address is 20 bits (= 16 + 4) long:
10000H + F000H = 1F000H

Q: Why use this segment register + offset scheme?

This allows using up to 1 MB of RAM with 16 bit registers and addresses
It also allows relocation: a program can be loaded at any 16 byte
boundary by changing only the segment register values

------------------------------------------
MEMORY LAYOUT FOR A PROCESS

Approximate; the sections may be
  - in different orders
  - not contiguous

high addresses
       +----------------------------+
       | environment ptr            |
       | cmd line arguments         |
BP --> +----------------------------+
       | stack                      |
       |   |                        |
       |   |                        |
SP --> |   v                        |
       +----------------------------+
       |                            |
SS --> |                            |
       +----------------------------+

       +----------------------------+
       |   ^                        |
       |   |                        |
       |   |                        |
ES --> | heap                       |
       +----------------------------+
       |                            |
       | .data or .bss              |
DS --> |                            |
       +----------------------------+
       | .text                      |
CS --> |                            |
       +----------------------------+
       | shared libraries           |
       +----------------------------+
low addresses
------------------------------------------

Q: What's in the .text section?

Code

**** Protected mode addressing

------------------------------------------
PROTECTED MODE ADDRESSING

80286 (16 bit, from 1982) and above

Doesn't use segment + offset to directly form the address
The segment register contains a selector into a global descriptor
table (GDT); the selected descriptor has a base address, and
address = base address + offset

So instructions are the same!

                                       memory
                                    +------------+ FFFFFF
                                    |            |
            global                  |            |
            descriptor              |            |
            table                   |            |
            +-----------+           |            |
            |           |           |            |
            |           |           +------------+ 1000FF
  DS        +-----------+           |    data    |
+-------+   |    ...    |           |  segment   |
| 0008  |-->|  100000   |---------->+------------+ 100000
+-------+   |   00FF    |           |            |
            +-----------+           |            |
            |           |           |            |
            +-----------+           +------------+ 000000
------------------------------------------

The 80286 has descriptors with a 24 bit base address and a 16 bit limit

The 80386 and above have a 32 bit base address and a 20 bit limit

The descriptor also contains some permissions (access rights)

**** virtual memory

------------------------------------------
VIRTUAL MEMORY

80386 (32 bit, from 1985) and above

Program generates a "linear address"
Memory paging unit translates it to a "physical address"
------------------------------------------

**** x64, flat memory addressing + virtual memory

------------------------------------------
X64 ARCHITECTURE

64 bit extension of x86, designed by AMD (AMD64, 2003)
and adopted by Intel in 2004 (e.g., later Pentium 4 models)

linear (flat) addressing + virtual memory

only the FS and GS segment registers are still sensible to use
------------------------------------------

*** instructions, opcodes, endianness

------------------------------------------
ASSEMBLY INSTRUCTIONS

Intel syntax conventions (as used by NASM):

    mov     ecx,        0x42
     |       |           |
  mnemonic  destination  source

    B9      42 00 00 00
     |      \_________/
  opcode    constant 0x42 as 4 bytes,
            least significant byte first

Little-endian = least significant byte first (lowest address, leftmost)
Big-endian    = most significant byte first (lowest address, leftmost)
------------------------------------------

Q: Which is Intel: big or little endian?

It's little-endian.

*** operands

------------------------------------------
TYPES OF OPERANDS

Immediate:

Register:

Memory address:
------------------------------------------

... a fixed value (like 0x42)
... a register, like ecx
... the value in a memory cell, like [eax]

*** registers

------------------------------------------
REGISTERS

General:
    64 bit: RAX, RBX, RCX, RDX
    32 bit: EAX, EBX, ECX, EDX
    16 bit: AX, BX, CX, DX
     8 bit: AH & AL, BH & BL, CH & CL, DH & DL

    64 bit mode also has registers r8-r15, e.g.
    64 bit: r8
    32 bit: r8d
    16 bit: r8w
     8 bit: r8b

    ECX is used as a

Source/Destination registers:
    64 bit: RSI, RDI
    32 bit: ESI, EDI
    16 bit: SI, DI

Stack-manipulation registers:
    64 bit: RBP, RSP
    32 bit: EBP, ESP
    16 bit: BP, SP

Flags: EFLAGS

Instruction Pointer:
    RIP (64 bit)
    EIP (32 bit)
    IP  (16 bit)
------------------------------------------

... counter by some instructions (e.g., LOOP and REP-prefixed string instructions)

Segment registers matter mostly in old 16 bit real mode software,
such as DOS programs. Modern 32 bit operating systems run in protected
mode with a flat memory model and paging: CS, DS, SS, and ES all cover
the whole address space, while FS and GS point to per-thread data
(e.g., on 32 bit Windows, FS points to the thread information block,
which holds the exception handling chain)

For r8-r15, there is no "h" part, no direct access to bits 8-15

In 64 bit mode, the low-order byte (bits 0-7) of rsp, rbp, rsi, and rdi
can be accessed as spl, bpl, sil, and dil

There are other registers, e.g., for floating point, debugging, etc.

------------------------------------------
BACKWARDS COMPATIBILITY REGISTERS

                        +---------+---------+
                        |   AH    |   AL    |
                        +---------+---------+
                          8 bits    8 bits
                        +-------------------+
                        |        AX         |
                        +-------------------+
                               16 bits
          +---------------------------------+
          |               EAX               |
          +---------------------------------+
                        32 bits
+-//----------------------------------------+
|                    RAX                    |
+-//----------------------------------------+
                   64 bits
------------------------------------------

The different names for parts of registers are mostly for backwards
compatibility, but they give one the ability to manipulate specific
bytes in the registers. The high byte registers (AH, BH, CH, DH)
can still be used in 64 bit mode, but not in an instruction that needs
a REX prefix (for example, one that also uses SIL, DIL, or R8B-R15B)

------------------------------------------
DATA TYPES AND BITS

              7        0
             +----------+
 byte        |          |
             +----------+
                  N

              15       8 7        0
             +----------+----------+
 word        |   high   |   low    |
             |   byte   |   byte   |
             +----------+----------+
                 N+1         N

              31        16 15         0
             +------------+------------+
 doubleword  |    high    |    low     |
             |    word    |    word    |
             +------------+------------+
                   N+2          N

              63              32 31               0
             +------------------+------------------+
 quadword    |       high       |       low        |
             |    doubleword    |    doubleword    |
             +------------------+------------------+
                      N+4                N
------------------------------------------

N is the address (or array index) in bytes of the lowest-addressed byte;
because x86 is little-endian, the low-order part is stored there

------------------------------------------
SEGMENTATION

For protected mode (32 bit) in x86, 3 kinds of address:
  1. segmentation-based (logical): segment + offset
  2. linear/virtual address (32/64 bit address)
  3. physical address (32/64 bit address)

Segment registers (16 bit)
  CS ~ code
  SS ~ stack
  DS ~ data
  ES ~ extra
  FS ~ exception handling chain (32 bit Windows)
  GS

Segmentation is largely disabled in 64 bit mode
  - uses a flat 64 bit address space (only the FS and GS bases still apply)
------------------------------------------

Q: How much memory can be addressed with a 16 bit address?

64 KB (2^16 = 65,536 bytes); this is why there are segments

A 16 bit segment register value (shifted left 4 bits) plus a 16 bit offset
allows 20 bit addressing, so up to 1 MB of RAM

Q: How much memory can be addressed with a 32 bit address?
4 GB (2^32 = 4,294,967,296 bytes)

The following is from
http://www.eecg.toronto.edu/~amza/www.mindsec.com/files/x86regs.html

------------------------------------------
EFLAGS REGISTER

Bit    Name   Description
======================================
0      CF     Carry flag
2      PF     Parity flag
4      AF     Auxiliary carry flag
6      ZF     Zero flag
7      SF     Sign flag
8      TF     Trap flag
9      IF     Interrupt enable flag
10     DF     Direction flag
11     OF     Overflow flag
12-13  IOPL   I/O privilege level
14     NT     Nested task flag
16     RF     Resume flag
17     VM     Virtual 8086 mode flag
18     AC     Alignment check flag (486+)
19     VIF    Virtual interrupt flag
20     VIP    Virtual interrupt pending flag
21     ID     ID flag
------------------------------------------

*** instructions

**** overview

------------------------------------------
INSTRUCTION FORMAT

NASM syntax (Intel variant)

Instruction format:

    LABEL: OPCODE destop [, sourceop]   [; comment]

Example:

    HERE:   cmp  ebx, 0BEEFh    ; does ebx have secret?
            push ebx
            push eax
            xor  ebx, ebx       ; ebx == 0
            xor  eax, eax       ; eax == 0
------------------------------------------

Instructions are on single lines (unless continued with a trailing \)
The comment runs from the semicolon (;) to the end of the line

------------------------------------------
ACCESSING MEMORY CONTENTS

    mov eax, 0xBEEF      ; eax gets
    mov eax, [0xBEEF]    ; eax gets
------------------------------------------

... the hex value BEEF
... the 32 bit value stored starting at address BEEF

------------------------------------------
ACCESSING VIA REGISTERS

    mov eax, ebx         ; eax gets
    mov eax, [ebx]       ; eax gets
------------------------------------------

... the value in ebx
... the value stored at the address in ebx (indirect)

**** details

------------------------------------------
GRAMMAR CONVENTIONS

::=         means "can be" or "produces"
|           means "or"
<x>         is the nonterminal named "x", a syntactic category
other       characters are literal
[<x>]       is an optional <x>
[<x>]...    means 0 or more <x>s
'['         is a left square bracket (char)
']'         is a right square bracket (char)
------------------------------------------

These form a type of extended BNF (context-free) grammar

You may have seen ::= written as an arrow (-->) in other books

------------------------------------------
MASM SYNTAX DETAILS

::= [