CS 342 Lecture                                          -*- Outline -*-

* Garbage collection and memory allocation strategies (what does cons do?)

** Stack vs. heap (in Pascal)
   lifetime of (local) variables is tied to procedure activations
   lifetime of storage on the heap is indefinite

** Garbage
   cons allocates a cell (2 words) from the heap.
   when these cells become unused they are garbage, and should be reused
   (the programmer thinks memory is infinite)

   garbage: cells that are not reachable from the machine registers
     (assume the stack is reachable)
     = inaccessible storage

   draw picture of memory as a digraph with root, free list, garbage

   pointers are distinguished from integers by a tag bit
   atoms are stored in a separate part of memory,
     so pointers to atoms can be distinguished from pointers to cons cells

** Language options for garbage collection
*** Explicit erasure
    dispose primitive: the cell is linked into the free list
    - programmers must keep track of cells (and sharing); better to automate
    - security: dangling references, corruption of the free list

*** Automated techniques for storage reclamation
    reference counting, mark-and-sweep, copying (recitation)

** Reference counting
   applies especially where cyclic structures are impossible
   (applicative languages)

   each cell contains: a count of references, car, and cdr
   if the reference count is 0, the cell is garbage

   manipulation of reference counts:
     inc when a new reference is created
     dec when a reference is destroyed

     (car x)     1. inc ref count of (car x)
                 2. dec ref count of x
     (cdr x)     1. inc ref count of (cdr x)
                 2. dec ref count of x
     (cons x y)  1. allocate a new cell, with ref count of 1
                 2. inc ref counts of x and y

   other ways to change references: assignment, destructive list operations

   decrementing a reference count is recursive:
     if the count goes to zero, free the cell and decrement
     the counts of its car and cdr

   advantages:
     reclaims cells as soon as they become garbage
     most reference counts are 1, so a cell is usually freed
       as soon as its single pointer is destroyed
     allows the update-in-place optimization when a cell has only one reference
     can be done in parallel, but ref counts must be increased
       before doing pointer manipulation (if collecting garbage in parallel)

   disadvantages:
     must deal with overflow of counts, or waste a lot of space
       (1/3 of memory devoted to reference counts?)
     doesn't handle cyclic structures (show example)

** Mark and sweep
   wait until there are no more cells to allocate, then mark and sweep

*** Mark phase (mark all cells that are reachable)
------
;; stand-in for the per-cell mark bit (assumption: an EQ hash table)
(defvar *marks* (make-hash-table :test #'eq))
(defun unmarked? (cc) (not (gethash cc *marks*)))
(defun mark-cell! (cc) (setf (gethash cc *marks*) t))
(defun mark! (cc)
  (cond ((atom cc) nil)            ; atoms (including nil): nothing to mark
        ((unmarked? cc)
         (mark-cell! cc)           ; mark first, so cycles terminate
         (mark! (cdr cc))
         (mark! (car cc)))))
------

*** Sweep phase (find garbage = unmarked cells, make them into the free list)
    memory is a big array of cons cells; test each one to see if it's marked.
    if it is not marked, put it on the free list

   advantages:
     less overhead for short programs
       (they may finish before any collection is needed)
     less space overhead (one bit in each cell)
     can reclaim cyclic structures

   disadvantages:
     problems for real-time systems (the program pauses during collection),
       though garbage collection can be done in parallel
     bad performance in virtual memory systems
       must touch all pages in the heap
         at least those containing accessible info
         (keep track of which pages were used during marking)
       little locality of reference during gc
         each page may be accessed many times, unless pointers rarely
         cross pages and related objects are located together

     e.g., if pages are 4K bytes, memory is 2**20 pages (32-bit addresses),
       and page faults take 10 ms
       => 2**20 x 10 ms = about 10,486 s, i.e. about 2.9 hours
          to sweep the entire memory!
       (not to mention the time for the mark phase!)

     e.g., if pages are 4K bytes, memory is 2**36 pages (48-bit addresses),
       and page faults take 10 ms
       => 2**36 x 10 ms = about 6.9 x 10**8 s, i.e. about 21 years
          to do a sweep of the entire memory!
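The sweep-time estimates can be checked with a few lines of arithmetic. This sketch uses only the figures from the notes (4K-byte pages, one 10 ms page fault per page touched, a 365.25-day year). The helper name sweep_seconds is hypothetical:

```python
# Cost of faulting in every page once during a sweep, using the figures
# from the notes: 4K-byte pages and 10 ms per page fault.
PAGE_SIZE = 4 * 1024                  # bytes per page
FAULT_SECONDS = 0.010                 # seconds per page fault
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sweep_seconds(address_bits: int) -> float:
    """Seconds to touch every page of a 2**address_bits-byte address space."""
    pages = 2 ** address_bits // PAGE_SIZE
    return pages * FAULT_SECONDS


t32 = sweep_seconds(32)               # 2**20 pages
t48 = sweep_seconds(48)               # 2**36 pages
print(f"32-bit: {t32:,.0f} s = {t32 / 3600:.1f} hours")
print(f"48-bit: {t48:,.0f} s = {t48 / SECONDS_PER_YEAR:.1f} years")
```

This prints about 2.9 hours for the 32-bit case and about 21.8 years for the 48-bit case. Both figures ignore the mark phase.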
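The mark and sweep phases can be sketched on a toy heap. This is a minimal Python simulation under assumed conventions, not a real collector. Cell, Heap, collect, and the roots list (standing in for registers and stack) are hypothetical names. The mark bit is a field of each cell, and free cells are chained through their cdr fields. Collection runs only when cons finds the free list empty. The example shows an unreachable cycle being reclaimed:

```python
from typing import Any, List, Optional


class Cell:
    """A cons cell: car, cdr, and the single mark bit used by the collector."""
    __slots__ = ("car", "cdr", "marked")

    def __init__(self) -> None:
        self.car: Any = None
        self.cdr: Any = None
        self.marked = False


class Heap:
    """Toy fixed-size heap; free cells are chained through their cdr fields."""

    def __init__(self, size: int) -> None:
        self.cells: List[Cell] = [Cell() for _ in range(size)]
        self.roots: List[Any] = []        # stand-in for registers and stack
        self.free: Optional[Cell] = None
        self.sweep()                      # all cells start unmarked, so all become free

    def cons(self, car: Any, cdr: Any) -> Cell:
        if self.free is None:
            # out of cells: mark and sweep now; the arguments count as
            # roots so they survive the collection
            self.collect(extra=(car, cdr))
            if self.free is None:
                raise MemoryError("heap exhausted: every cell is reachable")
        cell, self.free = self.free, self.free.cdr
        cell.car, cell.cdr = car, cdr
        return cell

    def mark(self, x: Any) -> None:
        # same shape as mark! in the notes: atoms end the recursion, and
        # marking a cell before visiting its fields makes cycles terminate
        if isinstance(x, Cell) and not x.marked:
            x.marked = True
            self.mark(x.cdr)
            self.mark(x.car)

    def sweep(self) -> int:
        """Rebuild the free list from every unmarked cell; return its length."""
        self.free, count = None, 0
        for cell in self.cells:           # the sweep touches the whole heap
            if cell.marked:
                cell.marked = False       # clear the bit for the next collection
            else:
                cell.car, cell.cdr = None, self.free
                self.free = cell
                count += 1
        return count

    def collect(self, extra: tuple = ()) -> int:
        for root in [*self.roots, *extra]:
            self.mark(root)
        return self.sweep()


heap = Heap(4)
keep = heap.cons(1, heap.cons(2, None))   # reachable two-cell list
heap.roots.append(keep)
loop = heap.cons(3, None)
loop.cdr = loop                           # unreachable cyclic structure
print(heap.collect())                     # prints 2: the cycle and the unused cell
```

The recursive mark mirrors the Lisp version. However, it can exceed Python's recursion limit on long lists, so a practical collector would use an explicit stack instead.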
the estimates above show a performance reason for reference counting,
  and hence for applicative languages (no cyclic structures)
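The reference-counting rules for cons, car, and destructive update can be sketched in Python. This minimal version assumes hypothetical names (RcCell, inc, dec, set_cdr) and uses a plain list in place of the free list. It follows the notes' convention that car consumes the reference to its argument while cons does not. An acyclic list is freed as soon as its last pointer goes away. A two-cell cycle keeps both counts at 1 and leaks:

```python
from typing import Any, List


class RcCell:
    """A cons cell holding (count, car, cdr), as in the notes."""
    __slots__ = ("count", "car", "cdr")

    def __init__(self, car: Any, cdr: Any) -> None:
        self.count = 1            # a new cell starts with one reference
        self.car = car
        self.cdr = cdr


freed: List[RcCell] = []          # stand-in for the free list


def inc(x: Any) -> None:
    if isinstance(x, RcCell):     # atoms carry no count
        x.count += 1


def dec(x: Any) -> None:
    """Destroy one reference; at zero, free the cell and recurse into its fields."""
    if isinstance(x, RcCell):
        x.count -= 1
        if x.count == 0:
            freed.append(x)
            dec(x.car)
            dec(x.cdr)


def cons(x: Any, y: Any) -> RcCell:
    cell = RcCell(x, y)           # 1. new cell with count 1
    inc(x)                        # 2. the cell now refers to x and y
    inc(y)
    return cell


def car(x: RcCell) -> Any:
    result = x.car
    inc(result)                   # 1. new reference to (car x) first ...
    dec(x)                        # 2. ... then the reference to x is consumed
    return result


def set_cdr(cell: RcCell, value: Any) -> None:
    """Destructive update: the new target gains a reference, the old loses one."""
    inc(value)                    # inc before dec is safe even if value is the old cdr
    dec(cell.cdr)
    cell.cdr = value


# Acyclic: dropping the last pointer frees the whole list at once.
a = cons(1, None)
b = cons(2, a)                    # a.count == 2
dec(a)                            # variable a goes away: a.count == 1
dec(b)                            # b's count hits 0, freeing b and then a
print(len(freed))                 # prints 2

# Cyclic: the counts never reach 0, so the cells leak.
c = cons(3, None)
d = cons(4, c)
set_cdr(c, d)                     # c -> d -> c
dec(c)
dec(d)
print(c.count, d.count)           # prints 1 1
```

The order inside car matters. Incrementing (car x) before decrementing x keeps the result alive even when freeing x would otherwise drop the count of its car to zero.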