I. Introduction to Data Structures (HR 4.1) A. data types ------------------------ ABSTRACT DATA TYPES def: an *abstract data type* (ADT) is a type specified by: (1) a set of *abstract values*, and (2) specifications for the *operations* on the values ------------------------ What are some examples? What were the set of abstract values (the domain)? The operations? B. data structures ------------------------- SIMPLE (ATOMIC) DATA TYPES vs. STRUCTURED DATA TYPES def: an ADT is *simple* or *atomic* if it cannot be broken down into parts. examples: int, float, Account def: a *structured data type* is an ADT that can be broken down into parts. examples: Ratl, arrays in C++ ------------------------- what were the set of abstract values (the domain)? The operations? II. Overview of Structured ADTs (HR 4.2) A. example structured ADTs 1. Arrays (4.3) ---------------------------- ARRAYS Array DOMAIN: 1. set of contiguous legal index values 2. map from the index values to ItemType values (homogeneous) STRUCTURE: linear, direct access OPERATIONS: Array(int lb, int ub, ItemType init); // MODIFIES: self // POST: legal index values // are lb..ub, // && for all i in lb..ub // the map is from i to init void Store(int i, ItemType v); // PRE: i is a legal index value // MODIFIES: self // POST: the map is the map of // self, except i maps to v ItemType ValueAt(int i) const; // PRE: i is a legal index value // POST: FCTVAL == the value of // the map at index i ---------------------------- 2. Records (4.4) ------------------------------------- RECORDS Record DOMAIN: 1. set of field names (fname1, ...) 2. a map from field names to values of types T1, ... STRUCTURE: linear, direct access OPERATIONS: Create(T1 v1, T2 v2, ...); // MODIFIES: self // POST: fname1 maps to v1, ... void Store_fname1(T1 v1); // MODIFIES: self // POST: field fname1 maps to v1 T1 Value_fname1() const; // POST: FCTVAL == value at fname1 void Store_fname2(T1 v1); // MODIFIES: self // POST: field fname1 maps to v1 T2 Value_fname2() const; // POST: FCTVAL == value at fname2 ------------------------------------- 3. Lists ------------------------------------- LISTS List DOMAIN: 1. ordered sequence of values of type ItemType 2. cursor, which is a position in the list or 1 beyond the end INVARIANT: the sequence is empty --> the cursor is beyond the end ----------------------------- ---------------------------- STRUCTURE: linear, sequential access, size varies OPERATIONS: List(); // MODIFIES: self // POST: sequence of values is empty Boolean IsEmpty() const; // POST: FCTVAL == (sequence is empty) Boolean IsFull() const; // POST: FCTVAL == true if no more // items can be stored, false otherwise void Reset(); // PRE: the sequence is not empty // MODIFIES: self // POST: the cursor is at the front Boolean EndOfList() const; // POST: FCTVAL == (cursor beyond end) void Advance(); // PRE: the cursor is not beyond end // MODIFIES: self // POST: cursor points to next item ItemType CurrentItem() const; // PRE: cursor is not beyond end // POST: FCTVAL == item at cursor void InsertBefore(ItemType someItem); // PRE: another item can be stored // MODIFIES: self // POST: self has someItem at cursor // and someItem precedes item at // cursor (if any) void InsertAfter(ItemType someItem); // PRE: another item can be stored // MODIFIES: self // POST: self has someItem at cursor // and someItem follows item at // cursor (if any) void Delete(); // PRE: cursor is not beyond end // MODIFIES: self // POST: item at cursor // is not in the sequence // && cursor at next item (if any) ------------------------------------- -------------------------------- EXAMPLE CLIENT CODE FOR LISTS #include "clist.h" void DelFirst(CharList & letterGroup, char theChar) // PRE: theChar is in the sequence of // letterGroup // MODIFIES: letterGroup // POST: first occurrence of theChar // is removed from letterGroup { } -------------------------------- 4. Stacks (4.6) ------------------------------------ STACKS Stack DOMAIN: a sequence of component values of type ItemType STRUCTURE: linear, LIFO access OPERATIONS: Stack(); // MODIFIES: self // POST: the sequence is empty Boolean IsEmpty() const; // POST: FCTVAL == (sequence is empty) Boolean IsFull() const; // POST: FCTVAL == true if no more // items can be stored, false otherwise void Push(ItemType newItem); // PRE: the sequence isn't full // MODIFIES: self // POST: newItem is the first in the // the sequence, rest is self ItemType Top() const; // PRE: the sequence isn't empty // POST: FCTVAL == first item in // the sequence void Pop(); // PRE: the sequence isn't empty // MODIFIES: self // POST: the sequence is tail of // the sequence in self ------------------------------------ 5. queues (4.7) ----------------------------------- QUEUES Queue DOMAIN: a sequence of values of type ItemType STRUCTURE: linear, FIFO access OPERATIONS: Queue(); // MODIFIES: self // POST: the sequence is empty Boolean IsEmpty() const; // POST: FCTVAL == (sequence is empty) Boolean IsFull() const; // POST: FCTVAL == true if no more // items can be stored, false otherwise void Enqueue(ItemType newItem); // PRE: the sequence isn't full // MODIFIES: self // POST: newItem is the last in the // the sequence, others unchanged ItemType Front() const; // PRE: the sequence isn't empty // POST: FCTVAL == first item in // the sequence void Dequeue(); // PRE: the sequence isn't empty // MODIFIES: self // POST: the sequence is tail of // the sequence in self ----------------------------------- ----------------------------------- PROBLEM Simulate a duplex theatre waiting line, during 30 minutes before movie. Assumptions: - takes 10 seconds to serve a customer - when the queue is too long, customers go away - in 30 minutes before movie starts, customers arrive randomly with 70% chance every 10 seconds - customers choose movie A or B randomly - queue implementation full = line full Output: number of customers to movie A and B number turned away ----------------------------------- 6. Sets (4.8) ------------------------------------ SETS Set DOMAIN: a collection of values of ItemType STRUCTURE: nonlinear OPERATIONS: Set(); // MODIFIES: self // POST: the collection is empty Boolean IsEmpty() const; // POST: FCTVAL == (collection empty) Boolean IsFull() const; // POST: FCTVAL == true if no more // items can be stored, false otherwise void Insert(ItemType i); // PRE: the collection isn't full // MODIFIES: self // POST: self == self union {i} void Delete(Itemtype i); // MODIFIES: self // POST: self == self - i Boolean isElt(Itemtype i) const; // POST: FCTVAL == (i is in collection) Set Intersect( Set set2 ) const; // POST: FCTVAL == is a collection of // the elements in both self and set2 Set Union( Set set2 ) const; // POST: FCTVAL == is a collection of // the elements in either self or set2 ------------------------------------ B. summary C. linear vs. sequential access ----------------------------------- LINEAR vs. NONLINEAR TYPES def: a structured ADT is *linear* iff whenever it contains 2 or more components: - there is a unique first and last - all others have unique predecessors and successors def: a structured ADT is *non-linear* iff there is no total ordering. ----------------------------------- is time linear? space? courses in your major? D. direct vs. sequential access ----------------------------------- DIRECT vs. SEQUENTIAL ACCESS def: a linear ADT has *direct access* iff each component can be accessed in constant (O(1)) time. def: a linear ADT has *sequential access* iff accessing components takes linear (O(n)) time. ----------------------------------- is time direct access or sequential? a phone book? a library? ------------------------------------------ SUMMARY OF STRUCTURED ADTS I. Product types A. Linear 1. Direct Access a. Homogeneous components - Array b. Heterogeneous components - Record 2. Sequential Access a. General - List b. Last-in, First-out - Stack c. First-in, First-out - Queue B. Nonlinear - Set II. Sum (Coproduct) Types - Variant Records ------------------------------------------ III. algorithm efficiency and big-O notation (HR 12.1-3) A. context ---------------------------------- GOODNESS OF PROGRAMS How to measure? - correct behavior - cost to develop and maintain - resource usage ---------------------------------- ---------------------------------- What resources? - space - time ---------------------------------- The terms often used refer to "efficiency". Can you explain that? ---------------------------- def: the study of resource usage by algorithms is called *complexity theory* ------------------------------ B. big-O notation (12.1) 1. motivation ---------------------------- THREE SORTING ALGORITHMS input time in seconds size selection bubble quick 10 0.005 0.001 0.003 100 0.05 0.01 0.006 1000 0.6 1.1 0.01 10000 50. 100. 1.3 100000 5000. 10000. 16.6 1000000 199. ---------------------------- what happens if we buy a computer that is twice as fast? what happens if we code bubble sort to be 3 times as fast? 2. function dominance --------------------------- COST FUNCTIONS def: a *cost function* gives the resources used by an algorithm example: 2 actualTime(n) = n + 5*n + 100 millisecond ----------------------- ----------------------- ABSTRACTIONS - ignore time units - ignore all but most important term --------------------------- ---------------------------- FUNCTION DOMINANCE def: g asymptotically dominates f iff there are positive constants c and x0 such that c * g(x) >= f(x), for all x >= x0 2 e.g., n asymptotically dominates n*log(n) 2 2 n asymptotically dominates n +5n +100 ---------------------------- 3. estimating functions --------------------------------- ESTIMATING FUNCTIONS def: an *estimating function* is an approximation to a cost function desired characteristics: - estimate asymptotically dominates the actual cost - estimate is simple - estimate is close to the actual cost example: 2 actualTime(n) = n + 5*n + 100 2 estimate(n) = n --------------------------------- why is this better than n^2 + 5n? than n^3? --------------------------------- BIG-O NOTATION def: if f and g are nonnegative functions, then the *order of f is g* iff g asymptotically dominates f notation: f = O(g) --------------------------------- -------------------------------- examples: 2 2 n + 5*n + 100 = O(n ) 5 0.0001*n + 9999*n = 100 13 * n! + 29 * n = ------------------------------- 4. categories of running time ---------------------------------- IMPORTANT RELATIONSHIPS AND CATEGORIES OF GROWTH O(1) < O(log n) < O(n) < O(n log n) constant logrithmic linear 2 3 O(n log n) < O(n ) < O(n ) quadratic cubic polynomial 3 n n n O(n ) < O(2 ) < O(3 ) < O(n!) < O(n ) exponential ---------------------------------- 5. big-O arithmetic ------------------------------------ BIG-O ARITHMETIC Thm: Let f and g be functions, and let k be a constant. Then 1. O(k * f) = O(f) 2. O(f * g) = O(f) * O(g), and O(f / g) = O(f) / O(g) 3. f asymptotically dominates g iff O(f) >= O(g) 4. O(f + g) = max(O(f), O(g)) Examples: 2 2 O(3 * n ) = O(n ) 2 2 O(17n * 5n) = O(17n ) * O(5n) 2 = O(n ) * O(n) 3 = O(n ) 3 2 O(n ) > O(n ) 3 2 O(13n + 5n ) = ------------------------------------ C. time efficiency of control structures (12.2) ------------------------------------- ESTIMATE THE RUNNING TIME void swap(double & i, double & j) { double temp = j; j = i; i = temp; } void initialize(double arr[], int size) { for (int i = 0; i < size; i++) { arr[i] = 0.0; } } void bubblesort(double arr[], int size) { for (int i = 0; i < size; i++) { for (int j = i+1; j < size; j++) { if (arr[i] > arr[j]) { swap(arr[i], arr[j]); } } } } ------------------------------------- ------------------------------ TIME COST OF CONTROL STRUCTURES CONTROL TIME COST ESTIMATE STRUCTURE i + j O(1) i = 3; O(1) Stmt1; Stmt2 max(O(Stmt1), O(Stmt2)) if (cond) { max(O(cond), Stmt1 O(Stmt1), O(Stmt2)) } else { Stmt2 } i=1; O(n * O(Stmt1)) while (i<=n) { Stmt1; i++; } ------------------------------ ----------------------------------- FOR YOU TO DO estimate the running times of: #include int main() { char ch, maxCh; cin >> maxCh; cin >> ch; while (cin) { if (maxCh < ch) { maxCh = ch; } cin >> ch; } } #include #include int main() { int i = 1; while (i < INT_MAX / 2) { cout << i << "\n"; i *= 2; } } ------------------------------------ D. cautions (12.9) -------------------------------- SOME CAUTIONARY NOTES - big-O analysis can't capture small differences in cost - big-O analysis doesn't say much about performance on small data sets - typically 90% of time is spent in 10% of code (inner loops) --------------------------------