Our objective is to develop a hashing based Dictionary class. (The background for hash tables can be found in most books on data structures.) This class maintains a dictionary of (key, value) pairs. The class supports operations to
The basic approach for implementing a hash table is as follows. We start with some number of slots, say N, in the table. We use a function called hash function that maps any key into an integer in the range 0 to N-1. If a key hashes to a value k, then we store the (key, value) pair in the kth slot in the table.
The main problem to be addressed in hashing is that of collisions, which happens when two distinct keys hash to the same value. There are two principal techniques for dealing with collisions. The first technique, called chaining associates a linked list with each slot in the hash table. All (key, value) pairs that hash to a particular slot are all stored in the list associated with that slot. The second technique is based on a technique called open addressing, where all of the (key, value) pairs are stred in the hash table itself. When a key hashes to slot k and this slot is empty, we store the key in this slot. If the slot is already filled, we systematically examine alternative slots in the hash table to store this (key, value) pair. The sequence of slots to be examined is defined by a probe sequence h(key, 0), h(key, 1), ...., h(key, N). We start by attempting to store the key in the slot h(key, 0). If there is a collision, they we try the slot h(key, 1), h(key, 2) and so on, until we try h(key, N). For this technique to work, we need to place the restriction that for any 0 <= i < N, there is some 0 <= k < N such that h(key, i) = k. (In other words, the probe sequence is guranteed to try every slot in the table.)
One problem with both techniques is that collisions increase as the number of entries in the table approaches N. It is necessary to resize the table when it is too close to being full. On the other hand, if there have been too many deletions from the table that make it very sparse, then we need to shrink the table to free memory. However, resizing the table is an expensive operation, typically requiring us to reinsert every entry into a larger table. To keep the overheads of the resizing operation low, we can do the following. Whenever the table is more than U% full, we double the size of the table. If it falls below L%, we halve the table size. By choosing appropriate values for U and L (say, 80 and 30), we can ensure that the overhead (when spread over all insert/delete operations) is O(1).
Design your HashDictionary class using appropriate design techniques so that you can:
You can use the following technique for probing. Use the following method to generate successive probes, always ensuring that the table size is a power of 2.
h(key, 0) = h(key)
h(key, k) = (h(key, k-1) + k) mod N
(Does this technique ensure that all slots are eventually examined?)
The following are in addition to books about Java itself where you may want to read about arrays, references, and loops. For example, in The Java Programming Language by Arnold and Gosling (Addison-Wesley, 1996) see chapters 1-3, sections 5.8, 5.13, 12.2-3, and 12.6. (Also other parts of chapter 12 may be useful.) Or in Just Java by van der Linden (Prentice-Hall, 1997), read chapters 2, 4, and 7, and the section on arrays in chapter 5.
The online resources relevant to design patterns are good for understanding the design pattern concepts, but the book by Gamma et al. is the best source for specific patterns. Check the bookstore or the reserve room at the library. If you are unable to get hold of this book, use the design techniques that you believe are most suitable for solving this problem, and we will try to glean some patterns from it during our class discussions.
See the documentation for the classes java.util.Dictionary and java.util.Enumeration for protocols you might want to support in your design.
This homework, like the others in this seminar, will be open-ended and somewhat ill-defined. we'll try to give you some idea of the difficulty/effort involved in each part. Do what you have time for.
Bring a printout of your code to class, and be prepared to discuss it with other people in the class.