Week 10 Tutorial Answers
Hashing, Heaps and Tries
Resources
Hashing
- USFCA's separate chaining visualizer
- USFCA's linear probing/double hashing visualizer
- VisuAlgo's hash table visualizer
Heaps
Tries
Questions
-
Hash tables are initially empty (i.e. contain no items). This fact is represented in the table by each entry having a value of
NoItem
.Suggest suitable
NoItem
values ifItem
s are stored in the table elements and- keys are
unsigned int
s - keys are
int
s - keys are strings
How would things be different if table entries were pointers to
Item
s (i.e.(Item *)
)?Answer:
-
unsigned int
sAll of the 232 values (assuming 32-bit
int
s) are possible keys. Maybe choosing a value unlikely to be a key would work (e.g. all 1's). If we knew something about the possible key values (e.g. that 0 would never be used as a key), we could use a non-key value forNoItem
. -
int
sAll of the 232 values (assuming 32-bit
int
s) are possible keys. Similar issues and resolution as for theunsigned int
case. An extra possibility in this case: if you didn't want to allow negative values as keys, then any negative number would work asNoItem
. -
strings
This depends on how the string is stored. If it's stored in a buffer within each
Item
, then an empty string (i.e.""
) would be a suitable representation forNoItem
. If the string is stored outside theItem
(e.g. created bystrdup()
), thenNULL
would be a suitable representation forNoItem
.
If
Item
s were not stored wthin the hash table, and were referenced via an(Item *)
, thenNULL
would be a suitable representation forNoItem
, regardless of what type the keys are. - keys are
-
For a hash table of size \( N \) that uses separate chaining for collision resolution, with the chains sorted in ascending order on key value, what are the best case and worst case costs for inserting \( k = 2N \) items into the table? Assume that insertion cost is measured in terms of number of key comparisons. What will be average search cost after all the insertions are done?
Answer:
The worst case occurs when the hash function returns a constant value and the keys arrive in ascending order. In this case, there will be a single chain from the one entry in the hash table that is "hit" by the hash function. Each search of that chain will require us to go right to the end. The cost for the first insertion is \( 0 \), since we do no key comparisons, but discover that the chain is empty. On the second insertion, we do one key comparison; on the third insertion we do two key comparisons; and so on. This will give \( 0 + 1 + 2 + \dots + (k - 1) \) comparisons which is \( k \times (k - 1) / 2 \) total comparisons.
After insertion, each search requires a scan of the single chain of length \( k \), so the average search cost will be \( k / 2 \).
The best case occurs when we have a hash function that distributes the items uniformly across the table. In this case, we end up with a chain of length \( \lceil k / N \rceil \) attached to each hash table entry. In building these chains, the first entry requires no key comparisons (just a check for an empty chain). When adding the second item to each chain, only one comparison is needed, giving a total of \( k / 2 \) comparisons.
After insertion, each search requires a scan of a chain of length \( 2 \), which means half the time, one comparison will be made, and the other half of the time, two comparisons will be made. Thus, the average search cost will be \( 1.5 \).
-
Consider a very small hash table with only 11 entries, and a simple hash function \( h(x) = x\ \%\ 11 \). If we start with an empty table and insert the following values in the order shown
11 16 27 35 22 20 15 24 29 19 13
give the final state of the table for each of the following collision resolution strategies
- separate chaining, with chains in ascending order
- linear probing
- double hashing, with \( h_2(x) = (x\ \%\ 3) + 1 \)
Answer:
The hash values for each of the keys is given in the following table:
Key 11 16 27 35 22 20 15 24 29 19 13 \( h(x) \) 0 5 5 2 0 9 4 2 7 8 2 \( h_2(x) \) 3 2 1 3 2 3 1 1 3 2 2 Working through the insertions item-by-item for each case, gives:
-
Consider a hash table of size \( N = 10 \) with hash function \( h(k) = k\ \% \ 10 \) that handles collisions via linear probing. Show the result of inserting items with these keys into an initially empty table.
- 1, 2, 3, 4, 5, 6, 7, 8
- 15, 6, 20, 3, 17, 14, 33, 5, 10, 7
Answer:
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9] - - - - - - - - - - initially - 1 - - - - - - - - after inserting 1 - 1 2 - - - - - - - after inserting 2 - 1 2 3 - - - - - - after inserting 3 - 1 2 3 4 - - - - - after inserting 4 - 1 2 3 4 5 - - - - after inserting 5 - 1 2 3 4 5 6 - - - after inserting 6 - 1 2 3 4 5 6 7 - - after inserting 7 - 1 2 3 4 5 6 7 8 - after inserting 8
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9] - - - - - - - - - - initially - - - - - 15 - - - - after inserting 15 - - - - - 15 6 - - - after inserting 6 20 - - - - 15 6 - - - after inserting 20 20 - - 3 - 15 6 - - - after inserting 3 20 - - 3 - 15 6 17 - - after inserting 17 20 - - 3 14 15 6 17 - - after inserting 14 20 - - 3 14 15 6 17 33 - after inserting 33 20 - - 3 14 15 6 17 33 5 after inserting 5 20 10 - 3 14 15 6 17 33 5 after inserting 10 20 10 7 3 14 15 6 17 33 5 after inserting 7
-
Consider this potential hash function:
int hash(char *key, int N) { int h = 0; char *c; for (c = key; *c != '\0'; c++) { h = h + *c; } return h % N; }
How does this function convert strings to
int
s?What are the deficiencies with this function and how can it be improved?
Answer:
It simply adds up the ASCII values of all of the characters in the string.
A problem with such a simplistic approach is that anagrams all hash to the same value (e.g. "cat"/"act", "listen"/"silent"). This can be improved by incorporating the position of the character into the hash computation, e.g.
int hash(char *key, int N) { int h = 0; char *c; int i; for (c = key, i = 0; *c != '\0'; c++, i++) { h = h + (*c * i); } return h % N; }
-
Both hash tables and balanced search trees (e.g., AVL trees and 2-3-4 trees) are efficient data structures that can be used for the insertion, searching and deletion of key-value pairs. Compare the advantages and disadvantages of each.
Answer:
Some advantages and disadvantages of hash tables and balanced search trees:Advantages Disadvantages Hash tables Insertion, search and deletion are \(O(1)\) on average Worst case complexity of \(O(n)\) (but can be avoided with a good hash function)
Needs to be resized if more items are inserted
Can waste memory if very few items are inserted and most of the table goes unusedBalanced search trees Items are ordered (unlike in a hash table), so it is possible to perform queries such as finding the smallest/greatest element efficiently
Allocates memory as needed (but requires memory for pointers)Slower than hash tables (but still very efficient) - insertion search and deletion are \(O(\log n)\) in the worst case -
Using the heap-as-array ADT from lectures, show how an initially empty heap changes when the following values are inserted.
S O R T I N G I S F U N
At each step, show how the
fixUp()
function is used.Answer:
Ins [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] S S (reached root) O S O (fixUp (O ≤ S)) R S O R (fixUp (R ≤ S)) T S O R T (fixUp (T > O), swap) S T R O (fixUp (T > S), swap) T S R O (reached root) I T S R O I (fixUp (I ≤ S)) N T S R O I N (fixUp (N ≤ R)) G T S R O I N G (fixUp (G ≤ R)) I T S R O I N G I (fixUp (I ≤ O)) S T S R O I N G I S (fixUp (S > O), swap) T S R S I N G I O (fixUp (S ≤ S)) F T S R S I N G I O F (fixUp(F ≤ I)) U T S R S I N G I O F U (fixUp(U > I), swap) T S R S U N G I O F I (fixUp(U > S), swap) T U R S S N G I O F I (fixUp(U > T), swap) U T R S S N G I O F I (reached root) N U T R S S N G I O F I N (fixUp(N ≤ N)) Ins [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
-
Consider the heap-as-array ADT given in lectures:
typedef struct HeapRep { Item *items; // array of Items int nitems; // #items in array int nslots; // #elements in array } HeapRep; typedef HeapRep *Heap;
If a heap is created to hold up to 10
Item
s, andItem
s are integer values, and higher values have higher priority, show the state of the heap after each of the following operations:Heap h; Item it; h = newHeap(10); insert(h, 10); insert(h, 5); insert(h, 15); insert(h , 3); insert(h, 16); insert(h, 13); insert(h, 6); it = delete(h); insert(h, 2); it = delete(h); it = delete(h); it = delete(h); it = delete(h); it = delete(h);
Answer:
The following shows each operation and then the state of the heap after that operation. Note that since
nslots
remains as 10 for the entire lifetime of the heap, we don't show it.Operation nitems [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] h = newHeap(10); 0 - - - - - - - - - - - insert(h, 10); 1 - 10 - - - - - - - - - insert(h, 5); 2 - 10 5 - - - - - - - - insert(h, 15); 3 - 15 5 10 - - - - - - - insert(h, 3); 4 - 15 5 10 3 - - - - - - insert(h, 16); 5 - 16 15 10 3 5 - - - - - insert(h, 13); 6 - 16 15 13 3 5 10 - - - - insert(h, 6); 7 - 16 15 13 3 5 10 6 - - - it = delete(h); 6 - 15 6 13 3 5 10 - - - - insert(h, 2); 7 - 15 6 13 3 5 10 2 - - - it = delete(h); 6 - 13 6 10 3 5 2 - - - - it = delete(h); 5 - 10 6 2 3 5 - - - - - it = delete(h); 4 - 6 5 2 3 - - - - - - it = delete(h); 3 - 5 3 2 - - - - - - - it = delete(h); 2 - 3 2 - - - - - - - -
-
Repeat the above sequence of insertions and deletions, but this time draw the heap as a binary tree, with the highest value at the root and the values decreasing as you move down any branch.
Reminder:
Heap h; Item it; h = newHeap(10); insert(h, 10); insert(h, 5); insert(h, 15); insert(h , 3); insert(h, 16); insert(h, 13); insert(h, 6); it = delete(h); insert(h, 2); it = delete(h); it = delete(h); it = delete(h); it = delete(h); it = delete(h);
Answer:
It's much easier to consider the heap as a tree, e.g.
-
Consider the following trie:
Show which nodes would be visited when searching for each of the following words:
now
the
ant
access
actor
action
actors
tang
a
Answer:
now
→ 2, 6, 13the
→ 3, 8, 16ant
→ 1, 5, 12access
→ 1, 4, 10, 17, 22, 24actor
→ 1, 4, 11, 18, 23action
→ 1, 4, 11, fails (no "i")actors
→ 1, 4, 11, 18, 23, fails (leaf)tang
→ 3, 7, 15, fails (leaf)a
→ 1, fails (not red)
-
Under what circumstances would inserting a new word into a trie not result in adding any new nodes? What would be the effect on the tree of inserting the word?
Answer:
If the new word is a prefix of an existing word in the trie, then it won't require a new node to be added. The node representing the last character in the word will simply be turned into a finishing (red) node.
-
Show the final trie resulting from inserting the following words into an initially empty trie.
so boo jaws boon boot axe jaw boots sore
Does the order that the words are inserted affect the shape of the tree?
Answer:
Final trie:
The order of insertion does not affect the final shape of the tree.
-
What are the two cases when deleting a word from a trie? How is each case handled?
Answer:
-
The word ends at an internal node
The finishing (red) node is simply changed into a non-finishing (black) node
-
The word ends at a leaf node
The finishing (red) node is unlinked from its parent and removed (freed). If the parent of that node is a non-finishing leaf node, it is also removed from the tree. Repeat until we reach a finishing node or a non-leaf node.
-