Week 10 Tutorial Answers

Hashing, Heaps and Tries

Resources

Hashing

Heaps

Tries

USFCA's trie visualizer

Questions

Hashing

Hash tables are initially empty (i.e. contain no items). This fact is represented in the table by each entry having a value of NoItem.

Suggest suitable NoItem values if Items are stored in the table elements and
1. keys are unsigned ints
2. keys are ints
3. keys are strings
How would things be different if table entries were pointers to Items (i.e. (Item *))?
Answer:
1. unsigned ints
  
  All of the 2³² values (assuming 32-bit ints) are possible keys. Maybe choosing a value unlikely to be a key would work (e.g. all 1's). If we knew something about the possible key values (e.g. that 0 would never be used as a key), we could use a non-key value for NoItem.
2. ints
  
  All of the 2³² values (assuming 32-bit ints) are possible keys. Similar issues and resolution as for the unsigned int case. An extra possibility in this case: if you didn't want to allow negative values as keys, then any negative number would work as NoItem.
3. strings
  
  This depends on how the string is stored. If it's stored in a buffer within each Item, then an empty string (i.e. "") would be a suitable representation for NoItem. If the string is stored outside the Item (e.g. created by strdup()), then NULL would be a suitable representation for NoItem.
If Items were not stored wthin the hash table, and were referenced via an (Item *), then NULL would be a suitable representation for NoItem, regardless of what type the keys are.
For a hash table of size \( N \) that uses separate chaining for collision resolution, with the chains sorted in ascending order on key value, what are the best case and worst case costs for inserting \( k = 2N \) items into the table? Assume that insertion cost is measured in terms of number of key comparisons. What will be average search cost after all the insertions are done?

Answer:

The worst case occurs when the hash function returns a constant value and the keys arrive in ascending order. In this case, there will be a single chain from the one entry in the hash table that is "hit" by the hash function. Each search of that chain will require us to go right to the end. The cost for the first insertion is \( 0 \), since we do no key comparisons, but discover that the chain is empty. On the second insertion, we do one key comparison; on the third insertion we do two key comparisons; and so on. This will give \( 0 + 1 + 2 + \dots + (k - 1) \) comparisons which is \( k \times (k - 1) / 2 \) total comparisons.

After insertion, each search requires a scan of the single chain of length \( k \), so the average search cost will be \( k / 2 \).

The best case occurs when we have a hash function that distributes the items uniformly across the table. In this case, we end up with a chain of length \( \lceil k / N \rceil \) attached to each hash table entry. In building these chains, the first entry requires no key comparisons (just a check for an empty chain). When adding the second item to each chain, only one comparison is needed, giving a total of \( k / 2 \) comparisons.

After insertion, each search requires a scan of a chain of length \( 2 \), which means half the time, one comparison will be made, and the other half of the time, two comparisons will be made. Thus, the average search cost will be \( 1.5 \).
Consider a very small hash table with only 11 entries, and a simple hash function \( h(x) = x\ \%\ 11 \). If we start with an empty table and insert the following values in the order shown
```
11  16  27  35  22  20  15  24  29  19  13
```
give the final state of the table for each of the following collision resolution strategies
1. separate chaining, with chains in ascending order
2. linear probing
3. double hashing, with \( h_2(x) = (x\ \%\ 3) + 1 \)
Answer:

The hash values for each of the keys is given in the following table:

Key 11 16 27 35 22 20 15 24 29 19 13

\( h(x) \) 0 5 5 2 0 9 4 2 7 8 2

\( h_2(x) \) 3 2 1 3 2 3 1 1 3 2 2

Working through the insertions item-by-item for each case, gives:

Consider a hash table of size \( N = 10 \) with hash function \( h(k) = k\ \% \ 10 \) that handles collisions via linear probing. Show the result of inserting items with these keys into an initially empty table.

1, 2, 3, 4, 5, 6, 7, 8
15, 6, 20, 3, 17, 14, 33, 5, 10, 7

Answer:

[0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]
 -    -    -    -    -    -    -    -    -    -   initially
 -    1    -    -    -    -    -    -    -    -   after inserting 1
 -    1    2    -    -    -    -    -    -    -   after inserting 2
 -    1    2    3    -    -    -    -    -    -   after inserting 3
 -    1    2    3    4    -    -    -    -    -   after inserting 4
 -    1    2    3    4    5    -    -    -    -   after inserting 5
 -    1    2    3    4    5    6    -    -    -   after inserting 6
 -    1    2    3    4    5    6    7    -    -   after inserting 7
 -    1    2    3    4    5    6    7    8    -   after inserting 8

[0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]
 -    -    -    -    -    -    -    -    -    -   initially
 -    -    -    -    -    15   -    -    -    -   after inserting 15
 -    -    -    -    -    15   6    -    -    -   after inserting  6
 20   -    -    -    -    15   6    -    -    -   after inserting 20
 20   -    -    3    -    15   6    -    -    -   after inserting  3
 20   -    -    3    -    15   6    17   -    -   after inserting 17
 20   -    -    3    14   15   6    17   -    -   after inserting 14
 20   -    -    3    14   15   6    17   33   -   after inserting 33
 20   -    -    3    14   15   6    17   33   5   after inserting  5
 20   10   -    3    14   15   6    17   33   5   after inserting 10
 20   10   7    3    14   15   6    17   33   5   after inserting  7

Consider this potential hash function:
```
int hash(char *key, int N) {
	int h = 0;
	char *c;
	for (c = key; *c != '\0'; c++) {
		h = h + *c;
	}
	return h % N;
}
```
How does this function convert strings to ints?

What are the deficiencies with this function and how can it be improved?
Answer:
It simply adds up the ASCII values of all of the characters in the string.

A problem with such a simplistic approach is that anagrams all hash to the same value (e.g. "cat"/"act", "listen"/"silent"). This can be improved by incorporating the position of the character into the hash computation, e.g.
```
int hash(char *key, int N) {
    int h = 0;
    char *c; int i;
    for (c = key, i = 0; *c != '\0'; c++, i++) {
        h = h + (*c * i);
    }
    return h % N;
}
```

Both hash tables and balanced search trees (e.g., AVL trees and 2-3-4 trees) are efficient data structures that can be used for the insertion, searching and deletion of key-value pairs. Compare the advantages and disadvantages of each.

Answer:

Some advantages and disadvantages of hash tables and balanced search trees:

	Advantages	Disadvantages
Hash tables	Insertion, search and deletion are \(O(1)\) on average	Worst case complexity of \(O(n)\) (but can be avoided with a good hash function) Needs to be resized if more items are inserted Can waste memory if very few items are inserted and most of the table goes unused
Balanced search trees	Items are ordered (unlike in a hash table), so it is possible to perform queries such as finding the smallest/greatest element efficiently Allocates memory as needed (but requires memory for pointers)	Slower than hash tables (but still very efficient) - insertion search and deletion are \(O(\log n)\) in the worst case

Heaps

Using the heap-as-array ADT from lectures, show how an initially empty heap changes when the following values are inserted.

S  O  R  T  I  N  G  I  S  F  U  N

At each step, show how the fixUp() function is used.

Answer:

Ins [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

S        S   (reached root)

O        S   O   (fixUp (O ≤ S))

R        S   O   R   (fixUp (R ≤ S))

T        S   O   R   T   (fixUp (T > O), swap)
         S   T   R   O   (fixUp (T > S), swap)
         T   S   R   O   (reached root)

I        T   S   R   O   I   (fixUp (I ≤ S))

N        T   S   R   O   I   N   (fixUp (N ≤ R))

G        T   S   R   O   I   N   G   (fixUp (G ≤ R))

I        T   S   R   O   I   N   G   I   (fixUp (I ≤ O))

S        T   S   R   O   I   N   G   I   S   (fixUp (S > O), swap)
         T   S   R   S   I   N   G   I   O   (fixUp (S ≤ S))

F        T   S   R   S   I   N   G   I   O   F   (fixUp(F ≤ I))

U        T   S   R   S   I   N   G   I   O   F    U   (fixUp(U > I), swap)
         T   S   R   S   U   N   G   I   O   F    I   (fixUp(U > S), swap)
         T   U   R   S   S   N   G   I   O   F    I   (fixUp(U > T), swap)
         U   T   R   S   S   N   G   I   O   F    I   (reached root)

N        U   T   R   S   S   N   G   I   O   F    I    N   (fixUp(N ≤ N))

Ins [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

Consider the heap-as-array ADT given in lectures:

typedef struct HeapRep {
	Item *items;   // array of Items
	int   nitems;  // #items in array
	int   nslots;  // #elements in array
} HeapRep;

typedef HeapRep *Heap;

If a heap is created to hold up to 10 Items, and Items are integer values, and higher values have higher priority, show the state of the heap after each of the following operations:

Heap h;
Item it;
h = newHeap(10);

insert(h, 10);
insert(h,  5);
insert(h, 15);
insert(h , 3);
insert(h, 16);
insert(h, 13);
insert(h,  6);
it = delete(h);
insert(h,  2);
it = delete(h);
it = delete(h);
it = delete(h);
it = delete(h);
it = delete(h);

Answer:

The following shows each operation and then the state of the heap after that operation. Note that since nslots remains as 10 for the entire lifetime of the heap, we don't show it.

Operation        nitems    [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
h = newHeap(10);      0     -   -   -   -   -   -   -   -   -   -   -
insert(h, 10);        1     -  10   -   -   -   -   -   -   -   -   -
insert(h,  5);        2     -  10   5   -   -   -   -   -   -   -   -
insert(h, 15);        3     -  15   5  10   -   -   -   -   -   -   -
insert(h,  3);        4     -  15   5  10   3   -   -   -   -   -   -
insert(h, 16);        5     -  16  15  10   3   5   -   -   -   -   -
insert(h, 13);        6     -  16  15  13   3   5  10   -   -   -   -
insert(h,  6);        7     -  16  15  13   3   5  10   6   -   -   -
it = delete(h);       6     -  15   6  13   3   5  10   -   -   -   -
insert(h,  2);        7     -  15   6  13   3   5  10   2   -   -   -
it = delete(h);       6     -  13   6  10   3   5   2   -   -   -   -
it = delete(h);       5     -  10   6   2   3   5   -   -   -   -   -
it = delete(h);       4     -   6   5   2   3   -   -   -   -   -   -
it = delete(h);       3     -   5   3   2   -   -   -   -   -   -   -
it = delete(h);       2     -   3   2   -   -   -   -   -   -   -   -

Repeat the above sequence of insertions and deletions, but this time draw the heap as a binary tree, with the highest value at the root and the values decreasing as you move down any branch.

Reminder:

Heap h;
Item it;
h = newHeap(10);

insert(h, 10);
insert(h,  5);
insert(h, 15);
insert(h , 3);
insert(h, 16);
insert(h, 13);
insert(h,  6);
it = delete(h);
insert(h,  2);
it = delete(h);
it = delete(h);
it = delete(h);
it = delete(h);
it = delete(h);

Answer:

It's much easier to consider the heap as a tree, e.g.

Tries

Consider the following trie:

Show which nodes would be visited when searching for each of the following words:
1. now
2. the
3. ant
4. access
5. actor
6. action
7. actors
8. tang
9. a
Answer:
1. now → 2, 6, 13
2. the → 3, 8, 16
3. ant → 1, 5, 12
4. access → 1, 4, 10, 17, 22, 24
5. actor → 1, 4, 11, 18, 23
6. action → 1, 4, 11, fails (no "i")
7. actors → 1, 4, 11, 18, 23, fails (leaf)
8. tang → 3, 7, 15, fails (leaf)
9. a → 1, fails (not red)
Under what circumstances would inserting a new word into a trie not result in adding any new nodes? What would be the effect on the tree of inserting the word?

Answer:

If the new word is a prefix of an existing word in the trie, then it won't require a new node to be added. The node representing the last character in the word will simply be turned into a finishing (red) node.
Show the final trie resulting from inserting the following words into an initially empty trie.
```
so   boo   jaws   boon   boot   axe   jaw   boots   sore
```
Does the order that the words are inserted affect the shape of the tree?

Answer:

Final trie:

The order of insertion does not affect the final shape of the tree.
What are the two cases when deleting a word from a trie? How is each case handled?
Answer:
1. The word ends at an internal node
  
  The finishing (red) node is simply changed into a non-finishing (black) node
2. The word ends at a leaf node
  
  The finishing (red) node is unlinked from its parent and removed (freed). If the parent of that node is a non-finishing leaf node, it is also removed from the tree. Repeat until we reach a finishing node or a non-leaf node.

Key	11	16	27	35	22	20	15	24	29	19	13
\( h(x) \)	0	5	5	2	0	9	4	2	7	8	2
\( h_2(x) \)	3	2	1	3	2	3	1	1	3	2	2