Week 09 Tutorial Questions

Hash Tables

  1. Consider this potential hash function:

    int hash(char *key, int N) {
    	int h = 0;
    	char *c;
    	for (c = key; *c != '\0'; c++) {
    		h = h + *c;
    	}
    	return h % N;
    }
    

    How does this function convert strings to ints?

    What are the deficiencies with this function and how can it be improved?

  2. For a hash table of size \( N \) that uses separate chaining for collision resolution, with the chains sorted in ascending order by key, what are the best case and worst case costs for inserting \( k = 2N \) items into the table? Assume that insertion cost is measured in terms of number of key comparisons. What will be average search cost after all the insertions are done?

  3. Consider a very small hash table with only 11 entries, and a simple hash function \( h(x) = x\ \%\ 11 \). If we start with an empty table and insert the following keys in the order shown

    11  16  27  35  22  20  15  24  29  19  13
    

    give the final state of the table for each of the following collision resolution strategies

    1. separate chaining, with chains in ascending order
    2. linear probing
    3. double hashing, with \( h_2(x) = (x\ \%\ 3) + 1 \)
  4. Consider the following hash table with primary hash function \(h(x) = x\ \%\ 10\), that uses double hashing for collision resolution with secondary hash function \(h_2(x) = x\ \%\ 3 + 1\):

    Index 0 1 2 3 4 5 6 7 8 9
    Key 0 12 24 36 48

    What happens if we try to insert the key 10? Explain why this happens.

  5. Both hash tables and balanced search trees (e.g., AVL trees) are efficient data structures that can be used for the insertion, searching and deletion of key-value pairs. Compare the advantages and disadvantages of each.

  6. In lectures, we discussed how to solve the famous two-sum problem efficiently using a hash table. Taking inspiration from that, let's see if you can now solve the three-sum problem!

    Here's the problem - given an array of integers and a target sum \(S\), determine whether the array contains three integers that sum to \(S\). Try to find a solution that is \(O(n^2)\) on average.

    bool threeSum(int arr[], int size, int sum);
    

    As a reminder, here are the hash table operations:

    HashTable HashTableNew(void);
    void HashTableFree(HashTable ht);
    void HashTableInsert(HashTable ht, int key, int value);
    bool HashTableContains(HashTable ht, int key);
    int  HashTableGet(HashTable ht, int key);
    void HashTableDelete(HashTable ht, int key);
    int  HashTableSize(HashTable ht);
    
  7. Fun fact: 2521 is a prime number! If a hash table which stores integer keys had 2521 slots and used double hashing for collision resolution, define a suitable secondary hash function \(h_2\) for the table. Briefly comment on what makes it suitable.

  8. Calls to functions (e.g. some recursive functions) are sometimes very expensive to compute. One often-employed trick when working with such functions is memoisation: the first time the function is called with some input, we cache this result (i.e. store it somewhere for use later on), and then for subsequent calls to the function with the same input, we consult the cache rather than actually evaluating the call.

    1. Design an ADT for a memoiser that supports memoisation of an expensive single-argument function.
    2. One potential problem with memoisation is that the cache's memory usage can grow quite large when the function is called for lots of inputs. This is often solved by limiting the size of the cache and having a policy which can be used to pick entries to remove from the cache to make more room. What are some possible policies you could use?
    3. Can you think of how you would actually implement some of these policies?