Week 09 Lab Exercise

Hash Table Implementation and Applications

Objectives

  • To explore the implementation of a simple hash table
  • To gain experience with using hash tables to implement efficient solutions to programming problems

Admin

Marks
5 (see the Assessment section for more details)
Demo
no demo required
Submit
see the Submission section
Deadline to submit to give
12pm Monday of Week 10
Late penalty
0.2% per hour or part thereof deducted from the attained mark, submissions later than 5 days not accepted

Background

In lectures, we introduced hash tables, a data structure designed to efficiently store and operate on key-value pairs, i.e., values which can be uniquely identified by some other piece of data (the key). This depends on the technique of hashing, which uses a function (the hash function) to map a large set of keys to a smaller set of hashes (e.g. integers) that are then used to actually insert, search for and delete key-value pairs in the table. Because of the assumption that there will be more keys than hashes, the largest challenge when implementing hash tables is handling hash collisions, where two keys have the same hash.

In lectures, you have learned a simple collision resolution strategy: linear probing. Starting from the ideal slot for a key (i.e., the one that is checked first), each consecutive slot is checked one after another during an insert/search. This approach has good runtime performance in spite of its simplicity, so in this lab, you will implement a hash table that uses it for collision resolution. Since hash tables are a very useful data structure in general, the remainder of the lab will be dedicated to solving some LeetCode-style problems using hash tables.

Setting Up

Create a directory for this lab, change into it, and run the following command:

unzip /web/cs2521/24T3/labs/week09/downloads/files.zip

If you're working on your own machine, download files.zip by clicking on the above link and then unzip the downloaded file.

If you've done the above correctly, you should now have the following files:

Makefile
a set of dependencies used to control compilation
HashTable.h
the interface to the HashTable ADT
HashTable.c
an implementation of the HashTable ADT (incomplete)
runHashTable.c
interactive test program for the HashTable ADT
missing.c
a stub file for Task 3
winner.c
a stub file for Task 4
similar.c
a stub file for Task 5

Once you've got these files, the first thing to do is to run the command

make

This will compile the initial version of the files, and produce the ./runHashTable, ./missing, ./winner and ./similar executables.

File Walkthrough

runHashTable.c

This program allows you to test the HashTable ADT interactively by entering commands in the terminal. Here is an example run of the program once Task 1 and 2 have been completed correctly:

./runHashTable
Interactive Hash Table Tester
Enter ? to see the list of commands.
> ?
Commands:
    + <key> <value>      insert a key-value pair
    - <key>              delete a key-value pair
    c <key>              checks if a key exists
    g <key>              gets the value associated with a key
    f <key> <default>    same as 'g' but with default value
    s                    gets the number of keys
    p                    show the hash table
    ?                    show this message
    q                    quit
> p
index         key       value
0             ---         ---
1             ---         ---
2             ---         ---
3             ---         ---
4             ---         ---
5             ---         ---
6             ---         ---
7             ---         ---
8             ---         ---
9             ---         ---
> + 15 4
Inserted (15, 4)
> + 1 12
Inserted (1, 12)
> + 26 7
Inserted (26, 7)
> + 5 28
Inserted (5, 28)
> p
index         key       value
0             ---         ---
1               1          12
2             ---         ---
3             ---         ---
4             ---         ---
5              15           4
6              26           7
7               5          28
8             ---         ---
9             ---         ---
> g 15
Value associated with key 15: 4
> g 5
Value associated with key 5: 28
> c 42
Hash table does not contain key 42
> f 42 999
Value associated with key 42: 999
> - 15
Deleted key 15
> p
index         key       value
0             ---         ---
1               1          12
2             ---         ---
3             ---         ---
4             ---         ---
5               5          28
6              26           7
7             ---         ---
8             ---         ---
9             ---         ---
> q
HashTable.[ch]

HashTable is a dynamically-sized hash table, i.e., it grows to accommodate new items whenever necessary. It stores int keys and values for simplicity. Under the hood, it will use linear probing to resolve collisions. The starter code provides a definition of the HashTable type, a struct slot type for each table slot, a few internal helper functions and implementations for most of its interface functions.

In Tasks 1 and 2, you will implement the remaining interface functions. In Tasks 3, 4 and 5, you will be required to use the HashTable ADT to solve some problems.

There are hints for some of the tasks at the bottom of this page if you get stuck. However, please make sure you spend some time to think about the problem (and check out the lecture examples) first!

Task 1 - Inserting into a Hash Table

The stub code for HashTableInsert already takes care of resizing the table if necessary. Your task is to finish the implementation, using the linear probing strategy described in the Background section (and in the lectures) to handle hash collisions:

void HashTableInsert(HashTable table, int key, int value);

If the key already exists in the hash table, this function should overwrite the existing value.

Once you think you've got the function working, test it by recompiling with make and running the runHashTable program. Note that the hash table uses the simple hash function \(h(k) = k\ \%\ N\) where \(N\) is the number of slots.

Task 2 - Deleting from a Hash Table

When deleting a key from a hash table using linear probing, we can't simply make the corresponding slot empty and be done, as this would prematurely end probe paths and hence break our search and insertion algorithms. One way of doing deletion properly is the backshift method: find the slot holding the to-be-deleted key and mark it as empty, then delete and reinsert the items in all subsequent slots up to the next empty slot.

Your task is to implement HashTableDelete, using the backshift deletion method described above (and in the lectures):

void HashTableDelete(HashTable table, int key);

If the key doesn't exist in the hash table, nothing should happen.

Once you think you've got the function working, test it by recompiling with make and running the runHashTable program. Note that the hash table uses the simple hash function \(h(k) = k\ \%\ N\) where \(N\) is the number of slots.

Task 3 - Find the Missing Integer

Given an array that contains all of the integers from 1 to \( n \) with exactly one of them missing, we want to find the missing integer. For example, given the list [1, 6, 2, 5, 3], the missing integer is 4.

There is a simple but slow \( O(n^2) \) algorithm (loop through the array and search for each integer, and return the first one which isn't there) to solve this problem. Your task is to implement the following function in missing.c, using the HashTable ADT to solve it more efficiently:

int findMissingInteger(int *integers, int n);

You should assume that the array is always well-formed, i.e., it satisfies the property described above (and hence the problem will always have a unique answer).

Testing

missing.c contains a main function which allows you to test your findMissingInteger function. It accepts a space-separated list of integers as command-line arguments, which will be converted into an array that gets passed to your function. Here are some examples of its usage, and some expected outputs:

make
...
./missing 1 6 2 5 3
The missing integer is 4
./missing 1 2 3 5 6
The missing integer is 4
Note: Your solution must use the HashTable ADT to solve the problem. Solutions that do not use the HashTable ADT will not receive any marks.

Task 4 - Who Won the Vote?

A vote has been held. Each running candidate was assigned a number, and voters were instructed to write the number of the candidate they wished to vote for on their ballot. The election is first-past-the-post, so the candidate with the most votes wins. For example, if the ballots were [1, 2, 1, 1, 3, 3], then the winner is candidate #1. If the ballots were [1, 2, 2, 1, 3] though, then we have a two-way tie with no clear winner. Since first-past-the-post doesn't specify how to handle ties, we will just declare situations like this a tie for now and resort to some other way of finding the winner later.

Your task is to implement the following function in winner.c, using the HashTable ADT to try and find the winner given all of the ballots:

int findWinner(int *ballots, int numBallots);

If there is a tie, the function should return NO_WINNER (which is defined for you in winner.c).

Testing

winner.c contains a main function which allows you to test your findWinner function. It accepts a space-separated list of integers as command-line arguments, which will be converted into an array that gets passed to your function. Here are some examples of its usage, and some expected outputs:

make
...
./winner 1 2 1 1 3 3
The winner is candidate #1
./winner 1 2 2 1 3
Tie!
Note: Your solution must use the HashTable ADT to solve the problem. Solutions that do not use the HashTable ADT will not receive any marks.

Task 5 - Determine if Two Strings are Similar

Two strings s1 and s2 similar if there exists a one-to-one mapping from each distinct character in s1 to each distinct character in s2 that transforms s1 into s2. For example:

  • The strings s1 = "adt" and s2 = "bst" are similar, because the mapping 'a' -> 'b', 'd' -> 's' and 't' -> 't' transforms s1 into s2.
  • The strings s1 = "adt" and s2 = "dcc" are not similar, because the mapping that is required to transform s1 into s2 is 'a' -> 'd', 'd' -> 'c' and 't' -> 'c', but this mapping is not one-to-one.
  • The strings s1 = "ab" and s2 = "ac" are similar, because the mapping 'a' -> 'a', 'b' -> 'c' transforms s1 into s2.
  • The strings s1 = "aa" and s2 = "bc" are not similar, because to transform s1 into s2 we would need to map 'a' to two different characters, but such a mapping would not be one-to-one.
  • The strings s1 = "abc" and s2 = "abcd" are not similar, because there is no mapping that would transform s1 into s2.

Your task is to implement the following function in similar.c, using the HashTable ADT to write an algorithm that determines if two strings are similar:

bool areSimilarStrings(char *s1, char *s2);
Testing

similar.c contains a main function which allows you to test your areSimilarStrings function. It accepts two strings as command-line arguments which will be passed to your function. Here are some examples of its usage, and some expected outputs:

make
...
./similar adt bst
The strings are similar
./similar adt dcc
The strings are not similar
./similar aa bc
The strings are not similar
Note: Your solution must use the HashTable ADT to solve the problem. Solutions that do not use the HashTable ADT will not receive any marks.

Optional Challenge Task

While hash tables are very useful and often the most efficient data structure for solving many problems, this is not always the case. In Task 3, it is actually unnecessary to use a hash table (or any other data structure!) in order to solve the problem most efficiently.

Can you design an algorithm to solve Task 3 with worst-case \( O(n) \) time complexity and \( O(1) \) space complexity?

Note: Do not replace your Task 3 solution with this, otherwise you will lose marks for not using the HashTable ADT.

Submission

Submit via the command line using the give command:

give cs2521 lab09 HashTable.c missing.c winner.c similar.c

You can also submit via give's web interface. You can submit multiple times. Only your last submission will be marked. You can check the files you have submitted here.

Assessment

All of the marks for this lab will come from automarking of code correctness. However, any solution for Task 3, 4 or 5 which does not use the HashTable ADT will receive 0 marks for that task. The automarks will be distributed as follows:

Task Mark
Task 1 - Inserting into a Hash Table 1.00
Task 2 - Deleting from a Hash Table 1.00
Task 3 - Find the Missing Integer 1.00
Task 4 - Who Won the Vote? 1.00
Task 5 - Determine if Two Strings are Similar 1.00

Appendix

Hints

You should give each task at least 15 minutes of thought and working out before looking at these hints.

Hints for Task 2

The helper function findIndex will find the key's index for you, if it exists.

Hints for Task 3

Can you make it faster to check if an integer is in the array?

Hints for Task 4

Can you make it faster to get a candidate's total number of votes?

Hints for Task 5

The HashTable type stores int keys and values, but any char can be treated as an int.

It's easier to check if two strings aren't similar than it is to check if they are.

Suppose you used a hash table to map characters in s1 to characters in s2 Would this allow you to determine that "aa" and "bc" are not similar? How about "adt" and "dcc"?

If one hash table is not enough...