Week 09 Lab Exercise
Hash Table Implementation and Applications
Objectives
- To explore the implementation of a simple hash table
- To gain experience with using hash tables to implement efficient solutions to programming problems
Admin
- Marks
- 5 (see the Assessment section for more details)
- Demo
- no demo required
- Submit
- see the Submission section
- Deadline to submit to give
- 12pm Monday of Week 10
- Late penalty
- 0.2% per hour or part thereof deducted from the attained mark, submissions later than 5 days not accepted
Background
In lectures, we introduced hash tables, a data structure designed to efficiently store and operate on key-value pairs, i.e., values which can be uniquely identified by some other piece of data (the key). This depends on the technique of hashing, which uses a function (the hash function) to map a large set of keys to a smaller set of hashes (e.g. integers) that are then used to actually insert, search for and delete key-value pairs in the table. Because of the assumption that there will be more keys than hashes, the largest challenge when implementing hash tables is handling hash collisions, where two keys have the same hash.
In lectures, you have learned a simple collision resolution strategy: linear probing. Starting from the ideal slot for a key (i.e., the one that is checked first), each consecutive slot is checked one after another during an insert/search. This approach has good runtime performance in spite of its simplicity, so in this lab, you will implement a hash table that uses it for collision resolution. Since hash tables are a very useful data structure in general, the remainder of the lab will be dedicated to solving some LeetCode-style problems using hash tables.
Setting Up
Create a directory for this lab, change into it, and run the following command:
unzip /web/cs2521/24T3/labs/week09/downloads/files.zip
If you're working on your own machine, download files.zip
by clicking on the above link and then unzip the downloaded file.
If you've done the above correctly, you should now have the following files:
Makefile
- a set of dependencies used to control compilation
HashTable.h
- the interface to the
HashTable
ADT HashTable.c
- an implementation of the
HashTable
ADT (incomplete) runHashTable.c
- interactive test program for the
HashTable
ADT missing.c
- a stub file for Task 3
winner.c
- a stub file for Task 4
similar.c
- a stub file for Task 5
Once you've got these files, the first thing to do is to run the command
make
This will compile the initial version of the files, and produce the ./runHashTable
, ./missing
, ./winner
and ./similar
executables.
File Walkthrough
runHashTable.c
-
This program allows you to test the
HashTable
ADT interactively by entering commands in the terminal. Here is an example run of the program once Task 1 and 2 have been completed correctly:./runHashTable Interactive Hash Table Tester Enter ? to see the list of commands. > ? Commands: + <key> <value> insert a key-value pair - <key> delete a key-value pair c <key> checks if a key exists g <key> gets the value associated with a key f <key> <default> same as 'g' but with default value s gets the number of keys p show the hash table ? show this message q quit > p index key value 0 --- --- 1 --- --- 2 --- --- 3 --- --- 4 --- --- 5 --- --- 6 --- --- 7 --- --- 8 --- --- 9 --- --- > + 15 4 Inserted (15, 4) > + 1 12 Inserted (1, 12) > + 26 7 Inserted (26, 7) > + 5 28 Inserted (5, 28) > p index key value 0 --- --- 1 1 12 2 --- --- 3 --- --- 4 --- --- 5 15 4 6 26 7 7 5 28 8 --- --- 9 --- --- > g 15 Value associated with key 15: 4 > g 5 Value associated with key 5: 28 > c 42 Hash table does not contain key 42 > f 42 999 Value associated with key 42: 999 > - 15 Deleted key 15 > p index key value 0 --- --- 1 1 12 2 --- --- 3 --- --- 4 --- --- 5 5 28 6 26 7 7 --- --- 8 --- --- 9 --- --- > q
HashTable.[ch]
-
HashTable
is a dynamically-sized hash table, i.e., it grows to accommodate new items whenever necessary. It storesint
keys and values for simplicity. Under the hood, it will use linear probing to resolve collisions. The starter code provides a definition of theHashTable
type, astruct slot
type for each table slot, a few internal helper functions and implementations for most of its interface functions.In Tasks 1 and 2, you will implement the remaining interface functions. In Tasks 3, 4 and 5, you will be required to use the
HashTable
ADT to solve some problems.
Task 1 - Inserting into a Hash Table
The stub code for HashTableInsert
already takes care of resizing the table if necessary. Your task is to finish the implementation, using the linear probing strategy described in the Background section (and in the lectures) to handle hash collisions:
void HashTableInsert(HashTable table, int key, int value);
If the key already exists in the hash table, this function should overwrite the existing value.
Once you think you've got the function working, test it by recompiling with make and running the runHashTable
program. Note that the hash table uses the simple hash function \(h(k) = k\ \%\ N\) where \(N\) is the number of slots.
Task 2 - Deleting from a Hash Table
When deleting a key from a hash table using linear probing, we can't simply make the corresponding slot empty and be done, as this would prematurely end probe paths and hence break our search and insertion algorithms. One way of doing deletion properly is the backshift method: find the slot holding the to-be-deleted key and mark it as empty, then delete and reinsert the items in all subsequent slots up to the next empty slot.
Your task is to implement HashTableDelete
, using the backshift deletion method described above (and in the lectures):
void HashTableDelete(HashTable table, int key);
If the key doesn't exist in the hash table, nothing should happen.
Once you think you've got the function working, test it by recompiling with make and running the runHashTable
program. Note that the hash table uses the simple hash function \(h(k) = k\ \%\ N\) where \(N\) is the number of slots.
Task 3 - Find the Missing Integer
Given an array that contains all of the integers from 1 to \( n \) with exactly one of them missing, we want to find the missing integer. For example, given the list [1, 6, 2, 5, 3]
, the missing integer is 4.
There is a simple but slow \( O(n^2) \) algorithm (loop through the array and search for each integer, and return the first one which isn't there) to solve this problem. Your task is to implement the following function in missing.c
, using the HashTable
ADT to solve it more efficiently:
int findMissingInteger(int *integers, int n);
You should assume that the array is always well-formed, i.e., it satisfies the property described above (and hence the problem will always have a unique answer).
Testing
missing.c
contains a main function which allows you to test your findMissingInteger
function. It accepts a space-separated list of integers as command-line arguments, which will be converted into an array that gets passed to your function. Here are some examples of its usage, and some expected outputs:
make ... ./missing 1 6 2 5 3 The missing integer is 4 ./missing 1 2 3 5 6 The missing integer is 4
HashTable
ADT to solve the problem. Solutions that do not use the HashTable
ADT will not receive any marks.
Task 4 - Who Won the Vote?
A vote has been held. Each running candidate was assigned a number, and voters were instructed to write the number of the candidate they wished to vote for on their ballot. The election is first-past-the-post, so the candidate with the most votes wins. For example, if the ballots were [1, 2, 1, 1, 3, 3]
, then the winner is candidate #1. If the ballots were [1, 2, 2, 1, 3]
though, then we have a two-way tie with no clear winner. Since first-past-the-post doesn't specify how to handle ties, we will just declare situations like this a tie for now and resort to some other way of finding the winner later.
Your task is to implement the following function in winner.c
, using the HashTable
ADT to try and find the winner given all of the ballots:
int findWinner(int *ballots, int numBallots);
If there is a tie, the function should return NO_WINNER
(which is defined for you in winner.c
).
Testing
winner.c
contains a main
function which allows you to test your findWinner
function. It accepts a space-separated list of integers as command-line arguments, which will be converted into an array that gets passed to your function. Here are some examples of its usage, and some expected outputs:
make ... ./winner 1 2 1 1 3 3 The winner is candidate #1 ./winner 1 2 2 1 3 Tie!
HashTable
ADT to solve the problem. Solutions that do not use the HashTable
ADT will not receive any marks.
Task 5 - Determine if Two Strings are Similar
Two strings s1
and s2
similar if there exists a one-to-one mapping from each distinct character in s1
to each distinct character in s2
that transforms s1
into s2
. For example:
- The strings
s1 = "adt"
ands2 = "bst"
are similar, because the mapping'a' -> 'b'
,'d' -> 's'
and't' -> 't'
transformss1
intos2
. - The strings
s1 = "adt"
ands2 = "dcc"
are not similar, because the mapping that is required to transforms1
intos2
is'a' -> 'd'
,'d' -> 'c'
and't' -> 'c'
, but this mapping is not one-to-one. - The strings
s1 = "ab"
ands2 = "ac"
are similar, because the mapping'a' -> 'a'
,'b' -> 'c'
transformss1
intos2
. - The strings
s1 = "aa"
ands2 = "bc"
are not similar, because to transforms1
intos2
we would need to map'a'
to two different characters, but such a mapping would not be one-to-one. - The strings
s1 = "abc"
ands2 = "abcd"
are not similar, because there is no mapping that would transforms1
intos2
.
Your task is to implement the following function in similar.c
, using the HashTable
ADT to write an algorithm that determines if two strings are similar:
bool areSimilarStrings(char *s1, char *s2);
Testing
similar.c
contains a main function which allows you to test your areSimilarStrings
function. It accepts two strings as command-line arguments which will be passed to your function. Here are some examples of its usage, and some expected outputs:
make ... ./similar adt bst The strings are similar ./similar adt dcc The strings are not similar ./similar aa bc The strings are not similar
HashTable
ADT to solve the problem. Solutions that do not use the HashTable
ADT will not receive any marks.
Optional Challenge Task
While hash tables are very useful and often the most efficient data structure for solving many problems, this is not always the case. In Task 3, it is actually unnecessary to use a hash table (or any other data structure!) in order to solve the problem most efficiently.
Can you design an algorithm to solve Task 3 with worst-case \( O(n) \) time complexity and \( O(1) \) space complexity?
HashTable
ADT.
Submission
Submit via the command line using the give
command:
give cs2521 lab09 HashTable.c missing.c winner.c similar.c
You can also submit via give's web interface. You can submit multiple times. Only your last submission will be marked. You can check the files you have submitted here.
Assessment
All of the marks for this lab will come from automarking of code correctness. However, any solution for Task 3, 4 or 5 which does not use the HashTable
ADT will receive 0 marks for that task. The automarks will be distributed as follows:
Task | Mark |
---|---|
Task 1 - Inserting into a Hash Table | 1.00 |
Task 2 - Deleting from a Hash Table | 1.00 |
Task 3 - Find the Missing Integer | 1.00 |
Task 4 - Who Won the Vote? | 1.00 |
Task 5 - Determine if Two Strings are Similar | 1.00 |
Appendix
Hints
You should give each task at least 15 minutes of thought and working out before looking at these hints.
Hints for Task 2
The helper function findIndex
will find the key's index for you, if it exists.
Hints for Task 3
Can you make it faster to check if an integer is in the array?
Hints for Task 4
Can you make it faster to get a candidate's total number of votes?
Hints for Task 5
The HashTable
type stores int
keys and values, but any char
can be treated as an int
.
It's easier to check if two strings aren't similar than it is to check if they are.
Suppose you used a hash table to map characters in s1
to characters in s2
Would this allow you to determine that "aa"
and "bc"
are not similar? How about "adt"
and "dcc"
?
If one hash table is not enough...