❖ Associative Indexing |
Regular array indexing is positional ([0][1][2]
array[K]❖ Hashing |
Hashing allows us to get close to associative indexing
Ideally, we would like ...
courses["COMP3311"] = "Database Systems";
printf("%s\n", courses["COMP3311"]);
but C doesn't have non-integer index values.
Something almost as good:
courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);
Hash function h
❖ ... Hashing |
What the h
Converts a key value (of any type) into an integer index.
Sounds good ... in practice, not so simple ...
❖ ... Hashing |
Reminder: what we'd like ...
courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);
In practice, we do something like ...
key = "COMP3311";
item = {"COMP3311","Database Systems",...};
courses = HashInsert(courses, key, item);
printf("%s\n", HashGet(courses, "COMP3311"));
❖ ... Hashing |
To use arbitrary values as keys, we need ...
KeyKeyItemItemKeyKey
N
So, we also need a collision resolution method
❖ Hash Table ADT |
Generalised ADT for a collection of Item
Interface:
typedef struct CollectionRep *Collection; Collection newCollection(); // make new empty collection Item *search(Collection, Key); // find item with key void insert(Collection, Item); // add item into collection void delete(Collection, Key); // drop item with key
Implementation:
typedef struct CollectionRep {
... some data structure to hold multiple Items ...
} CollectionRep;
❖ ... Hash Table ADT |
For hash tables, make one change to "standard" Collection interface:
typedef struct HashTabRep *HashTable; // make new empty table of size N HashTable newHashTable(int); // add item into collection void HashInsert(HashTable, Item); // find item with key Item *HashGet(HashTable, Key); // drop item with key void HashDelete(HashTable, Key); // free memory of a HashTable void dropHashTable(HashTable);
i.e. we specify max # elements that can be stored in the collection
❖ ... Hash Table ADT |
Example hash table implementation:
typedef struct HashTabRep {
Item **items; // array of (Item *)
int N; // size of array
int nitems; // # Items in array
} HashTabRep;
HashTable newHashTable(int N)
{
HashTable new = malloc(sizeof(HashTabRep));
new->items = malloc(N*sizeof(Item *));
new->N = N;
new->nitems = 0;
for (int i = 0; i < N; i++) new->items[i] = NULL;
return new;
}
❖ ... Hash Table ADT |
Hash table implementation (cont)
void HashInsert(HashTable ht, Item it) {
int h = hash(key(it), ht->N);
// assume table slot empty!?
ht->items[h] = copy(it);
ht->nitems++;
}
Item *HashGet(HashTable ht, Key k) {
int h = hash(k, ht->N);
Item *itp = ht->items[h];
if (itp != NULL && equal(key(*itp),k))
return itp;
else
return NULL;
}
key()copy()Itemequal()Key
❖ ... Hash Table ADT |
Hash table implementation (cont)
void HashDelete(HashTable ht, Key k) {
int h = hash(k, ht->N);
Item *itp = ht->items[h];
if (itp != NULL && equal(key(*itp),k)) {
free(itp);
ht->items[h] = NULL;
ht->nitems--;
}
}
void dropHashTable(HashTable ht) {
for (int i = 0; i < ht->N; i++) {
if (ht->items[i] != NULL) free(ht->items[i]);
}
free(ht);
}
❖ Hash Functions |
Characteristics of hash functions:
Keymod❖ ... Hash Functions |
Basic mechanism of hash functions:
int hash(Key key, int N)
{
int val = convert key to 32-bit int;
return val % N;
}
If keys are int
How to convert keys which are strings?
(e.g. "COMP1927""John"
Definitely prefer that hash("cat""dog"
Prefer that hash("cat""act""tac"
❖ ... Hash Functions |
Universal hashing uses entire key, with position info:
int hash(char *key, int N)
{
int h = 0, a = 31415, b = 21783;
char *c;
for (c = key; *c != '\0'; c++) {
a = a*b % (N-1);
h = (a * h + *c) % N;
}
return h;
}
Has some similarities with RNG. Aim: "spread" hash values over [0..N-1]
❖ ... Hash Functions |
A real hash function (from PostgreSQL DBMS):
hash_any(unsigned char *k, register int keylen, int N)
{
register uint32 a, b, c, len;
// set up internal state
len = keylen;
a = b = 0x9e3779b9;
c = 3923095;
// handle most of the key, in 12-char chunks
while (len >= 12) {
a += (k[0] + (k[1] << 8) + (k[2] << 16) + (k[3] << 24));
b += (k[4] + (k[5] << 8) + (k[6] << 16) + (k[7] << 24));
c += (k[8] + (k[9] << 8) + (k[10] << 16) + (k[11] << 24));
mix(a, b, c);
k += 12; len -= 12;
}
// collect any data from remaining bytes into a,b,c
mix(a, b, c);
return c % N;
}
❖ ... Hash Functions |
Where mix
#define mix(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
i.e. scrambles all of the bits from the bytes of the key value
❖ Problems with Hashing |
In ideal scenarios, search cost in hash table is O(1).
Problems with hashing:
Item
size(IndexSpace), collisions inevitable
nitemsnslotsProduced: 9 Jul 2020