❖ Hashing in PostgreSQL |
PostgreSQL uses linear hashing on tables which have been:
create index Ix on R using hash (k);
Hash file implementation: backend/access/hash
hashfunc.c
hashinsert.c
hashpage.c
hashsearch.c
src/backend/access/hash/README
Based on "A New Hashing Package for Unix", Margo Seltzer, Winter Usenix 1991
❖ PostgreSQL Hash Function |
PostgreSQL generic hash function (simplified):
Datum hash_any(unsigned char *k, int keylen) { uint32 a, b, c, len, *ka = (uint32 *)k; /* Set up the internal state */ len = keylen; a = b = c = 0x9e3779b9+len+3923095; /* handle most of the key */ while (len >= 12) { a += ka[0]; b += ka[1]; c += ka[2]; mix(a, b, c); ka += 3; len -= 12; } ... collect data from remaining bytes into a,b,c ... mix(a, b, c); return UInt32GetDatum(c); }
See backend/access/hash/hashfunc.c
mix()
❖ PostgreSQL Hash Function (cont) |
hash_any()
uint32
Typically invoked from a type-specific function, e.g.
Datum hashint4(PG_FUNCTION_ARGS) { return hash_uint32(PG_GETARG_INT32(0)); }
where hash_uint32()
hash_any()
Hash value is "wrapped" as a Datum
❖ PostgreSQL Hash Function (cont) |
Implementation of hash → page ID
Bucket _hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket, uint32 highmask, uint32 lowmask) { Bucket bucket; bucket = hashkey & highmask; if (bucket > maxbucket) bucket = bucket & lowmask; return bucket; }
❖ Hash Files in PostgreSQL |
PostgreSQL uses different file organisation ...
Confusingly, PostgreSQL calls "group pointers" as "split pointers"
❖ Hash Files in PostgreSQL (cont) |
Approximate method for converting bucket # to page address:
// which page is primary page of bucket uint bucket_to_page(headerp, B) { uint *splits = headerp->hashm_spares; uint chunk, base, offset, lg2(uint); chunk = (B<2) ? 0 : lg2(B+1)-1; base = splits[chunk]; offset = (B<2) ? B : B-(1<<chunk); return (base + offset); } // returns ceil(log_2(n)) int lg2(uint n) { int i, v; for (i = 0, v = 1; v < n; v <<= 1) i++; return i; }
Produced: 11 Mar 2021