❖ Hashing in PostgreSQL |
PostgreSQL uses linear hashing on tables which have been:
create index Ix on R using hash (k);
Hash file implementation: backend/access/hash
hashfunc.chashinsert.chashpage.chashsearch.csrc/backend/access/hash/READMEBased on "A New Hashing Package for Unix", Margo Seltzer, Winter Usenix 1991
❖ PostgreSQL Hash Function |
PostgreSQL generic hash function (simplified):
Datum hash_any(unsigned char *k, int keylen)
{
uint32 a, b, c, len, *ka = (uint32 *)k;
/* Set up the internal state */
len = keylen;
a = b = c = 0x9e3779b9+len+3923095;
/* handle most of the key */
while (len >= 12) {
a += ka[0]; b += ka[1]; c += ka[2];
mix(a, b, c);
ka += 3; len -= 12;
}
... collect data from remaining bytes into a,b,c ...
mix(a, b, c);
return UInt32GetDatum(c);
}
See backend/access/hash/hashfunc.cmix()
❖ PostgreSQL Hash Function (cont) |
hash_any()uint32
Typically invoked from a type-specific function, e.g.
Datum
hashint4(PG_FUNCTION_ARGS)
{
return hash_uint32(PG_GETARG_INT32(0));
}
where hash_uint32()hash_any()
Hash value is "wrapped" as a Datum
❖ PostgreSQL Hash Function (cont) |
Implementation of hash → page ID
Bucket
_hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket,
uint32 highmask, uint32 lowmask)
{
Bucket bucket;
bucket = hashkey & highmask;
if (bucket > maxbucket)
bucket = bucket & lowmask;
return bucket;
}
❖ Hash Files in PostgreSQL |
PostgreSQL uses different file organisation ...
Confusingly, PostgreSQL calls "group pointers" as "split pointers"
❖ Hash Files in PostgreSQL (cont) |
Approximate method for converting bucket # to page address:
// which page is primary page of bucket uint bucket_to_page(headerp, B) { uint *splits = headerp->hashm_spares; uint chunk, base, offset, lg2(uint); chunk = (B<2) ? 0 : lg2(B+1)-1; base = splits[chunk]; offset = (B<2) ? B : B-(1<<chunk); return (base + offset); } // returns ceil(log_2(n)) int lg2(uint n) { int i, v; for (i = 0, v = 1; v < n; v <<= 1) i++; return i; }
Produced: 11 Mar 2021