Week 10 Lecture (10)

10

Grace Hash Join

Basic approach (for R ⋈ S ):

partition both relations on join attribute using hashing (h1)
load each partition of R into N-buffer hash table (h2)
scan through corresponding partition of S to form results
repeat until all partitions exhausted

For best-case cost (O(b_R + b_S) ):

need ≥ √b_R buffers to hold largest partition of outer relation

If < √b_R buffers or poor hash distribution

need to scan some partitions of S multiple times