COMP9315 21T1 ♢ Hash Join ♢ [0/17]
Basic idea:
- use hashing as a technique to partition relations
- to avoid having to consider all pairs of tuples
Requires sufficient memory buffers
- to hold substantial portions of partitions
- (preferably) to hold largest partition of outer relation
Other issues:
- works only for equijoin R.i=S.j  (but this is a common case)
- susceptible to data skew (or poor hash function)
Variations: simple, grace, hybrid.
COMP9315 21T1 ♢ Hash Join ♢ [1/17]
Basic approach:
- hash part of outer relation R into memory buffers (build)
- scan inner relation S, using hash to search (probe)
- if R.i=S.j, then h(R.i)=h(S.j) (hash to same buffer)
- only need to check one memory buffer for each S tuple
- repeat until whole of R has been processed
No overflows allowed in in-memory hash table
- works best with uniform hash function
- can be adversely affected by data/hash skew
COMP9315 21T1 ♢ Hash Join ♢ [2/17]
❖ Simple Hash Join (cont)
Data flow in hash join:
COMP9315 21T1 ♢ Hash Join ♢ [3/17]
❖ Simple Hash Join (cont)
Algorithm for simple hash join Join[R.i=S.j](R,S):
for each tuple r in relation R {
    if (buffer[h(r.i)] is full) {
        for each tuple s in relation S {
            for each tuple rr in buffer[h(s.j)] {
                if ((rr,s) satisfies join condition) {
                    add (rr,s) to result
        }   }   }
        clear all hash table buffers
    }
    insert r into buffer[h(r.i)]
}
// any tuples left in the buffers after the last insert must still be probed
for each tuple s in relation S {
    for each tuple rr in buffer[h(s.j)] {
        if ((rr,s) satisfies join condition) {
            add (rr,s) to result
}   }   }
Best case: # join tests ≤ rS.cR  (each S tuple is compared with at most one buffer-load, i.e. cR tuples, of R; cf. nested loop's rS.rR)
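To make the build/probe structure concrete, here is a minimal, self-contained C sketch of one build/probe round (the relation contents, field names, bucket sizes and hash function are invented for illustration; a real executor works page-at-a-time through the buffer pool):

#include <stdio.h>

#define NBUCKETS 8          /* stand-in for the N-3 in-memory buffers  */
#define BUCKET_CAP 4        /* stand-in for the capacity of one buffer */

typedef struct { int i; char payload[8]; } RTuple;   /* R(i, ...) */
typedef struct { int j; char payload[8]; } STuple;   /* S(j, ...) */

static unsigned h(int key) { return (unsigned)key % NBUCKETS; }

int main(void)
{
    RTuple R[] = {{1,"r1"},{2,"r2"},{3,"r3"},{9,"r9"},{4,"r4"}};
    STuple S[] = {{2,"s2"},{3,"s3"},{7,"s7"},{9,"s9"}};
    int nR = 5, nS = 4;

    RTuple table[NBUCKETS][BUCKET_CAP];
    int    count[NBUCKETS] = {0};

    /* build: hash (all of) R into the in-memory table */
    for (int r = 0; r < nR; r++) {
        unsigned b = h(R[r].i);
        table[b][count[b]++] = R[r];        /* assumes no bucket overflows */
    }

    /* probe: each S tuple examines exactly one bucket */
    for (int s = 0; s < nS; s++) {
        unsigned b = h(S[s].j);
        for (int k = 0; k < count[b]; k++)
            if (table[b][k].i == S[s].j)
                printf("join: (%s,%s) on %d\n",
                       table[b][k].payload, S[s].payload, S[s].j);
    }
    return 0;
}

When a bucket would overflow, the full algorithm above probes S against the current table contents and then clears it; this sketch simply assumes all of R fits.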
COMP9315 21T1 ♢ Hash Join ♢ [4/17]
❖ Simple Hash Join (cont)
Cost for simple hash join ...
Best case: all tuples of R fit in the hash table
- Cost = bR + bS
- Same page reads as block nested loop, but fewer join tests
Good case: refill the hash table m times (where m ≥ ceil(bR / (N-3)))
- Cost = bR + m.bS
- More page reads than block nested loop, but fewer join tests
Worst case: everything hashes to the same page
- Cost ≅ bR + bR.bS  (the table is effectively refilled after every page-load of R)
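As a worked instance, using the page and buffer counts from the join example later in these notes (outer relation of 1000 pages, inner of 2000 pages, N = 103, so the hash table has N-3 = 100 buffers):

m = ceil(1000/100) = 10,   Cost = 1000 + 10.2000 = 21000

compared with 1000 + 2000 = 3000 if the whole outer relation fitted in memory.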
COMP9315 21T1 ♢ Hash Join ♢ [5/17]
Basic approach (for R ⨝ S):
- partition both relations on the join attribute using hashing (h1), as sketched below
- load each partition of R into an (N-3)-buffer hash table (using a second hash function h2)
- scan through the corresponding partition of S to form results
- repeat until all partitions are exhausted
For best-case cost O(bR + bS):
- need ≥ √bR buffers to hold the largest partition of the outer relation
If < √bR buffers or the hash distribution is poor:
- need to scan some partitions of S multiple times
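A minimal C sketch of the partitioning step (the partition count k, the file names and the tuple layout are assumptions for illustration; a real system collects tuples into one output buffer per partition and writes whole pages):

#include <stdio.h>
#include <stdlib.h>

#define K 8                                   /* number of partitions (illustrative) */

typedef struct { int key; char payload[8]; } Tuple;

static unsigned h1(int key) { return (unsigned)key % K; }

/* split one relation into K partition files rel.0 .. rel.(K-1) using h1 */
static void partition(const char *rel, const Tuple *tuples, int n)
{
    FILE *out[K];
    char  name[64];

    for (int p = 0; p < K; p++) {
        snprintf(name, sizeof name, "%s.%d", rel, p);
        out[p] = fopen(name, "wb");
        if (!out[p]) { perror(name); exit(1); }
    }
    for (int t = 0; t < n; t++)               /* each tuple goes to exactly one file */
        fwrite(&tuples[t], sizeof(Tuple), 1, out[h1(tuples[t].key)]);
    for (int p = 0; p < K; p++)
        fclose(out[p]);
}

int main(void)
{
    Tuple R[] = {{1,"r1"},{9,"r9"},{2,"r2"},{4,"r4"}};
    Tuple S[] = {{2,"s2"},{9,"s9"},{5,"s5"}};
    partition("R", R, 4);    /* the same h1 is applied to both relations, so   */
    partition("S", S, 3);    /* matching keys land in matching partition files */
    return 0;
}

Because the same h1 is used for R and S, any pair of tuples with R.i = S.j must land in partition files with the same number, so only corresponding partition pairs need to be joined.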
COMP9315 21T1 ♢ Hash Join ♢ [6/17]
Partition phase (applied to both R and S):
COMP9315 21T1 ♢ Hash Join ♢ [7/17]
Probe/join phase:
The second hash function (h2) simply speeds up the matching process.
Without it, we would need to scan the entire R partition for each record in the S partition.
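Continuing the illustrative layout above, a minimal C sketch of the probe/join phase for each partition pair; the bucket sizes, file names and h2 are assumptions:

#include <stdio.h>

#define K 8                               /* must match the partitioning step  */
#define NBUCKETS 16                       /* in-memory table built with h2     */
#define BUCKET_CAP 32                     /* assumes each R partition fits     */

typedef struct { int key; char payload[8]; } Tuple;

static unsigned h2(int key) { return ((unsigned)key * 2654435761u) % NBUCKETS; }

int main(void)
{
    Tuple bucket[NBUCKETS][BUCKET_CAP];
    int   count[NBUCKETS];
    Tuple t;

    for (int p = 0; p < K; p++) {                 /* one partition pair at a time */
        char rname[32], sname[32];
        snprintf(rname, sizeof rname, "R.%d", p);
        snprintf(sname, sizeof sname, "S.%d", p);

        /* build: load partition p of R into the h2 hash table */
        FILE *rf = fopen(rname, "rb");
        if (!rf) continue;
        for (int b = 0; b < NBUCKETS; b++) count[b] = 0;
        while (fread(&t, sizeof t, 1, rf) == 1) {
            unsigned b = h2(t.key);
            bucket[b][count[b]++] = t;
        }
        fclose(rf);

        /* probe: each S tuple looks in one h2 bucket, not the whole R partition */
        FILE *sf = fopen(sname, "rb");
        if (!sf) continue;
        while (fread(&t, sizeof t, 1, sf) == 1) {
            unsigned b = h2(t.key);
            for (int k = 0; k < count[b]; k++)
                if (bucket[b][k].key == t.key)
                    printf("join: (%s,%s) on %d\n",
                           bucket[b][k].payload, t.payload, t.key);
        }
        fclose(sf);
    }
    return 0;
}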
COMP9315 21T1 ♢ Hash Join ♢ [8/17]
Cost of grace hash join:
- #pages in all partition files of Rel ≅ bRel  (maybe slightly more)
- partition relation R ...  Cost = read(bR) + write(≅bR) = 2bR
- partition relation S ...  Cost = read(bS) + write(≅bS) = 2bS
- probe/join requires one scan of each (partitioned) relation ...  Cost = bR + bS
- all hashing and comparison occurs in memory ⇒ tiny cost
Total Cost  =  2bR + 2bS + bR + bS  =  3(bR + bS)
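For instance, plugging in the Student/Enrolled page counts used in the join example later in these notes (1000 and 2000 pages):

Total Cost  =  2.1000 + 2.2000 + (1000 + 2000)  =  9000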
COMP9315 21T1 ♢ Hash Join ♢ [9/17]
A variant of grace hash join, for when we have √bR < N < bR+2:
- create k ≪ N partitions, 1 in memory, k-1 on disk (sketched in C below)
- buffers: 1 input, k-1 output, p = N-k-2 for the in-memory partition
When we come to scan and partition the S relation:
- any tuple with hash 0 can be resolved immediately (using the in-memory partition)
- other tuples are written to one of the k-1 partition files for S
The final phase is the same as for grace join, but with only k-1 partitions.
Comparison:
- grace hash join creates N-1 partitions on disk
- hybrid hash join creates 1 (memory) + k-1 (disk) partitions
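A minimal C sketch of how the partitioning loop changes for R (k, the table sizes and the file names are assumptions carried over from the grace sketch): tuples that h1 sends to partition 0 are never written out, but go straight into the in-memory h2 table.

#include <stdio.h>
#include <stdlib.h>

#define K 8                     /* k partitions: 0 in memory, 1..k-1 on disk */
#define NBUCKETS 16
#define BUCKET_CAP 32

typedef struct { int key; char payload[8]; } Tuple;

static unsigned h1(int key) { return (unsigned)key % K; }
static unsigned h2(int key) { return ((unsigned)key * 2654435761u) % NBUCKETS; }

/* in-memory hash table holding partition 0 of R */
static Tuple bucket[NBUCKETS][BUCKET_CAP];
static int   count[NBUCKETS];

static void partition_R_hybrid(const Tuple *R, int n)
{
    FILE *out[K];
    char  name[32];

    for (int p = 1; p < K; p++) {             /* only k-1 partition files */
        snprintf(name, sizeof name, "R.%d", p);
        out[p] = fopen(name, "wb");
        if (!out[p]) { perror(name); exit(1); }
    }
    for (int t = 0; t < n; t++) {
        unsigned p = h1(R[t].key);
        if (p == 0) {                         /* partition 0: build h2 table directly */
            unsigned b = h2(R[t].key);
            bucket[b][count[b]++] = R[t];
        } else {
            fwrite(&R[t], sizeof(Tuple), 1, out[p]);
        }
    }
    for (int p = 1; p < K; p++)
        fclose(out[p]);
}

int main(void)
{
    Tuple R[] = {{8,"r8"},{1,"r1"},{9,"r9"},{2,"r2"}};
    partition_R_hybrid(R, 4);   /* key 8 hashes to partition 0, so it stays in memory */
    printf("tuples kept in memory for key 8's bucket: %d\n", count[h2(8)]);
    return 0;
}

Partitioning S then works the same way, except that any S tuple hashing to partition 0 is probed against this table immediately instead of being stored.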
COMP9315 21T1 ♢ Hash Join ♢ [10/17]
❖ Hybrid Hash Join (cont)
First phase of hybrid hash join (partitioning R):
COMP9315 21T1 ♢ Hash Join ♢ [11/17]
❖ Hybrid Hash Join (cont)
Next phase of hybrid hash join (partitioning S):
COMP9315 21T1 ♢ Hash Join ♢ [12/17]
❖ Hybrid Hash Join (cont)
Final phase of hybrid hash join (finishing join):
COMP9315 21T1 ♢ Hash Join ♢ [13/17]
❖ Hybrid Hash Join (cont)
Some observations:
- with k partitions, each partition has expected size ceil(bR/k)
- holding 1 partition in memory needs ceil(bR/k) buffers
- trade-off between in-memory partition space and #partitions
Other notes:
- if N = bR+2, using block nested loop join is simpler
- cost depends on N (but less than grace hash join)
For k partitions, Cost = (3-2/k).(bR+bS)
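As a sanity check on this formula: with k = 1 (the whole outer relation fits in memory) it gives (3-2).(bR+bS) = bR+bS, the best case; with k = 10 it gives 2.8(bR+bS); and as k grows it approaches the 3(bR+bS) of grace hash join.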
COMP9315 21T1 ♢ Hash Join ♢ [14/17]
SQL query on student/enrolment database:
select E.subj, S.name
from Student S join Enrolled E on (S.id = E.stude)
order by E.subj
And its relational algebra equivalent:
Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )
Database: rS = 20000, cS = 20, bS = 1000, rE = 80000, cE = 40, bE = 2000
We are interested only in the cost of Join, with N buffers
COMP9315 21T1 ♢ Hash Join ♢ [15/17]
❖ Costs for Join Example (cont)
Costs for hash join variants on example (N=103):
Hash Join Method  |  Cost Analysis  |  Cost
Hybrid Hash Join  |  (3-2/k).(bS+bE) = 2.8(1000+2000), assuming k = 10 ... and one partition fits in 91 pages  |  8700
Grace Hash Join  |  3(bS+bE) = 3(1000+2000)  |  9000
Simple Hash Join  |  bS + bE.ceil(bS/(N-3)) = 1000 + ceil(1000/100).2000 = 1000 + 10.2000  |  21000
Sort-merge Join  |  sort(S) + sort(E) + bS + bE = 2.1000.2 + 2.2000.2 + 1000 + 2000  |  11000
Nested-loop Join  |  bS + bE.ceil(bS/(N-2)) = 1000 + ceil(1000/101).2000 = 1000 + 10.2000  |  21000
COMP9315 21T1 ♢ Hash Join ♢ [16/17]
Join implementations are under: src/backend/executor
PostgreSQL supports three kinds of join:
- nested loop join (nodeNestloop.c)
- sort-merge join (nodeMergejoin.c)
- hash join (nodeHashjoin.c)  (a hybrid hash join)
The query optimiser chooses an appropriate join by considering
- physical characteristics of tables being joined
- estimated selectivity (likely number of result tuples)
COMP9315 21T1 ♢ Hash Join ♢ [17/17]
Produced: 25 Apr 2021