F: Implementing Join

COMP9315 24T1 ♢ Lectures Part F ♢ [0/35]
❖ Join

DBMSs are engines to store, combine and filter information.

Join () is the primary means of combining information.

Join is important and potentially expensive

Most common join condition: equijoin, e.g. (R.pk = S.fk)

Join varieties (natural, inner, outer, semi, anti) all behave similarly.

We consider three strategies for implementing join

COMP9315 24T1 ♢ Lectures Part F ♢ [1/35]
❖ Join Example

Consider a university database with the schema:

create table Student(
   id     integer primary key,
   name   text,  ...
);
create table Enrolled(
   stude  integer references Student(id),
   subj   text references Subject(code),  ...
);
create table Subject(
   code   text primary key,
   title  text,  ...
);

COMP9315 24T1 ♢ Lectures Part F ♢ [2/35]
❖ Join Example (cont)

List names of students in all subjects, arranged by subject.

SQL query to provide this information:

select E.subj, S.name
from   Student S, Enrolled E
where  S.id = E.stude
order  by E.subj, S.name;

And its relational algebra equivalent:

Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )


To simplify formulae, we denote Student by S and Enrolled by E
COMP9315 24T1 ♢ Lectures Part F ♢ [3/35]
❖ Join Example (cont)

Some database statistics:

Sym Meaning Value
rS # student records 20,000
rE # enrollment records 80,000
cS Student records/page 20
cE Enrolled records/page 40
bS # data pages in Student 1,000
bE # data pages in Enrolled 2,000

Also, in cost analyses below, N = number of memory buffers.

COMP9315 24T1 ♢ Lectures Part F ♢ [4/35]
❖ Join Example (cont)

Out = Student ⋈ Enrolled relation statistics:

Sym Meaning Value
rOut # tuples in result 80,000
COut result records/page 80
bOut # data pages in result 1,000

Notes:

COMP9315 24T1 ♢ Lectures Part F ♢ [5/35]
❖ Nested Loop Join

Basic strategy (R.a ⋈ S.b):

Result = {}
for each page i in R {
   pageR = getPage(R,i)
   for each page j in S {
      pageS = getPage(S,j)
      for each pair of tuples tR,tS
                       from pageR,pageS {
         if (tR.a == tS.b)
            Result = Result ∪ (tR:tS)
}  }  }

Needs input buffers for R and S, output buffer for "joined" tuples

Terminology: R is outer relation, S is inner relation

Cost = bR . bS   ...   ouch!

COMP9315 24T1 ♢ Lectures Part F ♢ [6/35]
❖ Block Nested Loop Join

Method (for N memory buffers):

[Diagram:Pics/join/blk-nested-loop.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [7/35]
❖ Block Nested Loop Join (cont)

Best-case scenario: bR ≤ N-2

Cost   =   bR + bS

Typical-case scenario: bR > N-2

Cost   =   bR + bS . ceil(bR/N-2)

Note: always requires rR.rS checks of the join condition

COMP9315 24T1 ♢ Lectures Part F ♢ [8/35]
❖ BNJ in Practice

Why block nested loop join is actually useful in practice ...

Many queries have the form

select * from R,S where r.i=s.j and r.x=k

This would typically be evaluated as

Tmp = Sel[r.x=k](R)
Res = Tmp Join[i=j] S

If Tmp is small ⇒ may fit in memory (in small #buffers)

COMP9315 24T1 ♢ Lectures Part F ♢ [9/35]
❖ Index Nested Loop Join

A problem with nested-loop join:

If there is an index on S, we can avoid such repeated whole-of-S  scanning.

Consider Join[i=j](R,S):

for each tuple r in relation R {
    use index to select tuples
        from S where s.j = r.i
    for each selected tuple s from S {
        add (r,s) to result
}   }

COMP9315 24T1 ♢ Lectures Part F ♢ [10/35]
❖ Index Nested Loop Join (cont)

This method requires:

Cost   =   bR + rR.SelS    (SelS is the cost of performing a select on S).

Typical SelS  =  1-2 (hashing) .. bq (unclustered index)

Trade-off:   rR.SelS  vs  bR.bS,   where   bR ≪ rR and SelS ≪ bS

COMP9315 24T1 ♢ Lectures Part F ♢ [11/35]
❖ Sort-Merge Join

Basic approach:

Advantages: Disadvantages:
COMP9315 24T1 ♢ Lectures Part F ♢ [12/35]
❖ Sort-Merge Join (cont)

Method requires several cursors to scan sorted relations:

[Diagram:Pics/join/sort-merge.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [13/35]
❖ Sort-Merge Join (cont)

Algorithm using query iterators/scanners:

Query ri, si;  Tuple r,s;

ri = startScan("SortedR");
si = startScan("SortedS");
while ((r = nextTuple(ri)) != NULL
       && (s = nextTuple(si)) != NULL) {
    // align cursors to start of next common run
    while (r != NULL && r.i < s.j)
           r = nextTuple(ri);
    if (r == NULL) break;
    while (s != NULL && r.i > s.j)
           s = nextTuple(si);
    if (s == NULL) break;
    // must have (r.i == s.j) here
...

COMP9315 24T1 ♢ Lectures Part F ♢ [14/35]
❖ Sort-Merge Join (cont)

...
    // remember start of current run in S
    TupleID startRun = scanCurrent(si)
    // scan common run, generating result tuples
    while (r != NULL && r.i == s.j) {
        while (s != NULL and s.j == r.i) {
            addTuple(outbuf, combine(r,s));
            if (isFull(outbuf)) {
                writePage(outf, outp++, outbuf);
                clearBuf(outbuf);
            }
            s = nextTuple(si);
        }
        r = nextTuple(ri);
        setScan(si, startRun);
    }
}

COMP9315 24T1 ♢ Lectures Part F ♢ [15/35]
❖ Sort-Merge Join (cont)

Buffer requirements:

COMP9315 24T1 ♢ Lectures Part F ♢ [16/35]
❖ Sort-Merge Join (cont)

Cost of sort-merge join.

Step 1: sort each relation   (if not already sorted):

Step 2: merge sorted relations:
COMP9315 24T1 ♢ Lectures Part F ♢ [17/35]
❖ Sort-Merge Join on Example

Case 1:   Join[id=stude](Student,Enrolled)

 
Cost = sort(S) + sort(E) + bS + bE
= 2bS(1+log31(bS/32)) + 2bE(1+log31(bE/32)) + bS + bE
= 2×1000×(1+2) + 2×2000×(1+2) + 1000 + 2000
= 6000 + 12000 + 1000 + 2000
= 21,000
COMP9315 24T1 ♢ Lectures Part F ♢ [18/35]
❖ Sort-Merge Join on Example (cont)

Case 2:   Join[id=stude](Student,Enrolled)

For the above, no re-scans of E runs are ever needed

Cost  =  2,000 + 1,000  =  3,000   (regardless of which relation is outer)

COMP9315 24T1 ♢ Lectures Part F ♢ [19/35]
❖ Hash Join

Basic idea:

Requires sufficent memory buffers Other issues: Variations:   simple,   grace,   hybrid.
COMP9315 24T1 ♢ Lectures Part F ♢ [20/35]
❖ Simple Hash Join

Basic approach:

No overflows allowed in in-memory hash table
COMP9315 24T1 ♢ Lectures Part F ♢ [21/35]
❖ Simple Hash Join (cont)

Data flow:


[Diagram:Pics/join/simple-hash.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [22/35]
❖ Simple Hash Join (cont)

Algorithm for simple hash join Join[R.i=S.j](R,S):

for each tuple r in relation R {
   if (buffer[h(R.i)] is full) {
      for each tuple s in relation S {
         for each tuple rr in buffer[h(S.j)] {
            if ((rr,s) satisfies join condition) {
               add (rr,s) to result
      }  }  }
      clear all hash table buffers
   }
   insert r into buffer[h(R.i)]
}

Best case:  # join tests  ≤  rS.cR    (cf. nested-loop  rS.rR)

COMP9315 24T1 ♢ Lectures Part F ♢ [23/35]
❖ Simple Hash Join (cont)

Cost for simple hash join ...

Best case: all tuples of R fit in the hash table

Good case: refill hash table m times (where m ≥ ceil(bR / (N-2)) ) Worst case: everything hashes to same page
COMP9315 24T1 ♢ Lectures Part F ♢ [24/35]
❖ Grace Hash Join

Basic approach (for R ⋈ S ):

For best-case cost (O(bR + bS) ): If < √bR buffers or poor hash distribution
COMP9315 24T1 ♢ Lectures Part F ♢ [25/35]
❖ Grace Hash Join (cont)

Partition phase (applied to both R and S):

[Diagram:Pics/join/grace-hash1.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [26/35]
❖ Grace Hash Join (cont)

Probe/join phase:

[Diagram:Pics/join/hash2.png]

The second hash function (h2) simply speeds up the matching process.
Without it, would need to scan entire R partition for each record in S partition.

COMP9315 24T1 ♢ Lectures Part F ♢ [27/35]
❖ Grace Hash Join (cont)

Cost of grace hash join:

Total Cost   =   2bR + 2bS + bR + bS   =   3 (bR + bS)
COMP9315 24T1 ♢ Lectures Part F ♢ [28/35]
❖ Hybrid Hash Join

A variant of grace join if we have √bR < N < bR+2

When we come to scan and partition S relation Final phase is same as grace join, but with only k partitions.

Comparison:

COMP9315 24T1 ♢ Lectures Part F ♢ [29/35]
❖ Hybrid Hash Join (cont)

First phase of hybrid hash join with m=1 (partitioning R):

[Diagram:Pics/join/hyb-hash1.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [30/35]
❖ Hybrid Hash Join (cont)

Next phase of hybrid hash join with m=1 (partitioning S):

[Diagram:Pics/join/hyb-hash2.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [31/35]
❖ Hybrid Hash Join (cont)

Final phase of hybrid hash join with m=1 (finishing join):

[Diagram:Pics/join/hyb-hash3.png]

COMP9315 24T1 ♢ Lectures Part F ♢ [32/35]
❖ Hybrid Hash Join (cont)

Some observations:

Best-cost scenario: Other notes:
COMP9315 24T1 ♢ Lectures Part F ♢ [33/35]
❖ Join Summary

No single join algorithm is superior in some overall sense.

Which algorithm is best for a given query depends on:

Choosing the "best" join algorithm is critical because the cost difference between best and worst case can be very large.

E.g.   Join[id=stude](Student,Enrolled):   3,000 ... 2,000,000

COMP9315 24T1 ♢ Lectures Part F ♢ [34/35]
❖ Join in PostgreSQL

Join implementations are under: src/backend/executor

PostgreSQL suports three kinds of join:

Query optimiser chooses appropriate join, by considering
COMP9315 24T1 ♢ Lectures Part F ♢ [35/35]


Produced: 30 Apr 2024