❖ Sort-Merge Join |
Basic approach:
(r,s)❖ Sort-Merge Join (cont) |
Standard merging requires two cursors:
while (r != eof && s != eof) {
if (r.val ≤ s.val) { output(r.val); next(r); }
else { output(s.val); next(s); }
}
while (r != eof) { output(r.val); next(r); }
while (s != eof) { output(s.val); next(s); }
❖ Sort-Merge Join (cont) |
Merging for join requires 3 cursors to scan sorted relations:
rsss
❖ Sort-Merge Join (cont) |
Algorithm using query iterators/scanners:
Query ri, si; Tuple r,s;
ri = startScan("SortedR");
si = startScan("SortedS");
while ((r = nextTuple(ri)) != NULL
&& (s = nextTuple(si)) != NULL) {
// align cursors to start of next common run
while (r != NULL && r.i < s.j)
r = nextTuple(ri);
if (r == NULL) break;
while (s != NULL && r.i > s.j)
s = nextTuple(si);
if (s == NULL) break;
// must have (r.i == s.j) here
...
❖ Sort-Merge Join (cont) |
...
// remember start of current run in S
TupleID startRun = scanCurrent(si)
// scan common run, generating result tuples
while (r != NULL && r.i == s.j) {
while (s != NULL and s.j == r.i) {
addTuple(outbuf, combine(r,s));
if (isFull(outbuf)) {
writePage(outf, outp++, outbuf);
clearBuf(outbuf);
}
s = nextTuple(si);
}
r = nextTuple(ri);
setScan(si, startRun);
}
}
❖ Sort-Merge Join (cont) |
Buffer requirements:
❖ Sort-Merge Join (cont) |
Cost of sort-merge join.
Step 1: sort each relation (if not already sorted):
❖ Sort-Merge Join on Example |
SQL query on student/enrolment database:
select E.subj, S.name from Student S join Enrolled E on (S.id = E.stude) order by E.subj
And its relational algebra equivalent:
Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )
Database: rS = 20000, cS = 20, bS = 1000, rE = 80000, cE = 40, bE = 2000
We are interested only in the cost of Join, with N buffers
❖ Sort-Merge Join on Example (cont) |
Case 1: Join[id=stude](Student,Enrolled)
| Cost | = | sort(S) + sort(E) + bS + bE |
| = | 2bS(1+log31(bS/32)) + 2bE(1+log31(bE/32)) + bS + bE | |
| = | 2×1000×(1+2) + 2×2000×(1+2) + 1000 + 2000 | |
| = | 6000 + 12000 + 1000 + 2000 | |
| = | 21,000 |
❖ Sort-Merge Join on Example (cont) |
Case 2: Join[id=stude](Student,Enrolled)
Cost = 2,000 + 1,000 = 3,000 (regardless of which relation is outer)
Produced: 1 May 2021