❖ Join |
DBMSs are engines to store, combine and filter information.
Join (⨝) is the primary means of combining information.
Join is important and potentially expensive
Most common join condition: equijoin, e.g. (R.pk = S.fk)
Join varieties (natural, inner, outer, semi, anti) all behave similarly.
We consider three strategies for implementing join
❖ Join Example |
Consider a university database with the schema:
create table Student( id integer primary key, name text, ... ); create table Enrolled( stude integer references Student(id), subj text references Subject(code), ... ); create table Subject( code text primary key, title text, ... );
We use this example for each join implementation, by way of comparison
❖ Join Example (cont) |
Goal: List names of students in all subjects, arranged by subject.
SQL query to provide this information:
select E.subj, S.name
from Student S
join Enrolled E on (S.id = E.stude)
order by E.subj, S.name;
And its relational algebra equivalent:
StudentEnrolled❖ Join Example (cont) |
Some database statistics:
| Sym | Meaning | Value |
| rS | # student records | 20,000 |
| rE | # enrollment records | 80,000 |
| cS | Student |
20 |
| cE | Enrolled |
40 |
| bS | # data pages in Student |
1,000 |
| bE | # data pages in Enrolled |
2,000 |
Also, in cost analyses later, N = number of memory buffers.
❖ Join Example (cont) |
Relation statistics for Out
| Sym | Meaning | Value |
| rOut | # tuples in result | 80,000 |
| COut | result records/page | 80 |
| bOut | # data pages in result | 1,000 |
Notes:
Enrolledsubjname❖ Implementing Join |
A naive join implementation strategy
for each tuple TS in Students {
for each tuple TE in Enrolled {
if (testJoinCondition(C,TS,TE)) {
T1 = concat(TS,TE)
T2 = project([subj,name],T1)
ResultSet = ResultSet ∪ {T2}
} } }
Problems:
❖ Implementing Join (cont) |
An alternative naive join implementation strategy
for each tuple TE in Enrolled {
for each tuple TS in Students {
if (testJoinCondition(C,TS,TE)) {
T1 = concat(TS,TE)
T2 = project([subj,name],T1)
ResultSet = ResultSet ∪ {T2}
} } }
Relatively minor performance difference ...
❖ Join Summary |
None of nested-loop/sort-merge/hash join is superior in some overall sense.
Which strategy is best for a given query depends on:
E.g. Join[id=stude](Student,Enrolled): 3,000 ... 2,000,000 page reads
❖ Join in PostgreSQL |
Join implementations are under: src/backend/executor
PostgreSQL suports the three join methods that we discuss:
nodeNestloop.cnodeMergejoin.cnodeHashjoin.cProduced: 22 Mar 2021