❖ Queries on Multiple Tables |
Queries involving a single table are useful.
Exploiting all data in the DB requires
select b.brewer from Beers b join Likes L on (b.name = L.beer) where L.drinker = 'John';
Info on brewers is in BeersLikes
Need to combine info from both tables using "common" attributes
❖ Queries on Multiple Tables (cont) |
Example BeersLikes
Beers(80/-, Caledonian, Scotch Ale) Likes(John, Sculpin) Beers(New, Toohey's, Lager) Likes(John, Red Nut) Beers(Red Nut, Bentspoke, Red IPA) Likes(Adam, New) Beers(Sculpin, Ballast Point, IPA) Likes(John, 80/-)
"Merged" tuples resulting from
Beers b join Likes L on (b.name = L.beer)Joined(80/-, Caledonian, Scotch Ale, John, 80/-) Joined(New, Toohey's, Lager, Adam, New) Joined(Red Nut, Bentspoke, Red IPA, John, Red Nut) Joined(Sculpin, Ballast Point, IPA, John, Sculpin)
In the query, the where
❖ Join |
Join is the SQL operator that combines tuples from tables.
Such an important operation that several variations exist
❖ Join (cont) |
Join fits into SELECT
SELECT Attributes
FROM R1
JOIN R2 ON (JoinCondition1)
JOIN R3 ON (JoinCondition2)
...
WHERE Condition
Can include an arbitrary number of joins.
WHERE
❖ Join (cont) |
Alternative syntax for joins:
SELECT brewer
FROM Likes L, Beers b
WHERE L.beer = b.name
AND L.drinker = 'John';
Join condition(s) are specified in the WHERE
We prefer the explicit JOIN
Note: duplicates could be eliminated by using distinct
❖ Join (cont) |
Operational semantics of R1 JOIN R2 ON (Condition)
FOR EACH tuple t1 in R1 DO
FOR EACH tuple t2 in R2 DO
check Condition for current
t1, t2 attribute values
IF Condition holds THEN
add (t1,t2) to result
END
END
END
Easy to generalise: add more relations, include WHERE
Requires one tuple variable for each relation, and nested loops over relations.
But this is not how it's actually computed!
❖ Name Clashes in Conditions |
If a SELECT
Example: Which hotels have the same name as a beer?
SELECT Bars.name
FROM Bars, Beers
WHERE Bars.name = Beers.name;
-- or, using table aliases ...
SELECT r.name
FROM Bars r, Beers b
WHERE r.name = b.name
❖ Explicit Tuple Variables |
Table-dot-attribute doesn't help if we use same table twice in SELECT
To handle this, define new names for each "instance" of the table
SELECT r1.a, r2.b FROM R r1, R r2 WHERE r1.a = r2.a
Example: Find pairs of beers by the same manufacturer.
SELECT b1.name, b2.name FROM Beers b1 JOIN Beers b2 ON (b1.brewer = b2.brewer) WHERE b1.name < b2.name;
The WHERE
(New,New)(New,Old) (Old,New)
❖ Outer Join |
Join only produces a result tuple from tR and
tS where ...
SELECT * FROM R JOIN S WHERE (Condition)
Sometimes, we want a result for every R
RS❖ Outer Join (cont) |
Example: for each suburb with a bar, find out who drinks there.
Theta-join only gives results for suburbs where people drink.
SELECT B.addr, F.drinker
FROM Bars B
JOIN Frequents F ON (F.bar = B.name)
ORDER BY addr;
addr | drinker
-----------+---------
Coogee | Adam
Coogee | John
Kingsford | Justin
Sydney | Justin
The Rocks | John
But what if we want all suburbs, even if some have are no drinkers?
This is from an older and simpler instance of the beers database.
❖ Outer Join (cont) |
An outer join solves this problem.
For R OUTER JOINON ()
NULLLEFT OUTER JOIN❖ Outer Join (cont) |
Solving the example query with an outer join:
SELECT B.addr, F.drinker
FROM Bars B
LEFT OUTER JOIN Frequents F on (F.bar = B.name)
ORDER BY B.addr;
addr | drinker
------------+---------
Coogee | Adam
Coogee | John
Kingsford | Justin
Randwick |
Sydney | Justin
The Rocks | John
Note that Randwick is now mentioned (because of the Royal Hotel).
❖ Outer Join (cont) |
Operational semantics of R1 LEFT OUTER JOIN R2 ON (Cond)
FOR EACH tuple t1 in R1 DO
nmatches = 0
FOR EACH tuple t2 in R2 DO
check Cond for current
t1, t2 attribute values
IF Cond holds THEN
nmatches++
add (t1,t2) to result
END
END
IF nmatches == 0 THEN
t2 = (null,null,null,...)
add (t1,t2) to result
END
END
❖ Outer Join (cont) |
Many RDBMSs provide three variants of outer join:
LEFT OUTER JOINRIGHT OUTER JOINNULLFULL OUTER JOINNULL❖ Subqueries |
The result of a query can be used in the
WHERE
Case 1: Subquery returns a single, unary tuple
SELECT * FROM R WHERE R.a = (SELECT S.x FROM S WHERE Cond1)
Case 2: Subquery returns multiple values
SELECT * FROM R WHERE R.a IN (SELECT S.x FROM S WHERE Cond2)
This approach is often used in the initial discussion of SQL in some textbooks.
These kinds of queries can generally be solved more efficiently using a join
SELECT * FROM R JOIN S ON (R.a = S.x) WHERE Cond
Produced: 3 Oct 2020