❖ Queries on Multiple Tables |
Queries involving a single table are useful.
Exploiting all data in the DB requires
select b.brewer from Beers b join Likes L on (b.name = L.beer) where L.drinker = 'John';
Info on brewers is in Beers
Likes
Need to combine info from both tables using "common" attributes
❖ Queries on Multiple Tables (cont) |
Example Beers
Likes
Beers(80/-, Caledonian, Scotch Ale) Likes(John, Sculpin) Beers(New, Toohey's, Lager) Likes(John, Red Nut) Beers(Red Nut, Bentspoke, Red IPA) Likes(Adam, New) Beers(Sculpin, Ballast Point, IPA) Likes(John, 80/-)
"Merged" tuples resulting from
Beers b join Likes L on (b.name = L.beer)
Joined(80/-, Caledonian, Scotch Ale, John, 80/-) Joined(New, Toohey's, Lager, Adam, New) Joined(Red Nut, Bentspoke, Red IPA, John, Red Nut) Joined(Sculpin, Ballast Point, IPA, John, Sculpin)
In the query, the where
❖ Join |
Join is the SQL operator that combines tuples from tables.
Such an important operation that several variations exist
❖ Join (cont) |
Join fits into SELECT
SELECT Attributes FROM R1 JOIN R2 ON (JoinCondition1) JOIN R3 ON (JoinCondition2) ... WHERE Condition
Can include an arbitrary number of joins.
WHERE
❖ Join (cont) |
Alternative syntax for joins:
SELECT brewer FROM Likes L, Beers b WHERE L.beer = b.name AND L.drinker = 'John';
Join condition(s) are specified in the WHERE
We prefer the explicit JOIN
Note: duplicates could be eliminated by using distinct
❖ Join (cont) |
Operational semantics of R1 JOIN R2 ON (Condition)
FOR EACH tuple t1 in R1 DO FOR EACH tuple t2 in R2 DO check Condition for current t1, t2 attribute values IF Condition holds THEN add (t1,t2) to result END END END
Easy to generalise: add more relations, include WHERE
Requires one tuple variable for each relation, and nested loops over relations.
But this is not how it's actually computed!
❖ Name Clashes in Conditions |
If a SELECT
Example: Which hotels have the same name as a beer?
SELECT Bars.name
FROM Bars, Beers
WHERE Bars.name = Beers.name;
-- or, using table aliases ...
SELECT r.name
FROM Bars r, Beers b
WHERE r.name = b.name
❖ Explicit Tuple Variables |
Table-dot-attribute doesn't help if we use same table twice in SELECT
To handle this, define new names for each "instance" of the table
SELECT r1.a, r2.b FROM R r1, R r2 WHERE r1.a = r2.a
Example: Find pairs of beers by the same manufacturer.
SELECT b1.name, b2.name FROM Beers b1 JOIN Beers b2 ON (b1.brewer = b2.brewer) WHERE b1.name < b2.name;
The WHERE
(New,New)
(New,Old) (Old,New)
❖ Outer Join |
Join only produces a result tuple from tR and
tS where ...
SELECT * FROM R JOIN S WHERE (Condition)
Sometimes, we want a result for every R
R
S
❖ Outer Join (cont) |
Example: for each suburb with a bar, find out who drinks there.
Theta-join only gives results for suburbs where people drink.
SELECT B.addr, F.drinker FROM Bars B JOIN Frequents F ON (F.bar = B.name) ORDER BY addr; addr | drinker -----------+--------- Coogee | Adam Coogee | John Kingsford | Justin Sydney | Justin The Rocks | John
But what if we want all suburbs, even if some have are no drinkers?
This is from an older and simpler instance of the beers database.
❖ Outer Join (cont) |
An outer join solves this problem.
For R OUTER JOIN
ON (
)
NULL
LEFT OUTER JOIN
❖ Outer Join (cont) |
Solving the example query with an outer join:
SELECT B.addr, F.drinker FROM Bars BFrequents F on (F.bar = B.name) ORDER BY B.addr; addr | drinker ------------+--------- Coogee | Adam Coogee | John Kingsford | Justin Randwick | Sydney | Justin The Rocks | John LEFT OUTER JOIN
Note that Randwick is now mentioned (because of the Royal Hotel).
❖ Outer Join (cont) |
Operational semantics of R1 LEFT OUTER JOIN R2 ON (Cond)
FOR EACH tuple t1 in R1 DO nmatches = 0 FOR EACH tuple t2 in R2 DO check Cond for current t1, t2 attribute values IF Cond holds THEN nmatches++ add (t1,t2) to result END END IF nmatches == 0 THEN t2 = (null,null,null,...) add (t1,t2) to result END END
❖ Outer Join (cont) |
Many RDBMSs provide three variants of outer join:
LEFT OUTER JOIN
RIGHT OUTER JOIN
NULL
FULL OUTER JOIN
NULL
❖ Subqueries |
The result of a query can be used in the
WHERE
Case 1: Subquery returns a single, unary tuple
SELECT * FROM R WHERE R.a = (SELECT S.x FROM S WHERE Cond1)
Case 2: Subquery returns multiple values
SELECT * FROM R WHERE R.a IN (SELECT S.x FROM S WHERE Cond2)
This approach is often used in the initial discussion of SQL in some textbooks.
These kinds of queries can generally be solved more efficiently using a join
SELECT * FROM R JOIN S ON (R.a = S.x) WHERE Cond
Produced: 3 Oct 2020