COMP3311 Week 8 Wednesday Lecture
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [0/25]
In today's lecture ...
- Relational Algebra, Query Execution
Things to do ...
- Quiz 5 due by 23:59 Friday 3 November
- Assignment 2 due by 23:59 Wednesday 15 November
- Must understand Python/Psycopg2 by end of week
Things to note ...
- Assignment 2 spec and database now available
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [1/25]
Things to note:
- last minute work ...
vxdb2
overloaded ... jas/tutors overloaded
- send code in email as attachment, not as a screenshot
- make sure
helpers.*
load without error
- don't
create view
s within Python code
- write queries to answer questions like "Are there any ...?"
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [2/25]
Relational algebra: sets of tuples, operations mapping sets to sets
Core relational algebra operations:
- selection: choosing a subset of tuples/rows
- projection: choosing a subset of attributes/columns
- product, join: combining relations
- union, intersection, difference: combining relations
- rename: change names of relations/attributes
Common extensions include:
- aggregation, projection++, division, sort
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [3/25]
❖ Relational Algebra (cont) | |
Have already looked at:
- rename, projection, selection: operations on a single relation
- union, intersection, difference: operations on two compatible relations
More interesting are operations:
- product, join: operations on two "non-campatible" relations
SQL implements all of the above, but with a less "clean" syntax, e.g.
select a,b,c projection
from R Proj[a,b,c](Sel[a>5](R))
where a > 5; selection
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [4/25]
❖ Exercise: Result Size and Schema (i) | |
Given the following:
- union-compatible tables R(A,B,C) (1000 tuples)
and S(D,E,F) (500 tuples)
-
Operation | Usage | Size | Schema |
rename |
ρTr |
|r| |
T |
projection |
πX(r) |
≤ |r| |
R'(X) |
selection |
σCond(r) |
selectivity |
R |
- give result sizes and schemas for the following:
- Rename[T(J,K,L)](r),
Proj[A,B](r),
Proj[B,C](r)
- Sel[A=c](r),
Sel[A≠c](r),
Sel[A≥c](r),
Sel[B=c](r)
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [5/25]
Product combines information from
two relations pairwise on tuples.
r × s = { (t1 : t2) | t1 ∈ r ∧ t2 ∈ s }, where r(R), s(S)
If t1 = (A1...An) and
t2 = (B1...Bn) then
(t1:t2) = (A1...An,B1...Bn)
Note: relations do not have to be union-compatible.
Result size is large: |r × s| = |r|.|s|
Schema: R+S
Algorithmic view:
result = {}
for each tuple t1 in relation r
for each tuple t2 in relation s
result = result ∪ {(t1:t2)}
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [6/25]
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [7/25]
Natural join is a specialised product:
- containing only pairs that match on common attributes
- with one of each pair of common attributes eliminated
Consider relation schemas
R(ABC..JKLM),
S(KLMN..XYZ).
The natural join of relations r(R) and s(S) is defined as:
r ⋈ s =
r Join s =
{
(t1[ABC..J] : t2[K..XYZ]) |
t1 ∈ r ∧ t2 ∈ s ∧ match
}
where match =
t1[K] = t2[K] ∧
t1[L] = t2[L] ∧
t1[M] = t2[M]
Algorithmic view:
result = {}
for each tuple t1 in relation r
for each tuple t2 in relation s
if (matches(t1,t2))
result = result ∪ {combine(t1,t2)}
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [8/25]
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [9/25]
The theta join is a specialised product
containing only pairs that match on a supplied condition C.
r ⋈C s = { (t1 : t2) | t1 ∈ r ∧ t2 ∈ s ∧ C(t1 : t2) },
where r(R),s(S)
Examples: (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)
All attribute names are required to be distinct (cf natural join)
Can be defined in terms of other RA operations:
r ⋈C s = r Join[C] s = Sel[C] ( r × s )
Note that r ⋈true s = r × s.
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [10/25]
Example theta join:
(Theta join is the most frequently-used join in SQL queries)
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [11/25]
❖ Exercise: Result Size and Schema (ii) | |
Given the following:
- union-compatible tables R(A,B,C) (1000 tuples)
and S(C,D,E) (500 tuples)
-
Operation | Usage | MinSize | MaxSize | Schema |
product |
r × s |
|r|.|s| |
|r|.|s| |
R+S |
natural join |
r ⋈ s |
0 |
|r| |
R+S-C |
theta join |
r ⋈Cond s |
0 |
|r|.|s| |
R+S |
- give result sizes and schemas for the following:
- r × s,
Join(r,s),
Join[A=C](r,s),
Join[B=D](r,s),
Join[B>D](r,s)
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [12/25]
❖ Examples of RelAlg Expressions | |
Querying with relational algebra (join) ...
- Who drinks in Newtown bars?
NewtownBars(nbar) = Sel[addr=Newtown](Bars)
Tmp = Frequents Join[bar=nbar] NewtownBars
Result(drinker) = Proj[drinker](Tmp)
- Who drinks beers made by Carlton?
CarltonBeers = Sel[manf=Carlton](Beers)
Tmp = Likes Join[beer=name] CarltonBeers
Result(drinker) = Proj[drinker)Tmp
Reminder: projection removes duplicates
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [13/25]
❖ Exercise: Mapping SQL to RelAlg | |
Give sequences of relational algebra operations to solve each of these:
- Bars where either Gernot or John drink.
- Bars where John drinks but Gernot doesn't
- Find bars that serve New at the same price
as the Coogee Bay Hotel charges for VB.
- What beers are sold at the same price as CBH/New?
- Which bar is most popular? (Most drinkers)
- Price of cheapest beer at each bar?
- Which beers are sold at all bars?
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [14/25]
Consider two relation schemas R and S where S ⊂ R.
The division operation is defined on instances r(R), s(S) as:
r / s =
r Div s =
{
t |
t ∈ r[R-S] ∧ satisfy }
where satisfy = ∀ ts ∈ S ( ∃ tr ∈ R ( tr[S] = ts ∧ tr[R-S] = t ) )
Operationally:
- consider each subset of tuples in R that match on t[R-S]
- for this subset of tuples, take the t[S] values from each
- if this covers all tuples in S, then include t[R-S] in the result
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [15/25]
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [16/25]
Querying with relational algebra (division) ...
Division handles queries that include the notion "for all".
E.g. Which beers are sold in all bars?
We can answer this as follows:
- generate a relation of beers and bars where they are sold
- r1 = Proj[beer,bar](Sold)
- generate a relation of all bars
- r2 = Rename[r2(bar)](Proj[name](Bars))
- find which beers appear in tuples with every bar
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [17/25]
COMP3311 is not a course on DBMS Architecture (that's COMP9315)
But knowing just a little about how DBMSs work can help
- to avoid/fix inefficiencies in database applications
- ensure that there are no concurrency issues
DBMSs attempt to handle these issues in ..
- query processing (QP) .. methods for evaluating queries
- transaction processing (TxP) ... controlling concurrency
As a programmer, you give a lot of control to the DBMS, but can
- use QP knowledge to make DB applications efficient
- use TxP knowledge to make DB applications safe
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [18/25]
❖ DBMS Architecture (cont) | |
Our view of the DBMS so far ...
A machine to process SQL queries.
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [19/25]
❖ DBMS Architecture (cont) | |
One view of DB engine: "relational algebra virtual machine"
selection (σ) |
| projection (π) |
| join (⋈, ×) |
union (∪) |
| intersection (∩) |
| difference (-) |
sort |
| insert |
| delete |
For each of these operations:
- various data structures and algorithms are available
- DBMSs may provide only one, or may provide a choice
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [20/25]
❖ DBMS Architecture (cont) | |
Layers in a DB Engine (Ramakrishnan's View)
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [21/25]
The path of a query through its evaluation:
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [22/25]
Mapping SQL to relational algebra, e.g.
select a as x
from R join S on (b=c)
where d = 100
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a) = Proj[a](Tmp2)
Res(x) = Rename[Res(x)](Tmp3)
In general:
-
SELECT
clause becomes projection
-
WHERE
condition becomes selection or join
-
FROM
clause becomes join
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [23/25]
❖ Exercise: A Better SQL to RA Mapping | |
On the previous slide, we translated an SQL query as follows:
select a as x
from R join S on (b=c)
where d = 100
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a) = Proj[a](Tmp2)
Res(x) = Rename[Res(x)](Tmp3)
Suggest a more efficient approach (based on likely size of intermediate results)
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [24/25]
The query: Courses with more than 100 students in them?
Can be expressed in SQL as
select s.id, s.code
from Course c, Subject s, Enrolment e
where c.id = e.course and c.subject = s.id
group by s.id, s.code
having count(*) > 100;
and might be compiled to
Result =
Project[id,code](
GroupSelect[size>100] (
GroupBy[id,code] (
Subject Join[s.id=c.subject]
(Course Join[c.id=e.course] Enrolment)
) ) )
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [25/25]
Produced: 1 Nov 2023