COMP3311 Week 8 Wednesday Lecture

Week 08 Wednesday
Assignment 2
Relational Algebra
Exercise: Result Size and Schema (i)
Product
Natural Join
Theta Join
Exercise: Result Size and Schema (ii)
Examples of RelAlg Expressions
Exercise: Mapping SQL to RelAlg
Division
DBMS Architecture
Query Evaluation
Mapping SQL to RA
Exercise: A Better SQL to RA Mapping
Mapping Example

❖ Week 08 Wednesday

In today's lecture ...

Relational Algebra, Query Execution

Things to do ...

Quiz 5 due by 23:59 Friday 3 November
Assignment 2 due by 23:59 Wednesday 15 November
Must understand Python/Psycopg2 by end of week

Things to note ...

Assignment 2 spec and database now available

❖ Assignment 2

Things to note:

last minute work ... vxdb2 overloaded ... jas/tutors overloaded
send code in email as attachment, not as a screenshot
make sure helpers.* load without error
don't create views within Python code
write queries to answer questions like "Are there any ...?"

❖ Relational Algebra

Relational algebra: sets of tuples, operations mapping sets to sets

Core relational algebra operations:

selection: choosing a subset of tuples/rows
projection: choosing a subset of attributes/columns
product, join: combining relations
union, intersection, difference: combining relations
rename: change names of relations/attributes

Common extensions include:

aggregation, projection++, division, sort
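
A minimal Python sketch of the single-relation operations, assuming a relation is modelled as a pair (schema, frozenset of tuples). This representation and the names `select`, `project` and `rename` are illustrative only. Because the rows form a set, `project` drops duplicates automatically, so its result can be smaller than its input.

```python
from typing import Callable

# Hypothetical representation: (schema, rows), where schema is a tuple of attribute names
Relation = tuple[tuple[str, ...], frozenset]

def select(r: Relation, cond: Callable[[dict], bool]) -> Relation:
    """Sel[cond](r): keep tuples satisfying cond; the schema is unchanged."""
    schema, rows = r
    return schema, frozenset(t for t in rows if cond(dict(zip(schema, t))))

def project(r: Relation, attrs: tuple[str, ...]) -> Relation:
    """Proj[attrs](r): keep only attrs; the frozenset removes duplicates."""
    schema, rows = r
    idx = [schema.index(a) for a in attrs]
    return attrs, frozenset(tuple(t[i] for i in idx) for t in rows)

def rename(r: Relation, new_schema: tuple[str, ...]) -> Relation:
    """Rename: same tuples under new attribute names (arity must match)."""
    schema, rows = r
    if len(new_schema) != len(schema):
        raise ValueError("rename must keep the arity")
    return new_schema, rows

# Hypothetical instance of R(a,b,c)
R: Relation = (("a", "b", "c"), frozenset({(1, "x", 10), (6, "x", 20), (7, "y", 20)}))
print(sorted(select(R, lambda t: t["a"] > 5)[1]))  # [(6, 'x', 20), (7, 'y', 20)]
print(sorted(project(R, ("b",))[1]))               # [('x',), ('y',)]
```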

❖ Relational Algebra (cont)

Have already looked at:

rename, projection, selection: operations on a single relation
union, intersection, difference: operations on two compatible relations

More interesting are operations:

product, join: operations on two "non-compatible" relations

SQL implements all of the above, but with a less "clean" syntax, e.g.

select a,b,c     -- projection
from   R
where  a > 5;    -- selection
-- in RA: Proj[a,b,c](Sel[a>5](R))
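
The correspondence can be checked on a toy table. This sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption, so it runs without a server), with hypothetical rows. It also shows one way SQL is less "clean": a plain `select` keeps duplicates unless `distinct` is given, whereas RA projection always removes them.

```python
import sqlite3

# Hypothetical instance of R(a,b,c)
rows = [(1, "x", 10), (6, "x", 20), (7, "y", 20), (9, "x", 30)]

con = sqlite3.connect(":memory:")
con.execute("create table R (a integer, b text, c integer)")
con.executemany("insert into R values (?, ?, ?)", rows)

# SQL: select a,b,c from R where a > 5
sql_result = set(con.execute("select a, b, c from R where a > 5"))
# RA: Proj[a,b,c](Sel[a>5](R)) written as a set comprehension
ra_result = {(a, b, c) for (a, b, c) in rows if a > 5}
print(sql_result == ra_result)   # True

# SQL keeps duplicates in a plain select; RA projection does not
bag = sorted(con.execute("select b from R where a > 5"))
distinct = set(con.execute("select distinct b from R where a > 5"))
ra_proj = {(b,) for (a, b, c) in rows if a > 5}
print(bag)                       # [('x',), ('x',), ('y',)]
print(distinct == ra_proj)       # True
```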

❖ Exercise: Result Size and Schema (i)

Given the following:

union-compatible tables R(A,B,C) (1000 tuples) and S(D,E,F) (500 tuples)
Operation    Usage        Size                             Schema
rename       ρ_T(r)       |r|                              T
projection   π_X(r)       ≤ |r|                            R'(X)
selection    σ_Cond(r)    ≤ |r| (depends on selectivity)   R

give result sizes and schemas for the following:
- Rename[T(J,K,L)](r), Proj[A,B](r), Proj[B,C](r)
- Sel[A=c](r), Sel[A≠c](r), Sel[A≥c](r), Sel[B=c](r)

❖ Product

Product combines information from two relations pairwise on tuples.

r × s = { (t₁ : t₂) | t₁ ∈ r ∧ t₂ ∈ s }, where r(R), s(S)

If t₁ = (A₁...A_n) and t₂ = (B₁...B_m) then (t₁:t₂) = (A₁...A_n,B₁...B_m)

Note: relations do not have to be union-compatible.

Result size is large: |r × s| = |r|.|s|. Result schema: R+S (the attributes of R followed by those of S)

Algorithmic view:

result = {}
for each tuple t₁ in relation r
    for each tuple t₂ in relation s
        result = result ∪ {(t₁:t₂)}
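
The nested-loop pseudocode translates almost line for line into Python. In this sketch a relation is just a set of tuples (schemas are left implicit, an assumption), and (t₁:t₂) is tuple concatenation.

```python
def product(r: set[tuple], s: set[tuple]) -> set[tuple]:
    """r × s: every tuple of r concatenated with every tuple of s."""
    result = set()
    for t1 in r:                 # for each tuple t1 in relation r
        for t2 in s:             # for each tuple t2 in relation s
            result.add(t1 + t2)  # result = result ∪ {(t1:t2)}
    return result

r = {(1, "a"), (2, "b")}         # hypothetical r(A,B)
s = {("x",), ("y",), ("z",)}     # hypothetical s(C)
rs = product(r, s)
print(len(rs))                   # 6, i.e. |r|.|s|
```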

❖ Product (cont)

Example of product:

❖ Natural Join

Natural join is a specialised product:

containing only pairs that match on common attributes
with one of each pair of common attributes eliminated

Consider relation schemas R(ABC..JKLM), S(KLMN..XYZ).

The natural join of relations r(R) and s(S) is defined as:

r ⋈ s = r Join s = { (t₁[ABC..J] : t₂[K..XYZ]) | t₁ ∈ r ∧ t₂ ∈ s ∧ match }

where match = t₁[K] = t₂[K] ∧ t₁[L] = t₂[L] ∧ t₁[M] = t₂[M]

Algorithmic view:

result = {}
for each tuple t₁ in relation r
   for each tuple t₂ in relation s
      if (matches(t₁,t₂))
         result = result ∪ {combine(t₁,t₂)}
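
A Python sketch of the same nested loop, assuming each relation is given as a schema (tuple of attribute names) plus a set of tuples. Here `matches` becomes an equality test on the common attributes and `combine` concatenates t₁ with t₂ minus its copy of those attributes. The name `natural_join` is illustrative.

```python
def natural_join(r_schema, r, s_schema, s):
    """r Join s: pair tuples that agree on all common attributes."""
    common = [a for a in r_schema if a in s_schema]
    s_rest = [a for a in s_schema if a not in common]   # attributes only in S
    ri = {a: i for i, a in enumerate(r_schema)}
    si = {a: i for i, a in enumerate(s_schema)}
    result = set()
    for t1 in r:
        for t2 in s:
            if all(t1[ri[a]] == t2[si[a]] for a in common):          # matches(t1,t2)
                result.add(t1 + tuple(t2[si[a]] for a in s_rest))   # combine(t1,t2)
    return tuple(r_schema) + tuple(s_rest), result

# Hypothetical r(A,B,C) and s(C,D), with common attribute C
schema, rows = natural_join(("A", "B", "C"), {(1, 2, 3), (4, 5, 6)},
                            ("C", "D"), {(3, "x"), (3, "y"), (7, "z")})
print(schema)         # ('A', 'B', 'C', 'D')
print(sorted(rows))   # [(1, 2, 3, 'x'), (1, 2, 3, 'y')]
```

With no common attributes, `all(...)` over an empty list is true, so the sketch degenerates to r × s.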

❖ Natural Join (cont)

Example of natural join:

❖ Theta Join

The theta join is a specialised product containing only pairs that match on a supplied condition C.

r ⋈_C s = { (t₁ : t₂) | t₁ ∈ r ∧ t₂ ∈ s ∧ C(t₁ : t₂) },
where r(R),s(S)

Examples: (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)

Attribute names in R and S are required to be distinct (cf. natural join, which matches on shared names)

Can be defined in terms of other RA operations:

r ⋈_C s = r Join[C] s = Sel[C] ( r × s )

Note that r ⋈_true s = r × s.
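
Following the definition Sel[C](r × s), a theta join can be sketched in Python as a filtered product. Relations are sets of tuples here, and the condition is a Python function over the concatenated tuple, so attributes are referred to by position (both simplifying assumptions).

```python
def theta_join(r: set[tuple], s: set[tuple], cond) -> set[tuple]:
    """r Join[C] s = Sel[C](r × s): keep concatenated pairs satisfying cond."""
    return {t1 + t2 for t1 in r for t2 in s if cond(t1 + t2)}

r1 = {(1, 5), (2, 1)}            # hypothetical r1(A,B)
r2 = {(3, "p"), (0, "q")}        # hypothetical r2(E,F)
# r1 Join[B>E] r2: in the concatenated tuple (A,B,E,F), B is index 1 and E is index 2
print(sorted(theta_join(r1, r2, lambda t: t[1] > t[2])))
# [(1, 5, 0, 'q'), (1, 5, 3, 'p'), (2, 1, 0, 'q')]
print(len(theta_join(r1, r2, lambda t: True)))   # 4 = |r1|.|r2|, i.e. r1 × r2
```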

❖ Theta Join (cont)

Example theta join:

(Theta join is the most frequently-used join in SQL queries)

❖ Exercise: Result Size and Schema (ii)

Given the following:

tables R(A,B,C) (1000 tuples) and S(C,D,E) (500 tuples), with common attribute C

Operation      Usage         MinSize    MaxSize    Schema
product        r × s         |r|.|s|    |r|.|s|    R+S
natural join   r ⋈ s         0          |r|.|s|    R+S-C
theta join     r ⋈_Cond s    0          |r|.|s|    R+S

(natural join MaxSize is |r| if C is a key of S)

give result sizes and schemas for the following:
- r × s, Join(r,s), Join[A=C](r,s), Join[B=D](r,s), Join[B>D](r,s)

❖ Examples of RelAlg Expressions

Querying with relational algebra (join) ...

Who drinks in Newtown bars?

NewtownBars(nbar) = Proj[name](Sel[addr=Newtown](Bars))
Tmp = Frequents Join[bar=nbar] NewtownBars
Result(drinker) = Proj[drinker](Tmp)

Who drinks beers made by Carlton?

CarltonBeers = Sel[manf=Carlton](Beers)
Tmp = Likes Join[beer=name] CarltonBeers
Result(drinker) = Proj[drinker](Tmp)

Reminder: projection removes duplicates
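
The first query can be traced on a tiny instance. The schemas Bars(name, addr) and Frequents(drinker, bar) and all rows below are hypothetical; each statement mirrors one RA step. Ann frequents two Newtown bars, so she appears twice in Tmp but once in the result.

```python
# Hypothetical instances; schemas assumed: Bars(name, addr), Frequents(drinker, bar)
bars = {("B1", "Newtown"), ("B2", "Coogee"), ("B3", "Newtown")}
frequents = {("Ann", "B1"), ("Ann", "B3"), ("Bob", "B2"), ("Cal", "B3")}

# NewtownBars(nbar): bars whose address is Newtown, keeping only the name
newtown_bars = {(name,) for (name, addr) in bars if addr == "Newtown"}

# Tmp = Frequents Join[bar=nbar] NewtownBars
tmp = {(d, b, nb) for (d, b) in frequents for (nb,) in newtown_bars if b == nb}

# Result(drinker) = Proj[drinker](Tmp); the set drops Ann's second row
result = {(d,) for (d, _b, _nb) in tmp}
print(sorted(result))   # [('Ann',), ('Cal',)]
```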

❖ Exercise: Mapping SQL to RelAlg

Give sequences of relational algebra operations to solve each of these:

Bars where either Gernot or John drink.
Bars where John drinks but Gernot doesn't
Find bars that serve New at the same price as the Coogee Bay Hotel charges for VB.
What beers are sold at the same price as CBH/New?
Which bar is most popular? (Most drinkers)
Price of cheapest beer at each bar?
Which beers are sold at all bars?

❖ Division

Consider two relation schemas R and S where S ⊂ R.

The division operation is defined on instances r(R), s(S) as:

r / s = r Div s = { t | t ∈ r[R-S] ∧ satisfy }

where satisfy = ∀ t_s ∈ s ( ∃ t_r ∈ r ( t_r[S] = t_s ∧ t_r[R-S] = t ) )

Operationally:

consider each subset of tuples in r that match on t[R-S]
for this subset of tuples, take the t[S] values from each
if these cover all tuples in s, then include t[R-S] in the result
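
The operational description maps to a grouping loop. In this Python sketch, relations are sets of tuples with explicit schemas (attribute-name tuples); the function name `divide` and the representation are assumptions. Tuples of r are grouped by their R-S values, and a group is kept only if the S-values seen with it cover every tuple of s.

```python
def divide(r_schema, r, s_schema, s):
    """r Div s, for s_schema ⊂ r_schema; returns (schema R-S, tuples)."""
    keep = [a for a in r_schema if a not in s_schema]     # attributes R-S
    ki = [r_schema.index(a) for a in keep]
    si = [r_schema.index(a) for a in s_schema]
    # group r's tuples by t[R-S], collecting the t[S] values seen in each group
    groups: dict[tuple, set[tuple]] = {}
    for t in r:
        key = tuple(t[i] for i in ki)
        groups.setdefault(key, set()).add(tuple(t[i] for i in si))
    # keep t[R-S] only if its group covers every tuple of s
    return tuple(keep), {k for k, seen in groups.items() if s <= seen}

schema, rows = divide(("beer", "bar"), {("New", "B1"), ("New", "B2"), ("VB", "B1")},
                      ("bar",), {("B1",), ("B2",)})
print(schema, rows)   # ('beer',) {('New',)}
```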

❖ Division (cont)

Example of division:

❖ Division (cont)

Querying with relational algebra (division) ...

Division handles queries that include the notion "for all".

E.g. Which beers are sold in all bars?

We can answer this as follows:

generate a relation of beers and bars where they are sold
- r1 = Proj[beer,bar](Sold)
generate a relation of all bars
- r2 = Rename[r2(bar)](Proj[name](Bars))
find which beers appear in tuples with every bar
- res = r1 Div r2
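
Division is not a primitive operation: a standard identity rewrites r Div s as Proj[R-S](r) - Proj[R-S]((Proj[R-S](r) × s) - r), i.e. all candidates minus those missing at least one required value. The Python sketch below applies it to the beers/bars query using sets of tuples; the schemas Sold(bar, beer, price) and Bars(name, addr) and all rows are hypothetical.

```python
# Hypothetical data; schemas assumed: Sold(bar, beer, price), Bars(name, addr)
sold = {("B1", "New", 3.50), ("B2", "New", 3.50), ("B3", "New", 3.00),
        ("B1", "VB", 4.00), ("B2", "VB", 4.20)}
bars = {("B1", "Coogee"), ("B2", "Newtown"), ("B3", "Sydney")}

# r1 = Proj[beer,bar](Sold)
r1 = {(beer, bar) for (bar, beer, _price) in sold}
# r2 = Rename[r2(bar)](Proj[name](Bars))
r2 = {(name,) for (name, _addr) in bars}

# res = r1 Div r2, rewritten with core operations:
candidates = {(beer,) for (beer, _bar) in r1}              # Proj[beer](r1)
all_pairs = {c + b for c in candidates for b in r2}        # candidates × r2
missing = {(beer,) for (beer, _bar) in all_pairs - r1}     # beers absent from some bar
res = candidates - missing
print(res)   # {('New',)}
```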

❖ DBMS Architecture

COMP3311 is not a course on DBMS Architecture (that's COMP9315)

But knowing just a little about how DBMSs work can help

to avoid/fix inefficiencies in database applications
to ensure that there are no concurrency issues

DBMSs attempt to handle these issues via ...

query processing (QP) ... methods for evaluating queries
transaction processing (TxP) ... controlling concurrency

As a programmer, you give a lot of control to the DBMS, but can

use QP knowledge to make DB applications efficient
use TxP knowledge to make DB applications safe

❖ DBMS Architecture (cont)

Our view of the DBMS so far ...

A machine to process SQL queries.

❖ DBMS Architecture (cont)

One view of DB engine: "relational algebra virtual machine"

selection (σ)	projection (π)	join (⋈, ×)
union (∪)	intersection (∩)	difference (-)
sort	insert	delete

For each of these operations:

various data structures and algorithms are available
DBMSs may provide only one, or may provide a choice

❖ DBMS Architecture (cont)

Layers in a DB Engine (Ramakrishnan's View)

❖ Query Evaluation

The path of a query through its evaluation:

❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

In general:

SELECT clause becomes projection
WHERE condition becomes selection or join
FROM clause becomes join
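
The four-step mapping can be run directly on small hypothetical instances of R(a,b) and S(c,d), using Python sets for the RA steps and the built-in `sqlite3` module as a stand-in DBMS (an assumption, so the sketch runs without a PostgreSQL server). Both routes should give the same answer.

```python
import sqlite3

# Hypothetical instances of R(a,b) and S(c,d)
R = {(1, 10), (2, 20), (3, 30)}
S = {(10, 100), (20, 50), (30, 100)}

tmp1 = {(a, b, c, d) for (a, b) in R for (c, d) in S if b == c}   # R Join[b=c] S
tmp2 = {t for t in tmp1 if t[3] == 100}                            # Sel[d=100](Tmp1)
tmp3 = {(a,) for (a, b, c, d) in tmp2}                             # Proj[a](Tmp2)
res = tmp3   # Rename[Res(x)] only changes the attribute name a -> x

con = sqlite3.connect(":memory:")
con.execute("create table R (a integer, b integer)")
con.execute("create table S (c integer, d integer)")
con.executemany("insert into R values (?, ?)", R)
con.executemany("insert into S values (?, ?)", S)
sql = set(con.execute("select a as x from R join S on (b = c) where d = 100"))
print(res == sql, sorted(res))   # True [(1,), (3,)]
```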

❖ Exercise: A Better SQL to RA Mapping

On the previous slide, we translated an SQL query as follows:

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

Suggest a more efficient approach (based on likely size of intermediate results)

❖ Mapping Example

The query: Courses with more than 100 students in them?

Can be expressed in SQL as

select   s.id, s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id, s.code
having   count(*) > 100;

and might be compiled to

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )
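
GroupBy and GroupSelect are extensions rather than core RA. A Python sketch of the compiled expression, assuming the schemas Subject(id, code), Course(id, subject) and Enrolment(student, course) and using made-up data. Python sets stand in for SQL's bags, which is safe here only because each (student, course) pair is unique, so no joined tuple is duplicated.

```python
from collections import defaultdict

# Hypothetical data; schemas assumed:
#   Subject(id, code), Course(id, subject), Enrolment(student, course)
subject = {(1, "SUBJA"), (2, "SUBJB")}
course = {(10, 1), (20, 2)}
enrolment = {(st, 10) for st in range(120)} | {(st, 20) for st in range(40)}

# Course Join[c.id=e.course] Enrolment  -> (c.id, c.subject, e.student)
ce = {(cid, subj, st) for (cid, subj) in course for (st, ec) in enrolment if cid == ec}
# Subject Join[s.id=c.subject] (...)    -> (s.id, s.code, c.id, e.student)
sce = {(sid, code, cid, st) for (sid, code) in subject for (cid, subj, st) in ce if sid == subj}

# GroupBy[id,code]: partition the joined tuples by (s.id, s.code)
groups = defaultdict(list)
for (sid, code, cid, st) in sce:
    groups[(sid, code)].append((cid, st))

# GroupSelect[size>100]: keep groups with more than 100 tuples, like count(*) > 100
# Project[id,code]: the group keys are exactly (id, code)
result = {key for key, members in groups.items() if len(members) > 100}
print(result)   # {(1, 'SUBJA')}
```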
