COMP3311 Week 8 Wednesday Lecture

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [0/25]
❖ Week 08 Wednesday

In today's lecture ...

Things to do ...

Things to note ...

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [1/25]
❖ Assignment 2


Things to note:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [2/25]
❖ Relational Algebra

Relational algebra: sets of tuples, operations mapping sets to sets

Core relational algebra operations:

Common extensions include:
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [3/25]
❖ Relational Algebra (cont)

Have already looked at:

More interesting are operations: SQL implements all of the above, but with a less "clean" syntax, e.g.

select a,b,c     projection
from   R                    Proj[a,b,c](Sel[a>5](R))
where  a > 5;    selection

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [4/25]
❖ Exercise: Result Size and Schema (i)

Given the following:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [5/25]
❖ Product

Product combines information from two relations pairwise on tuples.

r × s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s },     where r(R), s(S)

If t1 = (A1...An) and t2 = (B1...Bn) then (t1:t2) = (A1...An,B1...Bn)

Note: relations do not have to be union-compatible.

Result size is large:   |r × s| = |r|.|s|     Schema: R+S

Algorithmic view:

result = {}
for each tuple t1 in relation r
    for each tuple t2 in relation s
        result = result ∪ {(t1:t2)}

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [6/25]
❖ Product (cont)

Example of product:


[Diagram:Pics/relalg/product.png]

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [7/25]
❖ Natural Join

Natural join is a specialised product:

Consider relation schemas R(ABC..JKLM), S(KLMN..XYZ).

The natural join of relations r(R) and s(S) is defined as:

r ⋈ s   =   r Join s  =  { (t1[ABC..J] : t2[K..XYZ])  |  t1 ∈ r ∧ t2 ∈ s ∧ match }

where    match   =   t1[K] = t2[K] ∧ t1[L] = t2[L] ∧ t1[M] = t2[M]

Algorithmic view:

result = {}
for each tuple t1 in relation r
   for each tuple t2 in relation s
      if (matches(t1,t2))
         result = result ∪ {combine(t1,t2)}

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [8/25]
❖ Natural Join (cont)

Example of natural join:


[Diagram:Pics/relalg/nat-join.png]

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [9/25]
❖ Theta Join

The theta join is a specialised product containing only pairs that match on a supplied condition C.

r ⋈C s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s ∧ C(t1 : t2) },
where r(R),s(S)

Examples:   (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)

All attribute names are required to be distinct (cf natural join)

Can be defined in terms of other RA operations:

r ⋈C s   =   r Join[C] s   =   Sel[C] ( r × s )

Note that   r ⋈true s   =   r × s.

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [10/25]
❖ Theta Join (cont)

Example theta join:

[Diagram:Pics/relalg/join2.png]

(Theta join is the most frequently-used join in SQL queries)

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [11/25]
❖ Exercise: Result Size and Schema (ii)

Given the following:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [12/25]
❖ Examples of RelAlg Expressions

Querying with relational algebra (join) ...


Reminder: projection removes duplicates

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [13/25]
❖ Exercise: Mapping SQL to RelAlg


Give sequences of relational algebra operations to solve each of these:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [14/25]
❖ Division

Consider two relation schemas R and S where S ⊂ R.

The division operation is defined on instances r(R), s(S) as:

r / s   =   r Div s   =   { t | t ∈ r[R-S] ∧ satisfy }

where   satisfy   =   ∀ ts ∈ S ( ∃ tr ∈ R ( tr[S] = ts ∧ tr[R-S] = t ) )

Operationally:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [15/25]
❖ Division (cont)

Example of division:


[Diagram:Pics/relalg/division2.png]

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [16/25]
❖ Division (cont)

Querying with relational algebra (division) ...

Division handles queries that include the notion "for all".

E.g. Which beers are sold in all bars?

We can answer this as follows:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [17/25]
❖ DBMS Architecture

COMP3311 is not a course on DBMS Architecture (that's COMP9315)

But knowing just a little about how DBMSs work can help

DBMSs attempt to handle these issues in .. As a programmer, you give a lot of control to the DBMS, but can
COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [18/25]
❖ DBMS Architecture (cont)

Our view of the DBMS so far ...


[Diagram:Pics/dbms/qryeval0.png]


A machine to process SQL queries.

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [19/25]
❖ DBMS Architecture (cont)

One view of DB engine: "relational algebra virtual machine"

selection (σ) projection (π) join (⋈, ×)
union () intersection () difference (-)
sort insert delete

For each of these operations:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [20/25]
❖ DBMS Architecture (cont)

Layers in a DB Engine (Ramakrishnan's View)


[Diagram:Pics/dbms/dbms-layers.png]

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [21/25]
❖ Query Evaluation

The path of a query through its evaluation:


[Diagram:Pics/dbms/qryeval.png]

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [22/25]
❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

In general:

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [23/25]
❖ Exercise: A Better SQL to RA Mapping

On the previous slide, we translated an SQL query as follows:

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

Suggest a more efficient approach   (based on likely size of intermediate results)

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [24/25]
❖ Mapping Example

The query: Courses with more than 100 students in them?

Can be expressed in SQL as

select   s.id, s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id, s.code
having   count(*) > 100;

and might be compiled to

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )

COMP3311 23T3 ♢ Week 8 Wednesday Lecture ♢ [25/25]


Produced: 1 Nov 2023