COMP3311 Week 8 Thursday Lecture

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [0/53]
❖ Week 08 Thursday

In today's lecture ...

Things to do ...

Things to note ...

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [1/53]
❖ Assignment 2


Things to note:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [2/53]
❖ Relational Algebra

Relational algebra (RA) can be viewed as ...

Relational algebra consists of: Why is it important?
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [3/53]
❖ Relational Algebra (cont)


Core relational algebra operations:

Common extensions include:
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [4/53]
❖ Relational Algebra (cont)

Select, project, join provide a powerful set of operations for building relations and extracting interesting data from them.

[Diagram:Pics/relalg/spj1.png]

Adding set operations and renaming makes RA complete.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [5/53]
❖ Notation

Standard treatments of relational algebra use Greek symbols.

We use the following notation (because it is easier to reproduce):

Operation Standard
Notation
Our
Notation
Selection σexpr(Rel) Sel[expr](Rel)
Projection πA,B,C(Rel) Proj[A,B,C](Rel)
Join Rel1expr Rel2 Rel1  Join[expr]  Rel2
Rename ρschemaRel Rename[schema](Rel)

For other operations (e.g. set operations) we adopt the standard notation.
Except when typing in a text file, where * = intersection, + = union

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [6/53]
❖ Describing RA Operations

We define the semantics of RA operations using

For each operation, we also describe it operationally:
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [7/53]
❖ Describing RA Operations (cont)

All RA operators return a result of type relation.

For convenience, we can name a result and use it later.

E.g.

Temp = R op1 S op2 T
Res  = Temp op3 Z
-- which is equivalent to
Res  = (R op1 S op2 T) op3 Z


Each result is a relation with a well-defined schema

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [8/53]
❖ Example Database #1

[Diagram:Pics/relalg/db1.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [9/53]
❖ Example Database #2

[Diagram:Pics/relalg/db2a.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [10/53]
❖ Rename

Rename provides "schema mapping".

If expression E returns a relation R(A1, A2, ... An), then

Rename[S(B1, B2, ... Bn)](E)

gives a relation called S

Rename is like the identity function on the contents of a relation

The only thing that Rename changes is the schema.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [11/53]
❖ Rename (cont)

Rename can be viewed as a "technical" apparatus of RA.

Can use implicit rename/project in sequences of RA operations, e.g.

--  R(a,b,c),  S(c,d)
Res = Rename[Res(b,c,d)](Project[b,c](Sel[a>5](R)) Join S)
-- vs
Tmp1 = Select[a>5](R)
Tmp2 = Project[b,c](Tmp1)
Tmp3 = Rename[Tmp3(cc,d)](S)
Tmp4 = Tmp2 Join[c=cc] Tmp3
Res  = Rename[Res(b,c,d)](Tmp4)
-- vs
Tmp1(b,c)  = Select[a>5](R)
Tmp2(cc,d) = S
Res(b,c,d) = Tmp1 Join[c=cc] Tmp2

In SQL, can achieve a similar effect by defining a set of views

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [12/53]
❖ Exercise: Rename


Answer the following in SQL then relational algebra

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [13/53]
❖ Projection

Projection returns a set of tuples containing a subset of the attributes in the original relation.

πX(r)   =   Proj[X](r)   =   { t[X]  |  t ∈ r },     where r(R)

X specifies a subset of the attributes of R.

Note that removing key attributes can produce duplicates.

In RA, duplicates are removed from the result set.

Result size:   |πX(r)| |r|     Result schema:   R'(X)

Algorithmic view:

result = {}
for each tuple t in relation r
    result = result ∪ {t[X]}

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [14/53]
❖ Projection (cont)

Examples of projection:

[Diagram:Pics/relalg/projection2.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [15/53]
❖ Exercise: Projection

Answer the following in SQL then relational algebra

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [16/53]
❖ Selection

Selection returns a subset of the tuples in a relation r(R) that satisfy a specified condition C.

σC(r)   =   Sel[C](r)   =   { t  |  t ∈ r ∧ C(t) }

C is a boolean expression on attributes in R.

Result size:   |σC(r)| |r|

Result schema:   same as the schema of r   (i.e. R)

Algorithmic view:

result = {}
for each tuple t in relation r
    if (C(t)) { result = result ∪ {t} }

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [17/53]
❖ Selection (cont)

Examples of selection:

[Diagram:Pics/relalg/selection2.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [18/53]
❖ Exercise: Selection


Answer the following in SQL and relational algebra

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [19/53]
❖ Union

Union combines two compatible relations into a single relation via set union of sets of tuples.

r1 ∪ r2   =   { t  |  t ∈ r1 ∨ t ∈ r2 },     where r1(R), r2(R)

Compatibility = both relations have the same schema

Result size:   |r1 ∪ r2|     |r1| + |r2|     Result schema: R

Algorithmic view:

result = r1
for each tuple t in relation r2
    result = result ∪ {t}

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [20/53]
❖ Intersection

Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.

r1 ∩ r2   =   { t  |  t ∈ r1 ∧ t ∈ r2 },     where r1(R), r2(R)

Uses same notion of relation compatibility as union.

Result size:   |r1 ∪ r2|     min(|r1|,|r2|)     Result schema: R

Algorithmic view:

result = {}
for each tuple t in relation r1
    if (t ∈ r2) { result = result ∪ {t} }

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [21/53]
❖ Intersection (cont)

Examples of union and intersection:

[Diagram:Pics/relalg/setops.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [22/53]
❖ Exercise: Union/Intersection


Answer the following in SQL then relational algebra

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [23/53]
❖ Difference

Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.

r1 - r2   =   { t  |  t ∈ r1 ∧ t ∉ r2 },     where r1(R), r2(R)

Uses same notion of relation compatibility as union.

Note: tuples in r2 but not r1 do not appear in the result

Algorithmic view:

result = {}
for each tuple t in relation r1
    if (!(t ∈ r2)) { result = result ∪ {t} }

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [24/53]
❖ Difference (cont)

Examples of difference:

[Diagram:Pics/relalg/difference2.png]

 

Clearly, difference is not symmetric.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [25/53]
❖ Difference (cont)


Answer the following in SQL then relational algebra

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [26/53]
❖ Product

Product (Cartesian product) combines information from two relations pairwise on tuples.

r × s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s },     where r(R), s(S)

Each tuple in the result contains all attributes from r and s, possibly with some fields renamed to avoid ambiguity.

If t1 = (A1...An) and t2 = (B1...Bn) then (t1:t2) = (A1...An,B1...Bn)

Note: relations do not have to be union-compatible.

Result size is large:   |r × s| = |r|.|s|     Schema: R∪S

Algorithmic view:

result = {}
for each tuple t1 in relation r
    for each tuple t2 in relation s
        result = result ∪ {(t1:t2)}

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [27/53]
❖ Product (cont)

Example of product:


[Diagram:Pics/relalg/product.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [28/53]
❖ Natural Join

Natural join is a specialised product:

Consider relation schemas R(ABC..JKLM), S(KLMN..XYZ).

The natural join of relations r(R) and s(S) is defined as:

r ⋈ s   =   r Join s   =  
{ (t1[ABC..J] : t2[K..XYZ])  |  t1 ∈ r ∧ t2 ∈ s ∧ match }

where    match   =   t1[K] = t2[K] ∧ t1[L] = t2[L] ∧ t1[M] = t2[M]

Algorithmic view:

result = {}
for each tuple t1 in relation r
   for each tuple t2 in relation s
      if (matches(t1,t2))
         result = result ∪ {combine(t1,t2)}

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [29/53]
❖ Natural Join (cont)

Example of natural join:


[Diagram:Pics/relalg/nat-join.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [30/53]
❖ Theta Join

The theta join is a specialised product containing only pairs that match on a supplied condition C.

r ⋈C s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s ∧ C(t1 : t2) },
where r(R),s(S)

Examples:   (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)

All attribute names are required to be distinct (cf natural join)

Can be defined in terms of other RA operations:

r ⋈C s   =   r Join[C] s   =   Sel[C] ( r × s )

Note that   r ⋈true s   =   r × s.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [31/53]
❖ Theta Join (cont)

Example theta join:

[Diagram:Pics/relalg/join2.png]

(Theta join is the most frequently-used join in SQL queries)

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [32/53]
❖ Theta Join (cont)

Querying with relational algebra (join) ...


Reminder: projection removes duplicates

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [33/53]
❖ Exercise: Mapping SQL to RelAlg


Give sequences of relational algebra operations to solve each of these:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [34/53]
❖ Division

Consider two relation schemas R and S where S ⊂ R.

The division operation is defined on instances r(R), s(S) as:

r / s   =   r Div s   =   { t | t ∈ r[R-S] ∧ satisfy }

where   satisfy   =   ∀ ts ∈ S ( ∃ tr ∈ R ( tr[S] = ts ∧ tr[R-S] = t ) )

Operationally:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [35/53]
❖ Division (cont)

Example of division:


[Diagram:Pics/relalg/division2.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [36/53]
❖ Division (cont)

Querying with relational algebra (division) ...

Division handles queries that include the notion "for all".

E.g. Which beers are sold in all bars?

We can answer this as follows:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [37/53]
❖ DBMS Architecture

COMP3311 is not a course on DBMS Architecture (that's COMP9315)

But knowing just a little about how DBMSs work can help

DBMSs attempt to handle these issues in .. As a programmer, you give a lot of control to the DBMS, but can
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [38/53]
❖ DBMS Architecture (cont)

Our view of the DBMS so far ...


[Diagram:Pics/dbms/qryeval0.png]


A machine to process SQL queries.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [39/53]
❖ DBMS Architecture (cont)

One view of DB engine: "relational algebra virtual machine"

selection (σ) projection (π) join (⋈, ×)
union () intersection () difference (-)
sort insert delete

For each of these operations:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [40/53]
❖ DBMS Architecture (cont)

Layers in a DB Engine (Ramakrishnan's View)


[Diagram:Pics/dbms/dbms-layers.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [41/53]
❖ Query Evaluation

The path of a query through its evaluation:


[Diagram:Pics/dbms/qryeval.png]

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [42/53]
❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

In general:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [43/53]
❖ Exercise: A Better SQL to RA Mapping

On the previous slide, we translated an SQL query as follows:

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

Suggest a more efficient approach   (based on likely size of intermediate results)

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [44/53]
❖ Mapping Example

The query: Courses with more than 100 students in them?

Can be expressed in SQL as

select   s.id, s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id, s.code
having   count(*) > 100;

and might be compiled to

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [45/53]
❖ Exercise: Relational Operation Plan

So far, we have been expressing RA evaluation as follows:

Tmp(a,b,c,...) =  Op[x](R)  or  R Op[x] S

Each statement involves a single relational algebra operation.

Render the RA from the previous slide in this form.


Reminder:

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [46/53]
❖ Query Cost Estimation

The cost of evaluating a query is determined by

Analysis of costs involves estimating:
Accessing data from disk is the dominant cost in query evaluation
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [47/53]
❖ Query Cost Estimation (cont)

An execution plan is a sequence of relational operations.

Consider execution plans for:    σc (R  ⋈d  S  ⋈e  T)

tmp1   :=  hash_join[d](R,S)
tmp2   :=  sort_merge_join[e](tmp1,T)
result :=  binary_search[c](tmp2)

or

tmp1   :=  sort_merge_join[e](S,T)
tmp2   :=  hash_join[d](R,tmp1)
result :=  linear_search[c](tmp2)

or

tmp1   :=  btree_search[c](R)
tmp2   :=  hash_join[d](tmp1,S)
result :=  sort_merge_join[e](tmp2)

All produce same result, but have different costs.

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [48/53]
❖ Implementations of RA Ops

Sorting   (quicksort, etc. are not applicable)

Selection   (different techniques developed for different query types) Join   (fast joins are critical to success of relational DBMSs)
COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [49/53]
❖ Query Optimisation

What is the "best" method for evaluating a query?

Generally, best = lowest cost = fastest evaluation time

Cost is measured in terms of pages read/written

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [50/53]
❖ Query Optimisation (cont)

A DBMS query optimiser works as follows:

Input: relational algebra expression
Output: execution plan (sequence of RA ops)

bestCost = INF; bestPlan = none
while (more possible plans) {
   plan = produce a new evaluation plan
   cost = estimated_cost(plan)
   if (cost < bestCost) {
      bestCost = cost; bestPlan = plan
   }
}
return bestPlan

Typically, there are very many possible plans

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [51/53]
❖ DB Application Performance

In order to make DB applications efficient, it is useful to know:

and then, "encourage" DBMS to use the most efficient methods

Achieve by using indexes and avoiding certain SQL query structures

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [52/53]
❖ DB Application Performance (cont)

Application programmer choices that affect query cost:

COMP3311 22T3 ♢ Week 8 Thursday Lecture ♢ [53/53]


Produced: 6 Apr 2023