COMP3311 Week 8 Monday Lecture

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [0/49]
❖ Week 08 Monday


In today's lecture ...

Things to do ...

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [1/49]
❖ Assignment 2 Database

Things to note:

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [2/49]
❖ Assignment 2 Scripts

Write Python/Psycopg2 scripts to ...

Will merge Q5 and Q6 ... essentially the same script

python3 q5.py zID   vs   python3 q5.py zID Prog Stream

Since output is text, need to be extra careful with output format

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [3/49]
❖ Normalisation


Normalisation: branch of relational theory providing design insights.

Makes use of schema normal forms

And normalisation algorithms which

Normalisation draws heavily on the theory of functional dependencies.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [4/49]
❖ Relation Decomposition

The standard transformation technique to remove redundancy:

We accomplish decomposition by

Properties:   R = S ∪ T,   S ∩ T ≠ ∅   and   r(R) = s(S) ⋈ t(T)

May require several decompositions to achieve acceptable NF.

Normalisation algorithms tell us how to choose S and T.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [5/49]
❖ Schema (Re)Design

Consider the following relation for BankLoans:

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-15 1500
Minus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300

This schema has all of the update anomalies mentioned earlier.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [6/49]
❖ Schema (Re)Design (cont)

To improve the design, decompose the BankLoans relation.

The following decomposition is not helpful:

    Branch(branchName, branchCity, assets)
    CustLoan(custName, loanNo, amount)

because we lose information (which branch is a loan held at?)

Another possible decomposition:

    BranchCust(branchName, branchCity, assets, custName)
    CustLoan(custName, loanNo, amount)

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [7/49]
❖ Schema (Re)Design (cont)

The BranchCust relation instance:

branchName branchCity assets custName
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Minus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
North Town Rye 3700000 Hayes

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [8/49]
❖ Schema (Re)Design (cont)

The CustLoan relation instance:

custName loanNo amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-15 1500
Jones L-93 500
Turner L-11 900
Hayes L-16 1300

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [9/49]
❖ Schema (Re)Design (cont)

Now consider the result of (BranchCust Join CustLoan)

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Perryridge Horseneck 1700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Jackson L-15 1500
Minus Horseneck 400000 Jones L-93 500
Minus Horseneck 400000 Jones L-17 1000
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300
North Town Rye 3700000 Hayes L-15 1500

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [10/49]
❖ Schema (Re)Design (cont)


This is clearly not a successful decomposition.

The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.

Such a decomposition is called a lossy decomposition.

In a good decomposition, we should be able to reconstruct the original relation exactly:

if R is decomposed into S and T, then S ⋈ T = R

Such a decomposition is called lossless join decomposition.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [11/49]
❖ Boyce-Codd Normal Form

A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:

for all fds X → Y  in F+


A DB schema is in BCNF if all of its relation schemas are in BCNF.

Observations:

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [12/49]
❖ Boyce-Codd Normal Form (cont)


If we transform a schema into BCNF, we are guaranteed:

However, we are not guaranteed:

If we need to preserve dependencies, use 3NF.

A dependency X → Y  is preserved if all of its attributes exist in one table.

Decomposition may e.g. place X  and Y  in separate tables

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [13/49]
❖ BCNF Decomposition

The following algorithm converts an arbitrary schema to BCNF:

Inputs: schema R, set F of fds
Output: set Res of BCNF schemas (tables)

Res = {R};
while (any schema S ∈ Res is not in BCNF) {
    choose any fd X → Y on S that violates BCNF
    Res = (Res-S) ∪ (S-Y) ∪ XY
}

The last step means: make a table from XY; drop Y  from table S; drop table  S

The "choose any" step means that the algorithm is non-deterministic

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [14/49]
❖ Exercise: BCNF Normalisation (1)

Recall the BankLoans schema:

BankLoans(branchName, branchCity, assets, custName, loanNo, amount)

Has functional dependencies F

The key for BankLoans is branchName,custName,loanNo

Use the BCNF decompositon algorithm to produce a BCNF schema.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [15/49]
❖ Exercise: BCNF Normalisation (2)

Recall the following table (based on e.g. a spreadsheet):

P#  | When        | Address    | Notes         | S#   | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob   | ABK754
PG1 | 04/06 11:10 | 47 High St | All ok        | SG44 | Rob   | ABK754
PG4 | 03/07 12:30 | 55 High St | All ok        | SG43 | Dave  | ATS123
PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob   | ABK754
PG1 | 05/07 15:00 | 47 High St | Leaking tap   | SG44 | Rob   | ABK754
PG2 | 13/07 12:00 | 12 High St | All ok        | SG42 | Peter | ATS123
...

Recall the functional dependencies identified previously,

Use these to convert the table to a BCNF schema.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [16/49]
❖ Exercise: BCNF Normalisation (3)

Consider the schema R and set of fds F

R  =  ABCDEFGH

F  =  { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }

Produce a BCNF decomposition of R.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [17/49]
❖ Third Normal Form


A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:

for all fds X → Y in F+

A DB schema is in 3NF if all relation schemas are in 3NF.

The extra condition represents a slight weakening of BCNF requirements.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [18/49]
❖ Third Normal Form (cont)


If we transform a schema into 3NF, we are guaranteed:

However, we are not guaranteed:

Whether to use BCNF or 3NF depends on overall design considerations.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [19/49]
❖ Third Normal Form (cont)

The following algorithm converts an arbitrary schema to 3NF:

Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas

let Fc be a minimal cover for F
Res = {}
for each fd X → Y in Fc {
    if (no schema S ∈ Res contains XY) {
        Res = Res ∪ XY
    }
}
if (no schema S ∈ Res contains a candidate key for R) {
    K = any candidate key for R
    Res = Res ∪ K
}

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [20/49]
❖ Third Normal Form (cont)

Critical step is producing minimal cover Fc for F

A set F of fds is minimal if

Algorithm: right-reduce, left-reduce, eliminate redundant fds
COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [21/49]
❖ Exercise: BCNF Normalisation (1)

Recall the BankLoans schema:

BankLoans(branchName, branchCity, assets, custName, loanNo, amount)

Has functional dependencies F

The key for BankLoans is branchName,custName,loanNo

Use the 3NF decompositon algorithm to produce a 3NF schema.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [22/49]
❖ Exercise: 3NF Normalisation (2)

Recall the following table (based on e.g. a spreadsheet):


P#  | When        | Address    | Notes         | S#   | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob   | ABK754
PG1 | 04/06 11:10 | 47 High St | All ok        | SG44 | Rob   | ABK754
PG4 | 03/07 12:30 | 55 High St | All ok        | SG43 | Dave  | ATS123
PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob   | ABK754
PG1 | 05/07 15:00 | 47 High St | Leaking tap   | SG44 | Rob   | ABK754
PG2 | 13/07 12:00 | 12 High St | All ok        | SG42 | Peter | ATS123
...

Recall the functional dependencies identified previously,

Use these to convert the table to a 3NF schema.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [23/49]
❖ Exercise: 3NF Normalisation (3)

Consider the schema R and set of fds F

R  =  ABCDEFGH

F  =  Fc  =  { BH → C, A → D, C → E, F → A, E → F, BGH → E }

Produce a 3NF decomposition of R.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [24/49]
❖ Database Design Methodology

To achieve a "good" database design:

Note: may subsequently need to "denormalise" if the design yields inadequate performance.
COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [25/49]
❖ Relational Algebra

Relational algebra (RA) can be viewed as ...

Relational algebra consists of: Why is it important?
COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [26/49]
❖ Relational Algebra (cont)


Core relational algebra operations:

Common extensions include:
COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [27/49]
❖ Relational Algebra (cont)

Select, project, join provide a powerful set of operations for building relations and extracting interesting data from them.

[Diagram:Pics/relalg/spj1.png]

Adding set operations and renaming makes RA complete.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [28/49]
❖ Notation

Standard treatments of relational algebra use Greek symbols.

We use the following notation (because it is easier to reproduce):

Operation Standard
Notation
Our
Notation
Selection σexpr(Rel) Sel[expr](Rel)
Projection πA,B,C(Rel) Proj[A,B,C](Rel)
Join Rel1expr Rel2 Rel1  Join[expr]  Rel2
Rename ρschemaRel Rename[schema](Rel)

For other operations (e.g. set operations) we adopt the standard notation.
Except when typing in a text file, where * = intersection, + = union

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [29/49]
❖ Describing RA Operations

We define the semantics of RA operations using

For each operation, we also describe it operationally:
COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [30/49]
❖ Describing RA Operations (cont)

All RA operators return a result of type relation.

For convenience, we can name a result and use it later.

E.g.

Temp = R op1 S op2 T
Res  = Temp op3 Z
-- which is equivalent to
Res  = (R op1 S op2 T) op3 Z


Each result is a relation with a well-defined schema

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [31/49]
❖ Example Database #1

[Diagram:Pics/relalg/db1.png]

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [32/49]
❖ Example Database #2

[Diagram:Pics/relalg/db2a.png]

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [33/49]
❖ Rename

Rename provides "schema mapping".

If expression E returns a relation R(A1, A2, ... An), then

Rename[S(B1, B2, ... Bn)](E)

gives a relation called S

Rename is like the identity function on the contents of a relation

The only thing that Rename changes is the schema.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [34/49]
❖ Rename (cont)

Rename can be viewed as a "technical" apparatus of RA.

Can use implicit rename/project in sequences of RA operations, e.g.

--  R(a,b,c),  S(c,d)
Res = Rename[Res(b,c,d)](Project[b,c](Sel[a>5](R)) Join S)
-- vs
Tmp1 = Select[a>5](R)
Tmp2 = Project[b,c](Tmp1)
Tmp3 = Rename[Tmp3(cc,d)](S)
Tmp4 = Tmp2 Join[c=cc] Tmp3
Res  = Rename[Res(b,c,d)](Tmp4)
-- vs
Tmp1(b,c)  = Select[a>5](R)
Tmp2(cc,d) = S
Res(b,c,d) = Tmp1 Join[c=cc] Tmp2

In SQL, can achieve a similar effect by defining a set of views

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [35/49]
❖ Exercise: Rename


Answer the following in SQL then relational algebra

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [36/49]
❖ Projection

Projection returns a set of tuples containing a subset of the attributes in the original relation.

πX(r)   =   Proj[X](r)   =   { t[X]  |  t ∈ r },     where r(R)

X specifies a subset of the attributes of R.

Note that removing key attributes can produce duplicates.

In RA, duplicates are removed from the result set.

Result size:   |πX(r)| |r|     Result schema:   R'(X)

Algorithmic view:

result = {}
for each tuple t in relation r
    result = result ∪ {t[X]}

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [37/49]
❖ Projection (cont)

Examples of projection:

[Diagram:Pics/relalg/projection2.png]

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [38/49]
❖ Exercise: Projection

Answer the following in SQL then relational algebra

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [39/49]
❖ Selection

Selection returns a subset of the tuples in a relation r(R) that satisfy a specified condition C.

σC(r)   =   Sel[C](r)   =   { t  |  t ∈ r ∧ C(t) }

C is a boolean expression on attributes in R.

Result size:   |σC(r)| |r|

Result schema:   same as the schema of r   (i.e. R)

Algorithmic view:

result = {}
for each tuple t in relation r
    if (C(t)) { result = result ∪ {t} }

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [40/49]
❖ Selection (cont)

Examples of selection:

[Diagram:Pics/relalg/selection2.png]

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [41/49]
❖ Exercise: Selection


Answer the following in SQL and relational algebra

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [42/49]
❖ Union

Union combines two compatible relations into a single relation via set union of sets of tuples.

r1 ∪ r2   =   { t  |  t ∈ r1 ∨ t ∈ r2 },     where r1(R), r2(R)

Compatibility = both relations have the same schema

Result size:   |r1 ∪ r2|     |r1| + |r2|     Result schema: R

Algorithmic view:

result = r1
for each tuple t in relation r2
    result = result ∪ {t}

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [43/49]
❖ Intersection

Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.

r1 ∩ r2   =   { t  |  t ∈ r1 ∧ t ∈ r2 },     where r1(R), r2(R)

Uses same notion of relation compatibility as union.

Result size:   |r1 ∪ r2|     min(|r1|,|r2|)     Result schema: R

Algorithmic view:

result = {}
for each tuple t in relation r1
    if (t ∈ r2) { result = result ∪ {t} }

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [44/49]
❖ Intersection (cont)

Examples of union and intersection:

[Diagram:Pics/relalg/setops.png]

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [45/49]
❖ Exercise: Union/Intersection


Answer the following in SQL then relational algebra

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [46/49]
❖ Difference

Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.

r1 - r2   =   { t  |  t ∈ r1 ∧ t ∉ r2 },     where r1(R), r2(R)

Uses same notion of relation compatibility as union.

Note: tuples in r2 but not r1 do not appear in the result

Algorithmic view:

result = {}
for each tuple t in relation r1
    if (!(t ∈ r2)) { result = result ∪ {t} }

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [47/49]
❖ Difference (cont)

Examples of difference:

[Diagram:Pics/relalg/difference2.png]

 

Clearly, difference is not symmetric.

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [48/49]
❖ Difference (cont)


Answer the following in SQL then relational algebra

COMP3311 23T3 ♢ Week 8 Monday Lecture ♢ [49/49]


Produced: 30 Oct 2023