COMP3311 Week 8 Tuesday Lecture

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [0/29]
❖ Week 08 Tuesday


In today's lecture ...

Things to do ...

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [1/29]
❖ Relational Design Theory (recap)


Schemas with redundancy are prone to update anomalies

ER → SQL mapping tends to give non-redundant schemas

Functional dependency gives insights into The methods we describe in this section
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [2/29]
❖ Normalisation


Normalisation: branch of relational theory providing design insights.

Makes use of schema normal forms

And normalisation algorithms which

Normalisation draws heavily on the theory of functional dependencies.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [3/29]
❖ Normal Forms

Normalisation theory defines six normal forms (NFs).

NF hierarachy:   5NF 4NF BCNF 3NF 2NF 1NF

1NF allows most redundancy;   5NF allows least redundancy.

We say that "a schema is in xNF", if all of its tables are xNF

For most practical purposes, BCNF (or 3NF) are acceptable NFs.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [4/29]
❖ Normal Forms (cont)


Normal forms:

1NF all attributes have atomic values
we assume this as part of relational model,
so every relation schema is in 1NF
2NF all non-key attributes fully depend on key
(i.e. no partial dependencies)
avoids much redundancy
3NF
BCNF
no attributes dependent on non-key attrs
(i.e. no transitive dependencies)
avoids most remaining redundancy

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [5/29]
❖ Normal Forms (cont)


Boyce-Codd Normal Form (BCNF):


Third Normal Form (3NF):
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [6/29]
❖ Normal Forms (cont)

Normalisation aims to puts a schema into xNF


How normalisation works ...
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [7/29]
❖ Relation Decomposition

The standard transformation technique to remove redundancy:

We accomplish decomposition by

Properties:   R = S ∪ T,   S ∩ T ≠ ∅   and   r(R) = s(S) ⋈ t(T)

May require several decompositions to achieve acceptable NF.

Normalisation algorithms tell us how to choose S and T.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [8/29]
❖ Schema (Re)Design

Consider the following relation for BankLoans:

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-15 1500
Minus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300

This schema has all of the update anomalies mentioned earlier.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [9/29]
❖ Schema (Re)Design (cont)

To improve the design, decompose the BankLoans relation.

The following decomposition is not helpful:

    Branch(branchName, branchCity, assets)
    CustLoan(custName, loanNo, amount)

because we lose information (which branch is a loan held at?)

Another possible decomposition:

    BranchCust(branchName, branchCity, assets, custName)
    CustLoan(custName, loanNo, amount)

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [10/29]
❖ Schema (Re)Design (cont)

The BranchCust relation instance:

branchName branchCity assets custName
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Minus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
North Town Rye 3700000 Hayes

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [11/29]
❖ Schema (Re)Design (cont)

The CustLoan relation instance:

custName loanNo amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-15 1500
Jones L-93 500
Turner L-11 900
Hayes L-16 1300

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [12/29]
❖ Schema (Re)Design (cont)

Now consider the result of (BranchCust Join CustLoan)

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Perryridge Horseneck 1700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Jackson L-15 1500
Minus Horseneck 400000 Jones L-93 500
Minus Horseneck 400000 Jones L-17 1000
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300
North Town Rye 3700000 Hayes L-15 1500

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [13/29]
❖ Schema (Re)Design (cont)


This is clearly not a successful decomposition.

The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.

Such a decomposition is called a lossy decomposition.

In a good decomposition, we should be able to reconstruct the original relation exactly:

if R is decomposed into S and T, then S ⋈ T = R

Such a decomposition is called lossless join decomposition.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [14/29]
❖ Boyce-Codd Normal Form

A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:

for all fds X → Y  in F+


A DB schema is in BCNF if all of its relation schemas are in BCNF.

Observations:

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [15/29]
❖ Boyce-Codd Normal Form (cont)


If we transform a schema into BCNF, we are guaranteed:

However, we are not guaranteed:

If we need to preserve dependencies, use 3NF.

A dependency X → Y  is preserved if all of its attributes exist in one table.

Decomposition may e.g. place X  and Y  in separate tables

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [16/29]
❖ BCNF Decomposition

The following algorithm converts an arbitrary schema to BCNF:

Inputs: schema R, set F of fds
Output: set Res of BCNF schemas (tables)

Res = {R};
while (any schema S ∈ Res is not in BCNF) {
    choose any fd X → Y on S that violates BCNF
    Res = (Res-S) ∪ (S-Y) ∪ XY
}

The last step means: make a table from XY; drop Y  from table S

The "choose any" step means that the algorithm is non-deterministic

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [17/29]
❖ BCNF Decomposition (cont)

Example (the BankLoans schema):

BankLoans(branchName, branchCity, assets, custName, loanNo, amount)

Has functional dependencies F

The key for BankLoans is branchName,custName,loanNo
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [18/29]
❖ BCNF Decomposition (cont)

Applying the BCNF algorithm:

(continued on next slide)

Notes:

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [19/29]
❖ BCNF Decomposition (cont)

Applying the BCNF algorithm (cont):

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [20/29]
❖ Exercise: BCNF Decomposition (1)

Consider the schema R and set of fds F

R  =  ABCDEFGH

F  =  { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }

Produce a BCNF decomposition of R.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [21/29]
❖ Exercise: BCNF Decomposition (2)

Recall the following table (based on e.g. a spreadsheet):

P#  | When        | Address    | Notes         | S#   | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob   | ABK754
PG1 | 04/06 11:10 | 47 High St | All ok        | SG44 | Rob   | ABK754
PG4 | 03/07 12:30 | 55 High St | All ok        | SG43 | Dave  | ATS123
PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob   | ABK754
PG1 | 05/07 15:00 | 47 High St | Leaking tap   | SG44 | Rob   | ABK754
PG2 | 13/07 12:00 | 12 High St | All ok        | SG42 | Peter | ATS123
...

Recall the functional dependencies identified previously,

Use these to convert the table to a BCNF schema.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [22/29]
❖ Third Normal Form


A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:

for all fds X → Y in F+

A DB schema is in 3NF if all relation schemas are in 3NF.

The extra condition represents a slight weakening of BCNF requirements.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [23/29]
❖ Third Normal Form (cont)


If we transform a schema into 3NF, we are guaranteed:

However, we are not guaranteed:

Whether to use BCNF or 3NF depends on overall design considerations.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [24/29]
❖ Third Normal Form (cont)

The following algorithm converts an arbitrary schema to 3NF:

Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas

let Fc be a minimal cover for F
Res = {}
for each fd X → Y in Fc {
    if (no schema S ∈ Res contains XY) {
        Res = Res ∪ XY
    }
}
if (no schema S ∈ Res contains a candidate key for R) {
    K = any candidate key for R
    Res = Res ∪ K
}

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [25/29]
❖ Third Normal Form (cont)

Critical step is producing minimal cover Fc for F

A set F of fds is minimal if

Algorithm: right-reduce, left-reduce, eliminate redundant fds
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [26/29]
❖ Exercise: 3NF Decomposition (1)

Consider the schema R and set of fds F

R  =  ABCDEFGH

F  =  Fc  =  { BH → C, A → D, C → E, F → A, E → F, BGH → E }

Produce a 3NF decomposition of R.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [27/29]
❖ Exercise: 3NF Decomposition (2)

Recall the following table (based on e.g. a spreadsheet):


P#  | When        | Address    | Notes         | S#   | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob   | ABK754
PG1 | 04/06 11:10 | 47 High St | All ok        | SG44 | Rob   | ABK754
PG4 | 03/07 12:30 | 55 High St | All ok        | SG43 | Dave  | ATS123
PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob   | ABK754
PG1 | 05/07 15:00 | 47 High St | Leaking tap   | SG44 | Rob   | ABK754
PG2 | 13/07 12:00 | 12 High St | All ok        | SG42 | Peter | ATS123
...

Recall the functional dependencies identified previously,

Use these to convert the table to a 3NF schema.

COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [28/29]
❖ Database Design Methodology

To achieve a "good" database design:

Note: may subsequently need to "denormalise" if the design yields inadequate performance.
COMP3311 22T3 ♢ Week 8 Tuesday Lecture ♢ [29/29]


Produced: 4 Apr 2023