❖ Week 08 Tuesday |
❖ Relational Design Theory (recap) |
Schemas with redundancy are prone to update anomalies
ER → SQL mapping tends to give non-redundant schemas
❖ Normalisation |
Normalisation: branch of relational theory providing design insights.
Makes use of schema normal forms
Normalisation draws heavily on the theory of functional dependencies.
❖ Normal Forms |
Normalisation theory defines six normal forms (NFs).
1NF allows most redundancy; 5NF allows least redundancy.
We say that "a schema is in xNF", if all of its tables are xNF
For most practical purposes, BCNF (or 3NF) are acceptable NFs.
❖ Normal Forms (cont) |
Normal forms:
all attributes have atomic values we assume this as part of relational model, so every relation schema is in 1NF |
||
all non-key attributes fully depend on key (i.e. no partial dependencies) avoids much redundancy |
||
BCNF |
no attributes dependent on non-key attrs (i.e. no transitive dependencies) avoids most remaining redundancy |
❖ Normal Forms (cont) |
Boyce-Codd Normal Form (BCNF):
(which isn't necessarily a problem, unless FDs capture important semantics)
❖ Normal Forms (cont) |
Normalisation aims to puts a schema into xNF
❖ Relation Decomposition |
The standard transformation technique to remove redundancy:
We accomplish decomposition by
May require several decompositions to achieve acceptable NF.
Normalisation algorithms tell us how to choose S and T.
❖ Schema (Re)Design |
Consider the following relation for BankLoans:
branchName | branchCity | assets | custName | loanNo | amount |
Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
Minus | Horseneck | 400000 | Jones | L-93 | 500 |
Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
This schema has all of the update anomalies mentioned earlier.
❖ Schema (Re)Design (cont) |
To improve the design, decompose the BankLoans relation.
The following decomposition is not helpful:
Branch(branchName, branchCity, assets)
CustLoan(custName, loanNo, amount)
because we lose information (which branch is a loan held at?)
Another possible decomposition:
BranchCust(branchName, branchCity, assets, custName)
CustLoan(custName, loanNo, amount)
❖ Schema (Re)Design (cont) |
The BranchCust relation instance:
branchName | branchCity | assets | custName |
Downtown | Brooklyn | 9000000 | Jones |
Redwood | Palo Alto | 2100000 | Smith |
Perryridge | Horseneck | 1700000 | Hayes |
Downtown | Brooklyn | 9000000 | Jackson |
Minus | Horseneck | 400000 | Jones |
Round Hill | Horseneck | 8000000 | Turner |
North Town | Rye | 3700000 | Hayes |
❖ Schema (Re)Design (cont) |
The CustLoan relation instance:
custName | loanNo | amount |
Jones | L-17 | 1000 |
Smith | L-23 | 2000 |
Hayes | L-15 | 1500 |
Jackson | L-15 | 1500 |
Jones | L-93 | 500 |
Turner | L-11 | 900 |
Hayes | L-16 | 1300 |
❖ Schema (Re)Design (cont) |
Now consider the result of (BranchCust Join CustLoan)
branchName | branchCity | assets | custName | loanNo | amount |
Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
Downtown | Brooklyn | 9000000 | Jones | L-93 | 500 |
Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
Perryridge | Horseneck | 1700000 | Hayes | L-16 | 1300 |
Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
Minus | Horseneck | 400000 | Jones | L-93 | 500 |
Minus | Horseneck | 400000 | Jones | L-17 | 1000 |
Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
North Town | Rye | 3700000 | Hayes | L-15 | 1500 |
❖ Schema (Re)Design (cont) |
This is clearly not a successful decomposition.
The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.
Such a decomposition is called a lossy decomposition.
In a good decomposition, we should be able to reconstruct the original relation exactly:
Such a decomposition is called lossless join decomposition.
❖ Boyce-Codd Normal Form |
A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
Observations:
❖ Boyce-Codd Normal Form (cont) |
If we transform a schema into BCNF, we are guaranteed:
If we need to preserve dependencies, use 3NF.
A dependency X → Y is preserved if all of its attributes exist in one table.
Decomposition may e.g. place X and Y in separate tables
❖ BCNF Decomposition |
The following algorithm converts an arbitrary schema to BCNF:
Inputs: schema R, set F of fds Output: set Res of BCNF schemas (tables) Res = {R}; while (any schema S ∈ Res is not in BCNF) { choose any fd X → Y on S that violates BCNF Res = (Res-S) ∪ (S-Y) ∪ XY }
The last step means: make a table from XY; drop Y from table S
The "choose any" step means that the algorithm is non-deterministic
❖ BCNF Decomposition (cont) |
Example (the BankLoans schema):
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
❖ BCNF Decomposition (cont) |
Applying the BCNF algorithm:
Branch(branchName, branchCity, assets)
LoanInfo(branchName, custName, loanNo, amount)
Notes:
❖ BCNF Decomposition (cont) |
Applying the BCNF algorithm (cont):
Loan(branchName, loanNo, amount)
Borrower(custName, loanNo)
❖ Exercise: BCNF Decomposition (1) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
Produce a BCNF decomposition of R.
❖ Exercise: BCNF Decomposition (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a BCNF schema.
❖ Third Normal Form |
A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
The extra condition represents a slight weakening of BCNF requirements.
❖ Third Normal Form (cont) |
If we transform a schema into 3NF, we are guaranteed:
Whether to use BCNF or 3NF depends on overall design considerations.
❖ Third Normal Form (cont) |
The following algorithm converts an arbitrary schema to 3NF:
Inputs: schema R, set F of fds Output: set Ri of 3NF schemas let Fc be a minimal cover for F Res = {} for each fd X → Y in Fc { if (no schema S ∈ Res contains XY) { Res = Res ∪ XY } } if (no schema S ∈ Res contains a candidate key for R) { K = any candidate key for R Res = Res ∪ K }
❖ Third Normal Form (cont) |
Critical step is producing minimal cover Fc for F
A set F of fds is minimal if
❖ Exercise: 3NF Decomposition (1) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = Fc = { BH → C, A → D, C → E, F → A, E → F, BGH → E }
Produce a 3NF decomposition of R.
❖ Exercise: 3NF Decomposition (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a 3NF schema.
❖ Database Design Methodology |
To achieve a "good" database design:
Produced: 4 Apr 2023