❖ Normalisation |
Normalisation aims to put a schema into xNF
❖ Normalisation (cont) |
In practice, BCNF and 3NF are the most important normal forms.
Boyce-Codd Normal Form (BCNF):
❖ Relation Decomposition |
The standard transformation technique to remove redundancy:
We accomplish decomposition by
May require several decompositions to achieve acceptable NF.
Normalisation algorithms tell us how to choose S and T.
❖ Schema (Re)Design |
Consider the following relation for BankLoans:
| branchName | branchCity | assets | custName | loanNo | amount |
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
| Mianus | Horseneck | 400000 | Jones | L-93 | 500 |
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
This schema has all of the update anomalies mentioned earlier.
❖ Schema (Re)Design (cont) |
To improve the design, decompose the BankLoans relation.
The following decomposition is not helpful:
Branch(branchName, branchCity, assets)
CustLoan(custName, loanNo, amount)
because we lose information (which branch is a loan held at?)
Another possible decomposition:
BranchCust(branchName, branchCity, assets, custName)
CustLoan(custName, loanNo, amount)
❖ Schema (Re)Design (cont) |
The BranchCust relation instance:
| branchName | branchCity | assets | custName |
| Downtown | Brooklyn | 9000000 | Jones |
| Redwood | Palo Alto | 2100000 | Smith |
| Perryridge | Horseneck | 1700000 | Hayes |
| Downtown | Brooklyn | 9000000 | Jackson |
| Mianus | Horseneck | 400000 | Jones |
| Round Hill | Horseneck | 8000000 | Turner |
| North Town | Rye | 3700000 | Hayes |
❖ Schema (Re)Design (cont) |
The CustLoan relation instance:
| custName | loanNo | amount |
| Jones | L-17 | 1000 |
| Smith | L-23 | 2000 |
| Hayes | L-15 | 1500 |
| Jackson | L-15 | 1500 |
| Jones | L-93 | 500 |
| Turner | L-11 | 900 |
| Hayes | L-16 | 1300 |
❖ Schema (Re)Design (cont) |
Now consider the result of (BranchCust ⨝ CustLoan)
| branchName | branchCity | assets | custName | loanNo | amount |
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
| Downtown | Brooklyn | 9000000 | Jones | L-93 | 500 |
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
| Perryridge | Horseneck | 1700000 | Hayes | L-16 | 1300 |
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
| Mianus | Horseneck | 400000 | Jones | L-93 | 500 |
| Mianus | Horseneck | 400000 | Jones | L-17 | 1000 |
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
| North Town | Rye | 3700000 | Hayes | L-15 | 1500 |
❖ Schema (Re)Design (cont) |
This is clearly not a successful decomposition.
The fact that we ended up with extra tuples was symptomatic of losing some critical "connection" information during the decomposition.
Such a decomposition is called a lossy decomposition.
In a good decomposition, we should be able to reconstruct the original relation exactly:
Such a decomposition is called lossless join decomposition.
❖ BCNF Normalisation Algorithm |
The following algorithm converts an arbitrary schema to BCNF:
Inputs: schema R, set F of fds
Output: set Res of BCNF schemas
Res = {R};
while (any schema S ∈ Res is not in BCNF) {
choose any fd X → Y on S that violates BCNF
Res = (Res-S) ∪ (S-Y) ∪ XY
}
The last step means: make a table from XY; drop Y from table S
The "choose any" step means that the algorithm is non-deterministic
❖ BCNF Normalisation Example |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Rename to simplify ...
B = branchName, C = branchCity, A = assets, N = CustName, L = loanNo, M = amount
So ... R = BCANLM, F = { B → CA, L → MN }, key(R) = BL
R is not in BCNF, because B → CA is not a whole key
Decompose into
❖ BCNF Normalisation Example (cont) |
S = BCA is in BCNF, only one fd and it has key on LHS
T = BLNM is not in BCNF, because L → NM is not a whole key
Decompose into ...
❖ 3NF Normalisation Algorithm |
The following algorithm converts an arbitrary schema to 3NF:
Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas
let Fc be a reduced minimal cover for F
Res = {}
for each fd X → Y in Fc {
if (no schema S ∈ Res contains XY) {
Res = Res ∪ XY
}
}
if (no schema S ∈ Res contains a key for R) {
K = any candidate key for R
Res = Res ∪ K
}
❖ 3NF Normalisation Example |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Rename to simplify ...
R = BCANLM, F = { B → CA, L → MN }, key(R) = BL
Compute minimal cover = { B → C, B → A, L → M, L → N }
Reduce minimal cover = { B → CA, L → MN }
Convert into relations: S = BCA, T = LNM
No relation has key BL, so add new table containing key U = BL
Result is S = BCA, T = LNM, U = BL ... same as BCNF
❖ Database Design Methodology |
To achieve a "good" database design:
Produced: 5 Nov 2020