Normalisation

COMP3311 20T3 ♢ Normalisation ♢ [0/15]
❖ Normalisation

Normalisation aims to put a schema into xNF


How normalisation works ...
COMP3311 20T3 ♢ Normalisation ♢ [1/15]
❖ Normalisation (cont)

In practice, BCNF and 3NF are the most important normal forms.


Boyce-Codd Normal Form (BCNF):


Third Normal Form (3NF):
COMP3311 20T3 ♢ Normalisation ♢ [2/15]
❖ Relation Decomposition

The standard transformation technique to remove redundancy:

We accomplish decomposition by

Properties:   R = S ∪ T,   S ∩ T ≠ ∅   and   r(R) = s(S) ⨝ t(T)

May require several decompositions to achieve acceptable NF.

Normalisation algorithms tell us how to choose S and T.

COMP3311 20T3 ♢ Normalisation ♢ [3/15]
❖ Schema (Re)Design

Consider the following relation for BankLoans:

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-15 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300

This schema has all of the update anomalies mentioned earlier.

COMP3311 20T3 ♢ Normalisation ♢ [4/15]
❖ Schema (Re)Design (cont)

To improve the design, decompose the BankLoans relation.

The following decomposition is not helpful:

    Branch(branchName, branchCity, assets)
    CustLoan(custName, loanNo, amount)

because we lose information (which branch is a loan held at?)

Another possible decomposition:

    BranchCust(branchName, branchCity, assets, custName)
    CustLoan(custName, loanNo, amount)

COMP3311 20T3 ♢ Normalisation ♢ [5/15]
❖ Schema (Re)Design (cont)

The BranchCust relation instance:

branchName branchCity assets custName
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Mianus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
North Town Rye 3700000 Hayes

COMP3311 20T3 ♢ Normalisation ♢ [6/15]
❖ Schema (Re)Design (cont)

The CustLoan relation instance:

custName loanNo amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-15 1500
Jones L-93 500
Turner L-11 900
Hayes L-16 1300

COMP3311 20T3 ♢ Normalisation ♢ [7/15]
❖ Schema (Re)Design (cont)

Now consider the result of (BranchCust ⨝ CustLoan)

branchName branchCity assets custName loanNo amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Perryridge Horseneck 1700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Jackson L-15 1500
Mianus Horseneck 400000 Jones L-93 500
Mianus Horseneck 400000 Jones L-17 1000
Round Hill Horseneck 8000000 Turner L-11 900
North Town Rye 3700000 Hayes L-16 1300
North Town Rye 3700000 Hayes L-15 1500

COMP3311 20T3 ♢ Normalisation ♢ [8/15]
❖ Schema (Re)Design (cont)

This is clearly not a successful decomposition.

The fact that we ended up with extra tuples was symptomatic of losing some critical "connection" information during the decomposition.

Such a decomposition is called a lossy decomposition.

In a good decomposition, we should be able to reconstruct the original relation exactly:

if R is decomposed into S and T, then S ⨝ T = R

Such a decomposition is called lossless join decomposition.

COMP3311 20T3 ♢ Normalisation ♢ [9/15]
❖ BCNF Normalisation Algorithm

The following algorithm converts an arbitrary schema to BCNF:

Inputs: schema R, set F of fds
Output: set Res of BCNF schemas

Res = {R};
while (any schema S ∈ Res is not in BCNF) {
    choose any fd X → Y on S that violates BCNF
    Res = (Res-S) ∪ (S-Y) ∪ XY
}

The last step means: make a table from XY; drop Y from table S

The "choose any" step means that the algorithm is non-deterministic

COMP3311 20T3 ♢ Normalisation ♢ [10/15]
❖ BCNF Normalisation Example

Recall the BankLoans schema:

BankLoans(branchName, branchCity, assets, custName, loanNo, amount)

Rename to simplify ...

B = branchName, C = branchCity, A = assets, N = CustName, L = loanNo, M = amount

So ... R = BCANLM,   F = { B → CA, L → MN },   key(R) = BL

R is not in BCNF, because B → CA is not a whole key

Decompose into

(continued)
COMP3311 20T3 ♢ Normalisation ♢ [11/15]
❖ BCNF Normalisation Example (cont)

S = BCA  is in BCNF,   only one fd  and it has key on LHS

T = BLNM  is not in BCNF, because L → NM is not a whole key

Decompose into ...

Result:
COMP3311 20T3 ♢ Normalisation ♢ [12/15]
❖ 3NF Normalisation Algorithm

The following algorithm converts an arbitrary schema to 3NF:

Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas

let Fc be a reduced minimal cover for F
Res = {}
for each fd X → Y in Fc {
    if (no schema S ∈ Res contains XY) {
        Res = Res ∪ XY
    }
}
if (no schema S ∈ Res contains a key for R) {
    K = any candidate key for R
    Res = Res ∪ K
}

COMP3311 20T3 ♢ Normalisation ♢ [13/15]
❖ 3NF Normalisation Example

Recall the BankLoans schema:

BankLoans(branchName, branchCity, assets, custName, loanNo, amount)

Rename to simplify ...

R = BCANLM,   F = { B → CA, L → MN },   key(R) = BL

Compute minimal cover =  { B → C, B → A, L → M, L → N }

Reduce minimal cover =  { B → CA, L → MN }

Convert into relations:  S = BCA, T = LNM

No relation has key  BL, so add new table containing key  U = BL

Result is S = BCA, T = LNM, U = BL ... same as BCNF

COMP3311 20T3 ♢ Normalisation ♢ [14/15]
❖ Database Design Methodology

To achieve a "good" database design:

Note: may subsequently need to "denormalise" if the design yields inadequate performance.
COMP3311 20T3 ♢ Normalisation ♢ [15/15]


Produced: 5 Nov 2020