❖ Week 08 Monday |
❖ Assignment 2 Database |
Things to note:
NOT NULL
❖ Assignment 2 Scripts |
Write Python/Psycopg2 scripts to ...
q1.py
q2.py
q3.py
q4.py
q5.py
q6.py
python3 q5.py zID vs python3 q5.py zID Prog Stream
Since output is text, need to be extra careful with output format
❖ Normalisation |
Normalisation: branch of relational theory providing design insights.
Makes use of schema normal forms
Normalisation draws heavily on the theory of functional dependencies.
❖ Relation Decomposition |
The standard transformation technique to remove redundancy:
We accomplish decomposition by
May require several decompositions to achieve acceptable NF.
Normalisation algorithms tell us how to choose S and T.
❖ Schema (Re)Design |
Consider the following relation for BankLoans:
branchName | branchCity | assets | custName | loanNo | amount |
Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
Minus | Horseneck | 400000 | Jones | L-93 | 500 |
Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
This schema has all of the update anomalies mentioned earlier.
❖ Schema (Re)Design (cont) |
To improve the design, decompose the BankLoans relation.
The following decomposition is not helpful:
Branch(branchName, branchCity, assets)
CustLoan(custName, loanNo, amount)
because we lose information (which branch is a loan held at?)
Another possible decomposition:
BranchCust(branchName, branchCity, assets, custName)
CustLoan(custName, loanNo, amount)
❖ Schema (Re)Design (cont) |
The BranchCust relation instance:
branchName | branchCity | assets | custName |
Downtown | Brooklyn | 9000000 | Jones |
Redwood | Palo Alto | 2100000 | Smith |
Perryridge | Horseneck | 1700000 | Hayes |
Downtown | Brooklyn | 9000000 | Jackson |
Minus | Horseneck | 400000 | Jones |
Round Hill | Horseneck | 8000000 | Turner |
North Town | Rye | 3700000 | Hayes |
❖ Schema (Re)Design (cont) |
The CustLoan relation instance:
custName | loanNo | amount |
Jones | L-17 | 1000 |
Smith | L-23 | 2000 |
Hayes | L-15 | 1500 |
Jackson | L-15 | 1500 |
Jones | L-93 | 500 |
Turner | L-11 | 900 |
Hayes | L-16 | 1300 |
❖ Schema (Re)Design (cont) |
Now consider the result of (BranchCust Join CustLoan)
branchName | branchCity | assets | custName | loanNo | amount |
Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
Downtown | Brooklyn | 9000000 | Jones | L-93 | 500 |
Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
Perryridge | Horseneck | 1700000 | Hayes | L-16 | 1300 |
Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
Minus | Horseneck | 400000 | Jones | L-93 | 500 |
Minus | Horseneck | 400000 | Jones | L-17 | 1000 |
Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
North Town | Rye | 3700000 | Hayes | L-15 | 1500 |
❖ Schema (Re)Design (cont) |
This is clearly not a successful decomposition.
The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.
Such a decomposition is called a lossy decomposition.
In a good decomposition, we should be able to reconstruct the original relation exactly:
Such a decomposition is called lossless join decomposition.
❖ Boyce-Codd Normal Form |
A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
Observations:
❖ Boyce-Codd Normal Form (cont) |
If we transform a schema into BCNF, we are guaranteed:
If we need to preserve dependencies, use 3NF.
A dependency X → Y is preserved if all of its attributes exist in one table.
Decomposition may e.g. place X and Y in separate tables
❖ BCNF Decomposition |
The following algorithm converts an arbitrary schema to BCNF:
Inputs: schema R, set F of fds Output: set Res of BCNF schemas (tables) Res = {R}; while (any schema S ∈ Res is not in BCNF) { choose any fd X → Y on S that violates BCNF Res = (Res-S) ∪ (S-Y) ∪ XY }
The last step means: make a table from XY; drop Y from table S; drop table S
The "choose any" step means that the algorithm is non-deterministic
❖ Exercise: BCNF Normalisation (1) |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the BCNF decompositon algorithm to produce a BCNF schema.
❖ Exercise: BCNF Normalisation (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a BCNF schema.
❖ Exercise: BCNF Normalisation (3) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
Produce a BCNF decomposition of R.
❖ Third Normal Form |
A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
The extra condition represents a slight weakening of BCNF requirements.
❖ Third Normal Form (cont) |
If we transform a schema into 3NF, we are guaranteed:
Whether to use BCNF or 3NF depends on overall design considerations.
❖ Third Normal Form (cont) |
The following algorithm converts an arbitrary schema to 3NF:
Inputs: schema R, set F of fds Output: set Ri of 3NF schemas let Fc be a minimal cover for F Res = {} for each fd X → Y in Fc { if (no schema S ∈ Res contains XY) { Res = Res ∪ XY } } if (no schema S ∈ Res contains a candidate key for R) { K = any candidate key for R Res = Res ∪ K }
❖ Third Normal Form (cont) |
Critical step is producing minimal cover Fc for F
A set F of fds is minimal if
❖ Exercise: BCNF Normalisation (1) |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the 3NF decompositon algorithm to produce a 3NF schema.
❖ Exercise: 3NF Normalisation (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a 3NF schema.
❖ Exercise: 3NF Normalisation (3) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = Fc = { BH → C, A → D, C → E, F → A, E → F, BGH → E }
Produce a 3NF decomposition of R.
❖ Database Design Methodology |
To achieve a "good" database design:
❖ Relational Algebra |
Relational algebra (RA) can be viewed as ...
❖ Relational Algebra (cont) |
Core relational algebra operations:
❖ Relational Algebra (cont) |
Select, project, join provide a powerful set of operations for building relations and extracting interesting data from them.
Adding set operations and renaming makes RA complete.
❖ Notation |
Standard treatments of relational algebra use Greek symbols.
We use the following notation (because it is easier to reproduce):
Standard Notation |
Our Notation |
|||
σexpr(Rel) | Sel[expr](Rel) | |||
πA,B,C(Rel) | Proj[A,B,C](Rel) | |||
Rel1 ⋈expr Rel2 | Rel1 Join[expr] Rel2 | |||
ρschemaRel | Rename[schema](Rel) |
For other operations (e.g. set operations) we adopt the standard notation.
Except when typing in a text file, where *
+
❖ Describing RA Operations |
We define the semantics of RA operations using
❖ Describing RA Operations (cont) |
All RA operators return a result of type relation.
For convenience, we can name a result and use it later.
E.g.
Temp = R op1 S op2 T
Res = Temp op3 Z
-- which is equivalent to
Res = (R op1 S op2 T) op3 Z
Each result is a relation with a well-defined schema
❖ Rename |
Rename provides "schema mapping".
If expression E returns a relation R(A1, A2, ... An), then
gives a relation called S
The only thing that Rename changes is the schema.
❖ Rename (cont) |
Rename can be viewed as a "technical" apparatus of RA.
Can use implicit rename/project in sequences of RA operations, e.g.
-- R(a,b,c), S(c,d) Res = Rename[Res(b,c,d)](Project[b,c](Sel[a>5](R)) Join S) -- vs Tmp1 = Select[a>5](R) Tmp2 = Project[b,c](Tmp1) Tmp3 = Rename[Tmp3(cc,d)](S) Tmp4 = Tmp2 Join[c=cc] Tmp3 Res = Rename[Res(b,c,d)](Tmp4) -- vs Tmp1(b,c) = Select[a>5](R) Tmp2(cc,d) = S Res(b,c,d) = Tmp1 Join[c=cc] Tmp2
In SQL, can achieve a similar effect by defining a set of views
❖ Exercise: Rename |
Answer the following in SQL then relational algebra
addr
suburb
❖ Projection |
Projection returns a set of tuples containing a subset of the attributes in the original relation.
X specifies a subset of the attributes of R.
Note that removing key attributes can produce duplicates.
In RA, duplicates are removed from the result set.
Result size: |πX(r)| ≤ |r| Result schema: R'(X)
Algorithmic view:
result = {} for each tuple t in relation r result = result ∪ {t[X]}
❖ Exercise: Projection |
Answer the following in SQL then relational algebra
❖ Selection |
Selection returns a subset of the tuples in a relation r(R) that satisfy a specified condition C.
C is a boolean expression on attributes in R.
Result size: |σC(r)| ≤ |r|
Result schema: same as the schema of r (i.e. R)
Algorithmic view:
result = {} for each tuple t in relation r if (C(t)) { result = result ∪ {t} }
❖ Exercise: Selection |
Answer the following in SQL and relational algebra
❖ Union |
Union combines two compatible relations into a single relation via set union of sets of tuples.
Compatibility = both relations have the same schema
Result size: |r1 ∪ r2| ≤ |r1| + |r2| Result schema: R
Algorithmic view:
result = r1 for each tuple t in relation r2 result = result ∪ {t}
❖ Intersection |
Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.
Uses same notion of relation compatibility as union.
Result size: |r1 ∪ r2| ≤ min(|r1|,|r2|) Result schema: R
Algorithmic view:
result = {} for each tuple t in relation r1 if (t ∈ r2) { result = result ∪ {t} }
❖ Exercise: Union/Intersection |
Answer the following in SQL then relational algebra
❖ Difference |
Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
Uses same notion of relation compatibility as union.
Note: tuples in r2 but not r1 do not appear in the result
result = {} for each tuple t in relation r1 if (!(t ∈ r2)) { result = result ∪ {t} }
❖ Difference (cont) |
Answer the following in SQL then relational algebra
Produced: 30 Oct 2023