❖ Week 08 Monday |
❖ Assignment 2 Database |
Things to note:
NOT NULL❖ Assignment 2 Scripts |
Write Python/Psycopg2 scripts to ...
q1.pyq2.pyq3.pyq4.pyq5.pyq6.py
python3 q5.py zID vs python3 q5.py zID Prog Stream
Since output is text, need to be extra careful with output format
❖ Normalisation |
Normalisation: branch of relational theory providing design insights.
Makes use of schema normal forms
Normalisation draws heavily on the theory of functional dependencies.
❖ Relation Decomposition |
The standard transformation technique to remove redundancy:
We accomplish decomposition by
May require several decompositions to achieve acceptable NF.
Normalisation algorithms tell us how to choose S and T.
❖ Schema (Re)Design |
Consider the following relation for BankLoans:
| branchName | branchCity | assets | custName | loanNo | amount |
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
| Minus | Horseneck | 400000 | Jones | L-93 | 500 |
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
This schema has all of the update anomalies mentioned earlier.
❖ Schema (Re)Design (cont) |
To improve the design, decompose the BankLoans relation.
The following decomposition is not helpful:
Branch(branchName, branchCity, assets)
CustLoan(custName, loanNo, amount)
because we lose information (which branch is a loan held at?)
Another possible decomposition:
BranchCust(branchName, branchCity, assets, custName)
CustLoan(custName, loanNo, amount)
❖ Schema (Re)Design (cont) |
The BranchCust relation instance:
| branchName | branchCity | assets | custName |
| Downtown | Brooklyn | 9000000 | Jones |
| Redwood | Palo Alto | 2100000 | Smith |
| Perryridge | Horseneck | 1700000 | Hayes |
| Downtown | Brooklyn | 9000000 | Jackson |
| Minus | Horseneck | 400000 | Jones |
| Round Hill | Horseneck | 8000000 | Turner |
| North Town | Rye | 3700000 | Hayes |
❖ Schema (Re)Design (cont) |
The CustLoan relation instance:
| custName | loanNo | amount |
| Jones | L-17 | 1000 |
| Smith | L-23 | 2000 |
| Hayes | L-15 | 1500 |
| Jackson | L-15 | 1500 |
| Jones | L-93 | 500 |
| Turner | L-11 | 900 |
| Hayes | L-16 | 1300 |
❖ Schema (Re)Design (cont) |
Now consider the result of (BranchCust Join CustLoan)
| branchName | branchCity | assets | custName | loanNo | amount |
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 |
| Downtown | Brooklyn | 9000000 | Jones | L-93 | 500 |
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 |
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 |
| Perryridge | Horseneck | 1700000 | Hayes | L-16 | 1300 |
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 |
| Minus | Horseneck | 400000 | Jones | L-93 | 500 |
| Minus | Horseneck | 400000 | Jones | L-17 | 1000 |
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 |
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 |
| North Town | Rye | 3700000 | Hayes | L-15 | 1500 |
❖ Schema (Re)Design (cont) |
This is clearly not a successful decomposition.
The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.
Such a decomposition is called a lossy decomposition.
In a good decomposition, we should be able to reconstruct the original relation exactly:
Such a decomposition is called lossless join decomposition.
❖ Boyce-Codd Normal Form |
A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
Observations:
❖ Boyce-Codd Normal Form (cont) |
If we transform a schema into BCNF, we are guaranteed:
If we need to preserve dependencies, use 3NF.
A dependency X → Y is preserved if all of its attributes exist in one table.
Decomposition may e.g. place X and Y in separate tables
❖ BCNF Decomposition |
The following algorithm converts an arbitrary schema to BCNF:
Inputs: schema R, set F of fds
Output: set Res of BCNF schemas (tables)
Res = {R};
while (any schema S ∈ Res is not in BCNF) {
choose any fd X → Y on S that violates BCNF
Res = (Res-S) ∪ (S-Y) ∪ XY
}
The last step means: make a table from XY; drop Y from table S; drop table S
The "choose any" step means that the algorithm is non-deterministic
❖ Exercise: BCNF Normalisation (1) |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the BCNF decompositon algorithm to produce a BCNF schema.
❖ Exercise: BCNF Normalisation (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a BCNF schema.
❖ Exercise: BCNF Normalisation (3) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
Produce a BCNF decomposition of R.
❖ Third Normal Form |
A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
The extra condition represents a slight weakening of BCNF requirements.
❖ Third Normal Form (cont) |
If we transform a schema into 3NF, we are guaranteed:
Whether to use BCNF or 3NF depends on overall design considerations.
❖ Third Normal Form (cont) |
The following algorithm converts an arbitrary schema to 3NF:
Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas
let Fc be a minimal cover for F
Res = {}
for each fd X → Y in Fc {
if (no schema S ∈ Res contains XY) {
Res = Res ∪ XY
}
}
if (no schema S ∈ Res contains a candidate key for R) {
K = any candidate key for R
Res = Res ∪ K
}
❖ Third Normal Form (cont) |
Critical step is producing minimal cover Fc for F
A set F of fds is minimal if
❖ Exercise: BCNF Normalisation (1) |
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the 3NF decompositon algorithm to produce a 3NF schema.
❖ Exercise: 3NF Normalisation (2) |
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a 3NF schema.
❖ Exercise: 3NF Normalisation (3) |
Consider the schema R and set of fds F
R = ABCDEFGH
F = Fc = { BH → C, A → D, C → E, F → A, E → F, BGH → E }
Produce a 3NF decomposition of R.
❖ Database Design Methodology |
To achieve a "good" database design:
❖ Relational Algebra |
Relational algebra (RA) can be viewed as ...
❖ Relational Algebra (cont) |
Core relational algebra operations:
❖ Relational Algebra (cont) |
Select, project, join provide a powerful set of operations for building relations and extracting interesting data from them.
Adding set operations and renaming makes RA complete.
❖ Notation |
Standard treatments of relational algebra use Greek symbols.
We use the following notation (because it is easier to reproduce):
| Standard Notation |
Our Notation |
|||
| σexpr(Rel) | Sel[expr](Rel) | |||
| πA,B,C(Rel) | Proj[A,B,C](Rel) | |||
| Rel1 ⋈expr Rel2 | Rel1 Join[expr] Rel2 | |||
| ρschemaRel | Rename[schema](Rel) |
For other operations (e.g. set operations) we adopt the standard notation.
Except when typing in a text file, where *+
❖ Describing RA Operations |
We define the semantics of RA operations using
❖ Describing RA Operations (cont) |
All RA operators return a result of type relation.
For convenience, we can name a result and use it later.
E.g.
Temp = R op1 S op2 T
Res = Temp op3 Z
-- which is equivalent to
Res = (R op1 S op2 T) op3 Z
Each result is a relation with a well-defined schema
❖ Rename |
Rename provides "schema mapping".
If expression E returns a relation R(A1, A2, ... An), then
gives a relation called S
The only thing that Rename changes is the schema.
❖ Rename (cont) |
Rename can be viewed as a "technical" apparatus of RA.
Can use implicit rename/project in sequences of RA operations, e.g.
-- R(a,b,c), S(c,d) Res = Rename[Res(b,c,d)](Project[b,c](Sel[a>5](R)) Join S) -- vs Tmp1 = Select[a>5](R) Tmp2 = Project[b,c](Tmp1) Tmp3 = Rename[Tmp3(cc,d)](S) Tmp4 = Tmp2 Join[c=cc] Tmp3 Res = Rename[Res(b,c,d)](Tmp4) -- vs Tmp1(b,c) = Select[a>5](R) Tmp2(cc,d) = S Res(b,c,d) = Tmp1 Join[c=cc] Tmp2
In SQL, can achieve a similar effect by defining a set of views
❖ Exercise: Rename |
Answer the following in SQL then relational algebra
addrsuburb
❖ Projection |
Projection returns a set of tuples containing a subset of the attributes in the original relation.
X specifies a subset of the attributes of R.
Note that removing key attributes can produce duplicates.
In RA, duplicates are removed from the result set.
Result size: |πX(r)| ≤ |r| Result schema: R'(X)
Algorithmic view:
result = {}
for each tuple t in relation r
result = result ∪ {t[X]}
❖ Exercise: Projection |
Answer the following in SQL then relational algebra
❖ Selection |
Selection returns a subset of the tuples in a relation r(R) that satisfy a specified condition C.
C is a boolean expression on attributes in R.
Result size: |σC(r)| ≤ |r|
Result schema: same as the schema of r (i.e. R)
Algorithmic view:
result = {}
for each tuple t in relation r
if (C(t)) { result = result ∪ {t} }
❖ Exercise: Selection |
Answer the following in SQL and relational algebra
❖ Union |
Union combines two compatible relations into a single relation via set union of sets of tuples.
Compatibility = both relations have the same schema
Result size: |r1 ∪ r2| ≤ |r1| + |r2| Result schema: R
Algorithmic view:
result = r1
for each tuple t in relation r2
result = result ∪ {t}
❖ Intersection |
Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.
Uses same notion of relation compatibility as union.
Result size: |r1 ∪ r2| ≤ min(|r1|,|r2|) Result schema: R
Algorithmic view:
result = {}
for each tuple t in relation r1
if (t ∈ r2) { result = result ∪ {t} }
❖ Exercise: Union/Intersection |
Answer the following in SQL then relational algebra
❖ Difference |
Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
Uses same notion of relation compatibility as union.
Note: tuples in r2 but not r1 do not appear in the result
result = {}
for each tuple t in relation r1
if (!(t ∈ r2)) { result = result ∪ {t} }
❖ Difference (cont) |
Answer the following in SQL then relational algebra
Produced: 30 Oct 2023