❖ Week 08 Monday  | 
❖ Assignment 2 Database  | 
Things to note:
NOT NULL❖ Assignment 2 Scripts  | 
Write Python/Psycopg2 scripts to ...
q1.pyq2.pyq3.pyq4.pyq5.pyq6.py
python3 q5.py zID   vs   python3 q5.py zID Prog Stream
Since output is text, need to be extra careful with output format
❖ Normalisation  | 
Normalisation: branch of relational theory providing design insights.
Makes use of schema normal forms
Normalisation draws heavily on the theory of functional dependencies.
❖ Relation Decomposition  | 
The standard transformation technique to remove redundancy:
We accomplish decomposition by
May require several decompositions to achieve acceptable NF.
Normalisation algorithms tell us how to choose S and T.
❖ Schema (Re)Design  | 
Consider the following relation for BankLoans:
| branchName | branchCity | assets | custName | loanNo | amount | 
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 | 
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 | 
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 | 
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 | 
| Minus | Horseneck | 400000 | Jones | L-93 | 500 | 
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 | 
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 | 
This schema has all of the update anomalies mentioned earlier.
❖ Schema (Re)Design (cont)  | 
To improve the design, decompose the BankLoans relation.
The following decomposition is not helpful:
    Branch(branchName, branchCity, assets) 
    CustLoan(custName, loanNo, amount)
because we lose information (which branch is a loan held at?)
Another possible decomposition:
    BranchCust(branchName, branchCity, assets, custName) 
    CustLoan(custName, loanNo, amount)
❖ Schema (Re)Design (cont)  | 
The BranchCust relation instance:
| branchName | branchCity | assets | custName | 
| Downtown | Brooklyn | 9000000 | Jones | 
| Redwood | Palo Alto | 2100000 | Smith | 
| Perryridge | Horseneck | 1700000 | Hayes | 
| Downtown | Brooklyn | 9000000 | Jackson | 
| Minus | Horseneck | 400000 | Jones | 
| Round Hill | Horseneck | 8000000 | Turner | 
| North Town | Rye | 3700000 | Hayes | 
❖ Schema (Re)Design (cont)  | 
The CustLoan relation instance:
| custName | loanNo | amount | 
| Jones | L-17 | 1000 | 
| Smith | L-23 | 2000 | 
| Hayes | L-15 | 1500 | 
| Jackson | L-15 | 1500 | 
| Jones | L-93 | 500 | 
| Turner | L-11 | 900 | 
| Hayes | L-16 | 1300 | 
❖ Schema (Re)Design (cont)  | 
Now consider the result of (BranchCust Join CustLoan)
| branchName | branchCity | assets | custName | loanNo | amount | 
| Downtown | Brooklyn | 9000000 | Jones | L-17 | 1000 | 
| Downtown | Brooklyn | 9000000 | Jones | L-93 | 500 | 
| Redwood | Palo Alto | 2100000 | Smith | L-23 | 2000 | 
| Perryridge | Horseneck | 1700000 | Hayes | L-15 | 1500 | 
| Perryridge | Horseneck | 1700000 | Hayes | L-16 | 1300 | 
| Downtown | Brooklyn | 9000000 | Jackson | L-15 | 1500 | 
| Minus | Horseneck | 400000 | Jones | L-93 | 500 | 
| Minus | Horseneck | 400000 | Jones | L-17 | 1000 | 
| Round Hill | Horseneck | 8000000 | Turner | L-11 | 900 | 
| North Town | Rye | 3700000 | Hayes | L-16 | 1300 | 
| North Town | Rye | 3700000 | Hayes | L-15 | 1500 | 
❖ Schema (Re)Design (cont)  | 
This is clearly not a successful decomposition.
The fact that we ended up with extra tuples is symptomatic of losing some critical "connection" information during the decomposition.
Such a decomposition is called a lossy decomposition.
In a good decomposition, we should be able to reconstruct the original relation exactly:
Such a decomposition is called lossless join decomposition.
❖ Boyce-Codd Normal Form  | 
A relation schema R is in BCNF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
Observations:
❖ Boyce-Codd Normal Form (cont)  | 
If we transform a schema into BCNF, we are guaranteed:
If we need to preserve dependencies, use 3NF.
A dependency X → Y is preserved if all of its attributes exist in one table.
Decomposition may e.g. place X and Y in separate tables
❖ BCNF Decomposition  | 
The following algorithm converts an arbitrary schema to BCNF:
Inputs: schema R, set F of fds
Output: set Res of BCNF schemas (tables)
Res = {R};
while (any schema S ∈ Res is not in BCNF) {
    choose any fd X → Y on S that violates BCNF
    Res = (Res-S) ∪ (S-Y) ∪ XY
}
The last step means: make a table from XY; drop Y from table S; drop table S
The "choose any" step means that the algorithm is non-deterministic
❖ Exercise: BCNF Normalisation (1)  | 
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the BCNF decompositon algorithm to produce a BCNF schema.
❖ Exercise:  BCNF Normalisation (2)  | 
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a BCNF schema.
❖ Exercise: BCNF Normalisation (3)  | 
Consider the schema R and set of fds F
R = ABCDEFGH
F = { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
Produce a BCNF decomposition of R.
❖ Third Normal Form  | 
A relation schema R is in 3NF w.r.t a set F of functional dependencies iff:
for all fds X → Y in F+
The extra condition represents a slight weakening of BCNF requirements.
❖ Third Normal Form (cont)  | 
If we transform a schema into 3NF, we are guaranteed:
Whether to use BCNF or 3NF depends on overall design considerations.
❖ Third Normal Form (cont)  | 
The following algorithm converts an arbitrary schema to 3NF:
Inputs: schema R, set F of fds
Output: set Ri of 3NF schemas
let Fc be a minimal cover for F
Res = {}
for each fd X → Y in Fc {
    if (no schema S ∈ Res contains XY) {
        Res = Res ∪ XY
    }
}
if (no schema S ∈ Res contains a candidate key for R) {
    K = any candidate key for R
    Res = Res ∪ K
}
❖ Third Normal Form (cont)  | 
Critical step is producing minimal cover Fc for F
A set F of fds is minimal if
❖ Exercise: BCNF Normalisation (1)  | 
Recall the BankLoans schema:
BankLoans(branchName, branchCity, assets, custName, loanNo, amount)
Has functional dependencies F
Use the 3NF decompositon algorithm to produce a 3NF schema.
❖ Exercise:  3NF Normalisation (2)  | 
Recall the following table (based on e.g. a spreadsheet):
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- PG4 | 03/06 15:15 | 55 High St | Bathroom leak | SG44 | Rob | ABK754 PG1 | 04/06 11:10 | 47 High St | All ok | SG44 | Rob | ABK754 PG4 | 03/07 12:30 | 55 High St | All ok | SG43 | Dave | ATS123 PG1 | 05/07 15:00 | 47 High St | Broken window | SG44 | Rob | ABK754 PG1 | 05/07 15:00 | 47 High St | Leaking tap | SG44 | Rob | ABK754 PG2 | 13/07 12:00 | 12 High St | All ok | SG42 | Peter | ATS123 ...
Recall the functional dependencies identified previously,
Use these to convert the table to a 3NF schema.
❖ Exercise:  3NF Normalisation (3)  | 
Consider the schema R and set of fds F
R = ABCDEFGH
F = Fc = { BH → C, A → D, C → E, F → A, E → F, BGH → E }
Produce a 3NF decomposition of R.
❖ Database Design Methodology  | 
To achieve a "good" database design:
❖ Relational Algebra  | 
Relational algebra (RA) can be viewed as ...
❖ Relational Algebra (cont)  | 
Core relational algebra operations:
❖ Relational Algebra (cont)  | 
Select, project, join provide a powerful set of operations for building relations and extracting interesting data from them.
Adding set operations and renaming makes RA complete.
❖ Notation  | 
Standard treatments of relational algebra use Greek symbols.
We use the following notation (because it is easier to reproduce):
| Standard Notation  | 
  Our Notation  | 
|||
| σexpr(Rel) | Sel[expr](Rel) | |||
| πA,B,C(Rel) | Proj[A,B,C](Rel) | |||
| Rel1 ⋈expr Rel2 | Rel1 Join[expr] Rel2 | |||
| ρschemaRel | Rename[schema](Rel) | 
For other operations (e.g. set operations) we adopt the standard notation. 
Except when typing in a text file, where *+
❖ Describing RA Operations  | 
We define the semantics of RA operations using
❖ Describing RA Operations (cont)  | 
All RA operators return a result of type relation.
For convenience, we can name a result and use it later.
E.g.
Temp = R op1 S op2 T
Res  = Temp op3 Z
-- which is equivalent to
Res  = (R op1 S op2 T) op3 Z
Each result is a relation with a well-defined schema
❖ Rename  | 
Rename provides "schema mapping".
If expression E returns a relation R(A1, A2, ... An), then
gives a relation called S
The only thing that Rename changes is the schema.
❖ Rename (cont)  | 
Rename can be viewed as a "technical" apparatus of RA.
Can use implicit rename/project in sequences of RA operations, e.g.
-- R(a,b,c), S(c,d) Res = Rename[Res(b,c,d)](Project[b,c](Sel[a>5](R)) Join S) -- vs Tmp1 = Select[a>5](R) Tmp2 = Project[b,c](Tmp1) Tmp3 = Rename[Tmp3(cc,d)](S) Tmp4 = Tmp2 Join[c=cc] Tmp3 Res = Rename[Res(b,c,d)](Tmp4) -- vs Tmp1(b,c) = Select[a>5](R) Tmp2(cc,d) = S Res(b,c,d) = Tmp1 Join[c=cc] Tmp2
In SQL, can achieve a similar effect by defining a set of views
❖ Exercise: Rename  | 
Answer the following in SQL then relational algebra
addrsuburb
❖ Projection  | 
Projection returns a set of tuples containing a subset of the attributes in the original relation.
X specifies a subset of the attributes of R.
Note that removing key attributes can produce duplicates.
In RA, duplicates are removed from the result set.
Result size: |πX(r)| ≤ |r| Result schema: R'(X)
Algorithmic view:
result = {}
for each tuple t in relation r
    result = result ∪ {t[X]}
❖ Exercise: Projection  | 
Answer the following in SQL then relational algebra
❖ Selection  | 
Selection returns a subset of the tuples in a relation r(R) that satisfy a specified condition C.
C is a boolean expression on attributes in R.
Result size: |σC(r)| ≤ |r|
Result schema: same as the schema of r (i.e. R)
Algorithmic view:
result = {}
for each tuple t in relation r
    if (C(t)) { result = result ∪ {t} }
❖ Exercise: Selection  | 
Answer the following in SQL and relational algebra
❖ Union  | 
Union combines two compatible relations into a single relation via set union of sets of tuples.
Compatibility = both relations have the same schema
Result size: |r1 ∪ r2| ≤ |r1| + |r2| Result schema: R
Algorithmic view:
result = r1
for each tuple t in relation r2
    result = result ∪ {t}
❖ Intersection  | 
Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.
Uses same notion of relation compatibility as union.
Result size: |r1 ∪ r2| ≤ min(|r1|,|r2|) Result schema: R
Algorithmic view:
result = {}
for each tuple t in relation r1
    if (t ∈ r2) { result = result ∪ {t} }
❖ Exercise: Union/Intersection  | 
Answer the following in SQL then relational algebra
❖ Difference  | 
Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
Uses same notion of relation compatibility as union.
Note: tuples in r2 but not r1 do not appear in the result
result = {}
for each tuple t in relation r1
    if (!(t ∈ r2)) { result = result ∪ {t} }
❖ Difference (cont)  | 
Answer the following in SQL then relational algebra
Produced: 30 Oct 2023