COMP3311 Week 7 Thursday Lecture

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [0/29]
❖ Week 07 Thursday


In today's lecture ...

Things to do ...

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [1/29]
❖ Relational Design Theory


We know how to express ER data models and SQL schemas

But how do we know that our models/schemas are "good"?

Properties of "good" models/schemas

Relational design theory addresses the last issue

Relies on the notion of functional dependency

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [2/29]
❖ Relational Design Theory (cont)


Functional dependencies

What we study here: The aim of studying this:
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [3/29]
❖ Relational Design and Redundancy


In database design, redundancy is generally a "bad thing":

Our goal is to reduce redundancy in stored data
But ... redundancy may have some advantages
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [4/29]
❖ Relational Design and Redundancy (cont)

Consider the following relation defining bank accounts/branches:

accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Horseneck 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Minus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000

Careless updating of this data may introduce inconsistencies.

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [5/29]
❖ Exercise: Update Anomalies


Show what happens when the following changes occur:

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [6/29]
❖ Update Anomalies

Insertion anomaly:

Update anomaly: Deletion anomaly:
Insertion/update anomalies are avoidable, but need extra DBMS work
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [7/29]
❖ Avoiding Update Anomalies

Redundancy in stored data can lead to update anomalies

Functional dependencies allow us to

Normalization methods allow us to
Note: most algorithms for normalization are non-deterministic
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [8/29]
❖ Notation/Terminology

Most texts adopt the following terminology:

Relation schemas upper-case letters, denoting set of all attributes (e.g. R, S, P, Q )
Relation instances lower-case letter corresponding to schema (e.g. r(R), s(S), p(P), q(Q) )
Tuples lower-case letters   (e.g. t, t', t1, u, v )
Attributes upper-case letters from start of alphabet (e.g. A, B, C, D )
Sets of attributes simple concatenation of attribute names (e.g. X=ABCD, Y=EFG )
Attributes in tuples tuple[attrSet] (e.g. t[ABCD], t[X])

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [9/29]
❖ Database Design (revisited)

To avoid update anomalies, normalization should give:

Normalization typically involves
Finally, each Ri  contains info about one entity (e.g. branch, customer, ...)
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [10/29]
❖ Database Design (revisited) (cont)

But we already have a design procedure (ER-then-relational)

Why do we need another design procedure?
  1. ER → Relational does not guarantee no redundancy
    • dependency theory allows us to check designs for residual problems
  2. the new procedure gives (semi)automated design
    • determine all of the attributes in the problem domain
    • collect them all together in a "universal relation"
    • provide information about how attributes are related
    • apply normalisation to decompose into non-redundant relations
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [11/29]
❖ Functional Dependency

A relation instance r(R) satisfies a dependency X → Y if

In other words, if two tuples in R agree in their values for the set of attributes X, then they must also agree in their values for the set of attributes Y.

We say that "Y is functionally dependent on X".

Attribute sets X and Y may overlap; trivially true that X → X.

Notes:

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [12/29]
❖ Identifying Functional Dependencies


Consider an instance r(R) of the relation schema R(ABCDE):

A       B       C       D       E      
a1 b1 c1 d1 e1
a2 b1 c2 d2 e1
a3 b2 c1 d1 e1
a4 b2 c2 d2 e1
a5 b3 c3 d1 e1

What dependencies can we observe among its attributes?

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [13/29]
❖ Identifying Functional Dependencies (cont)


Since A values are unique, the definition of fd gives:

Since all E values are the same, it follows that:
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [14/29]
❖ Identifying Functional Dependencies (cont)


Other observations:

We could derive many other dependencies, e.g.   AE → BC, ...

In practice, choose a minimal set of fds (basis)

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [15/29]
❖ Exercise: Functional Dependencies (1)

Real estate agents conduct visits to rental properties

The company stores all of the associated data in a spreadsheet.
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [16/29]
❖ Exercise: Functional Dependencies (1) (cont)

Describe any functional dependencies that exist in this data:


P# | When        | Address    | Notes         | S#  | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
P4 | 03/06 15:15 | 55 High St | Bathroom leak | S44 | Rob   | ABK754
P1 | 04/06 11:10 | 47 High St | All ok        | S44 | Rob   | ABK754
P4 | 03/07 12:30 | 55 High St | All ok        | S43 | Dave  | ATS123
P1 | 05/07 15:00 | 47 High St | Broken window | S44 | Rob   | ABK754
P1 | 05/07 15:00 | 47 High St | Leaking tap   | S44 | Rob   | ABK754
P2 | 13/07 12:00 | 12 High St | All ok        | S42 | Peter | ATS123
P1 | 10/08 09:00 | 47 High St | Window fixed  | S42 | Peter | ATS123
P3 | 11/08 14:00 | 99 High St | All ok        | S41 | John  | AAA001
P4 | 13/08 10:00 | 55 High St | All ok        | S44 | Rob   | ABK754
P3 | 05/09 11:15 | 99 High St | Bathroom leak | S42 | Peter | ATS123

State assumptions used in determining the fds.

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [17/29]
❖ Exercise: Functional Dependencies (2)

Consider this data (possibly from a spreadsheet):

   id   |     name        |   subj   | term |      title      | mark | grade 
--------+-----------------+----------+------+-----------------+------+-------
5123485 | Lillian Seng... | COMP9020 | 17s1 | Foundations ... |      | AF
5123485 | Lillian Seng... | COMP9021 | 17s2 | Principles o... |   46 | FL
5123485 | Lillian Seng... | COMP9020 | 18s1 | Foundations ... |   44 | FL
5123485 | Lillian Seng... | COMP9021 | 18s2 | Principles o... |   46 | FL
5123504 | Yap Huawen      | MATH5960 | 17s1 | Bayesian Inf... |   56 | PS
5123504 | Yap Huawen      | COMP9318 | 17s1 | Data Warehou... |   76 | DN
5123504 | Yap Huawen      | COMP9444 | 17s2 | Neural Netwo... |   53 | PS
5123504 | Yap Huawen      | COMP9814 | 18s1 | Ext Artifici... |      | NF
5123504 | Yap Huawen      | COMP6714 | 18s2 | Info Retriev... |   86 | HD
5123523 | Cindy Nava G... | COMP9313 | 17s1 | Big Data Man... |   55 | PS
5123523 | Cindy Nava G... | COMP9444 | 17s2 | Neural Netwo... |   63 | PS
5123523 | Cindy Nava G... | COMP9417 | 18s1 | Machine Lear... |      | AW
5123523 | Cindy Nava G... | COMP9318 | 19T1 | Data Warehou... |   43 | FL
5123523 | Cindy Nava G... | COMP6714 | 19T3 | Info Retriev... |      | AW
5123523 | Cindy Nava G... | COMP9321 | 20T1 | Data Service... |      | SY

Identify functional dependencies

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [18/29]
❖ Exercise: Functional Dependencies (2)

What functional dependencies exist in the following table:


  A  |  B  |  C  |  D
-----+-----+-----+-----
  1  |  a  |  6  |  x
  2  |  b  |  7  |  y
  3  |  c  |  7  |  z
  4  |  d  |  6  |  x
  5  |  a  |  6  |  y
  6  |  b  |  7  |  z
  7  |  c  |  7  |  x
  8  |  d  |  6  |  y

How is this case different to the previous ones?

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [19/29]
❖ Exercise: Functional Dependencies (3)

What functional dependencies exist in the following table:

  A  |  B  |  C  |  D
-----+-----+-----+-----
  1  |  a  |  6  |  x
  2  |  b  |  7  |  y
  3  |  c  |  7  |  z
  4  |  d  |  6  |  x
  5  |  a  |  6  |  y
  6  |  b  |  7  |  z
  7  |  c  |  7  |  x
  8  |  d  |  6  |  y

How is this case different to the previous ones?

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [20/29]
❖ Functional Dependency

Above, we considered dependency within a relation instance r(R).

More important for design is dependency across all possible instances of the relation (i.e. a schema-based dependency).

This is a simple generalisation of the previous definition:

Such dependencies capture semantics of problem domain.

E.g. course enrolment example

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [21/29]
❖ Functional Dependency (cont)

Can we generalise some ideas about functional dependency?


E.g. are there dependencies that hold for any  relation?


E.g. do some dependencies suggest the existence of others?
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [22/29]
❖ Inference Rules

Armstrong's rules are general rules of inference on fds.

F1. Reflexivity   e.g.   X → X

F2. Augmentation   e.g.   X → Y  ⇒  XZ → YZ

F3. Transitivity   e.g.   X → Y, Y → Z  ⇒  X → Z

F4. Additivity   e.g.   X → Y, X → Z   ⇒   X → YZ

F5. Projectivity   e.g.   X → YZ   ⇒   X → Y, X → Z

F6. Pseudotransitivity   e.g.   X → Y, YZ → W   ⇒   XZ → W

F1 - F3 are essential; the rest are useful to have

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [23/29]
❖ Inference Rules (cont)

Example: determining validity of AB → GH, given:

R = ABCDEFGHIJ

F = { AB → E,   AG → J,   BE → I,   E → G,   GI → H }

Derivation:
1.   AB → E (given)
2.   E → G (given)
3.   AB → G (using F3 on 1,2)
4.   AB → AB (using F1)
5.   AB → B (using F5 on 4)
6.   AB → BE (using F4 on 1,5)
7.   BE → I (given)
8.   AB → I (using F3 on 6,7)
9.   AB → GI (using F4 on 3,8)
10. GI → H (given)
11. AB → H (using F3 on 9,10)
12. AB → GH (using F4 on 3,11)
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [24/29]
❖ Closures


Given a set F of fds, how many new fds can we derive?

For a finite set of attributes, there must be a finite set of derivable fds.

Closure = F+  = largest collection of dependencies derivable from F

Closures allow us to answer two interesting questions:

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [25/29]
❖ Closures (cont)

For the question "is X → Y derivable from F?" ...


For the question "are F and G equivalent?" ...
Unfortunately, closures can be very large, e.g.

R = ABC,     F = { AB → C,   C → B }
F+ = { A → A,   AB → A,   AC → A,   AB → B,   BC → B,   ABC → B,  
            C → C,   AC → C,   BC → C,   ABC → C,   AB → AB,   . . . . . . ,
            AB → ABC,   AB → ABC,   C → B,   C → BC,   AC → B,   AC → AB }

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [26/29]
❖ Closures (cont)

Algorithms based on F+ rapidly become infeasible.

To solve this problem ...

Given a set X of attributes and a set F of fds, the closure of X (denoted X+) is

We can prove (using additivity) that    (X → Y) ∈ F+   iff   Y ⊂ X+.

For computation purposes, |X+| is bounded by the number of attributes.

COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [27/29]
❖ Closures (cont)

For the question "is X → Y derivable from F?" ...


For the question "are F and G equivalent?" ...
For the question "what are the keys of R implied by F?" ...
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [28/29]
❖ Exercise: Determining Keys

Determine possible primary keys for each of the following:

  1. FD  =  { A → B, C → D, E → FG }
  2. FD  =  { A → B, B → C, C → D }
  3. FD  =  { A → B, B → C, C → A }
  4. FD  =  { AB → C,   C → B }
  5. FD  =  { MN → P, NP → Q, Q → R }
  6. FD  =  { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
  7. FD  =  { ABH → C, A → D, C → E F → A, E → F, BGH → E }
COMP3311 22T3 ♢ Week 7 Thursday Lecture ♢ [29/29]


Produced: 30 Mar 2023