COMP3311 Week 7 Wednesday Lecture

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [0/38]
❖ Week 07 Wednesday


In today's lecture ...

Things to do ...

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [1/38]
❖ Assignment 2 Database

Partial ER diagram of the MyMyUNSW database:

[Diagram:Pics/assignments/mymy-er.png]

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [2/38]
❖ Assignment 2 Database (cont)

SQL Schema:

create domain TermCode as char(4) check (value ~'[12][01239]T[0123]');
create domain SubjectCode as char(8) check (value ~ '[A-Z]{4}[0-9]{4}');
create domain StreamCode as char(6)
       check (value ~ '[A-Z]{4}[A-Z-][A-Z123-]');
create domain ProgramCode as char(4) check (value ~ '[0-9]{4}');
create domain ZidValue as integer
       check (value between 1000000 and 9999999);
create type ResidentType as enum ('AUS','AUSPR','INTL','NZ');
create type UnitType as enum ('UNSW','faculty','school','department');
create type CareerType as enum ('UGRD','PGRD','RSCH');
create type GradeType as enum
        ('A', 'A+', 'A-', 'AF', 'AS', 'AW', 'B', 'B+', 'B-', 'C', 'C+',
         'C-', 'CR', 'D', 'D+', 'D-', 'DN', 'E', 'E+', 'E-', 'EC',
         'EM', 'F', 'FL', 'HD', 'NA', 'NC', 'NF', 'PE', 'PS', 'PW',
         'RC', 'RD', 'RS', 'SY', 'UF', 'WD', 'WJ', 'XE');
create type ReqType as enum
        ('core','elective','free','gened','stream','uoc');

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [3/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Terms (                   e.g. Terms(
    id          integer,                      5239,
    code        TermCode not null,            '23T3',
    starting    date,                         '2023-09-11',
    ending      date,                         '2023-11-17',
    description text,                         'Term 3 2023'
    primary key (id)                        )
);

create table Countries (               e.g. Countries(
    id          integer,                      10,
    code        char(3) not null,             'ARG',
    name        text not null,                'Argentina'
    primary key (id)                        )
);

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [4/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table People (
    id          integer,
    zid         ZidValue not null,
    family_name text,
    given_names text,
    full_name   text,
    origin      integer references Countries(id),
    primary key (id)
);

create table Students (
    id          integer references People(id),
    status      ResidentType,
    primary key (id)
);

create table Staff (
    id          integer references People(id),
    primary key (id)
);

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [5/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Orgunits (
    id          integer,
    code        varchar(10) not null,
    name        text not null,
    utype       UnitType not null,
    parent      integer references Orgunits(id),
    primary key (id)
);

E.g.
Orgunits(112,'faculty','Faculty of Engineering','ENG',0)
Orgunits(52,'faculty','Faculty of Science','SCI',0)
Orgunits(71,'school','School of Chemistry','CHEM',52)
OrgUnits(217,'school','School of Physics','PHYS',52)

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [6/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Subjects (
    id          integer,
    code        SubjectCode not null,
    title       text not null,
    uoc         integer not null check (uoc > 0),
    career      CareerType,
    owner       integer not null references Orgunits(id),
    primary key (id)
);

create table Courses (
    id          integer,
    subject     integer not null references Subjects(id),
    term        integer not null references Terms(id),
    convenor    integer references Staff(id),
    satisfact   integer check (satisfact between 0 and 100),
    nresponses  integer,
    primary key (id)
);

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [7/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Streams (
    id          integer,
    code        StreamCode,
    name        text not null,
    primary key (id)
);

create table Programs (
    id          integer,
    code        ProgramCode,
    name        text not null,
    primary key (id)
);

E.g.
Subjects(1234,'COMP3311','Database Systems',6,'UGRD',89)
Courses(4567,1234,5239,1625(9300035),95,400)
Streams(4342,'SENGAH','Software Engineering')
Programs(5432,'3707','BE(Hons)')

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [8/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Program_enrolments (
    id          integer,
    student     integer references Students(id),
    term        integer references Terms(id),
    program     integer references Programs(id),
    primary key (id)
);

create table Stream_enrolments (
    part_of     integer references Program_enrolments(id),
    stream      integer references Streams(id),
    primary key (part_of,stream)
);

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [9/38]
❖ Assignment 2 Database (cont)


SQL Schema cont:

create table Course_enrolments (
    student     integer references Students(id),
    course      integer references Courses(id),
    mark        integer check (mark between 0 and 100),
    grade       GradeType,
    primary key (student,course)
);

E.g.
Course_enrolments(62534(5312345),4567,90,'HD')
Course_enrolments(15243(5265432),4567,50,'PS')

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [10/38]
❖ Assignment 2 Database (cont)

SQL Schema cont:

create table Requirements (
    id          integer,
    name        text not null,
    rtype       ReqType not null,
    min_req     integer check (min_req > 0),
    max_req     integer check (max_req > 0),
    acadobjs    text,
    for_stream  integer references Streams(id),
    for_program integer references Programs(id),
    constraint  ProgOrStream check (
                    for_stream is NULL and for_program is not null
                    or
                    for_stream is not null and for_program is null
                ),
    primary key (id)
);

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [11/38]
❖ Assignment 2 Database (cont)


Examples of requirements:

Requirements(id,name,type,min,max,acadObjects,stream/program)

Requirements(123,'Comp Sci core','core',42,42,
    'COMP1511,COMP1521,COMP1531,...,{COMP3121;COMP3821},...',
    1234(COMPA1),NULL)
Requirements(345,'Gen Ed','gened',12,12,'########',
    NULL,25252(3778))
Requirements(678,"Comp Sci electives','elective',30,NULL,
    'ENGG2600,ENG3600,ENGG4600,COMP2041,COMP3###,COMP4###,COMP6###,COMP9###',
    1234(COMPA1),NULL)
Requirements(789,'CS Majors','stream',1,1,
    'COMPA1,COMPD1,COMPI1,COMPJ1,COMPN1,COMPS1,COMPY1',
    NULL,25252(3778))
Requirements(789,'Level 1 Limit','uoc',-,60,'####1###',
	NULL,25252(3778))

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [12/38]
❖ Assignment 2 Scripts


Write Python/Psycopg2 scripts to ...

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [13/38]
❖ Assignment 2 Scripts (cont)

Sample output:

$ python3 q3.py COMPA1
COMPA1 Computer Science
- offered by School of Computer Science and Engineering
Academic Requirements:
all courses from Comp Sci core
- COMP1511 Programming Fundamentals
- COMP1521 Computer Systems Fundamentals
- COMP1531 Software Eng Fundamentals
- COMP2511 O-O Design & Programming
- COMP2521 Data Structures and Algorithms
- MATH1081 Discrete Mathematics
- MATH1131 Mathematics 1A
  or MATH1141 Higher Mathematics 1A
- MATH1231 Mathematics 1B
  or MATH1241 Higher Mathematics 1B
- COMP3121 Algorithms & Programming Tech
  or COMP3821 Ext Algorithms&Prog Techniques
- COMP3900 Computer Science Project
- COMP4920 Management and Ethics
at least 30 UOC courses from Comp Sci electives
- courses matching ENGG2600,ENG3600,ENGG4600,COMP3###,COMP4###,COMP6###,COMP9###
at least 36 UOC of Free Electives

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [14/38]
❖ Assignment 2 Scripts (cont)


Sample output (not from this term's database):

$ python q3.py 5124159
5124159 Magia, Tristan
COMP9318 17s1 Data Warehousing & Data Mining   65 CR   6uoc
COMP9319 17s1 Web Data Compression & Search    86 HD   6uoc
COMP9417 17s1 Machine Learning and Data M...   78 DN   6uoc
COMP6714 17s2 Information Retrieval and W...   69 CR   6uoc
COMP9313 17s2 Big Data Management              77 DN   6uoc
COMP9444 17s2 Neural Networks and Deep Le...   88 HD   6uoc
UOC = 36, WAM = 77.2


WAM = sum(uoci * marki) / sum(uoci) = 2778/36 = 77.2

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [15/38]
❖ Relational Design Theory (recap)


Schemas with redundancy are prone to update anomalies

ER → SQL mapping tends to give non-redundant schemas

Functional dependency gives insights into The methods we describe later
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [16/38]
❖ Functional Dependency

A relation instance r(R) satisfies a dependency X → Y if

In other words, if two tuples in R agree in their values for the set of attributes X, then they must also agree in their values for the set of attributes Y.

We say that "Y is functionally dependent on X".

Attribute sets X and Y may overlap; trivially true that X → X.

Notes:

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [17/38]
❖ Identifying Functional Dependencies


Consider an instance r(R) of the relation schema R(ABCDE):

A       B       C       D       E      
a1 b1 c1 d1 e1
a2 b1 c2 d2 e1
a3 b2 c1 d1 e1
a4 b2 c2 d2 e1
a5 b3 c3 d1 e1

What dependencies can we observe among its attributes?

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [18/38]
❖ Identifying Functional Dependencies (cont)


Since A values are unique, the definition of fd gives:

Since all E values are the same, it follows that: Need to be careful with combinations of attributes on LHS
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [19/38]
❖ Identifying Functional Dependencies (cont)


Other observations:

We could derive many other dependencies, e.g.   AE → BC, ...

In practice, choose a minimal set of fds (basis)

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [20/38]
❖ Exercise: Functional Dependencies (1)

Real estate agents conduct visits to rental properties

The company stores all of the associated data in a spreadsheet.
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [21/38]
❖ Exercise: Functional Dependencies (1) (cont)

Describe any functional dependencies that exist in this data:

P# | When        | Address    | Notes         | S#  | Name  | CarReg
----+-------------+------------+---------------+------+-------+-------
P4 | 03/06 15:15 | 55 High St | Bathroom leak | S44 | Rob   | ABK754
P1 | 04/06 11:10 | 47 High St | All ok        | S44 | Rob   | ABK754
P4 | 03/07 12:30 | 55 High St | All ok        | S43 | Dave  | ATS123
P1 | 05/07 15:00 | 47 High St | Broken window | S44 | Rob   | ABK754
P1 | 05/07 15:00 | 47 High St | Leaking tap   | S44 | Rob   | ABK754
P2 | 13/07 12:00 | 12 High St | All ok        | S42 | Peter | ATS123
P1 | 10/08 09:00 | 47 High St | Window fixed  | S42 | Peter | ATS123
P3 | 11/08 14:00 | 99 High St | All ok        | S41 | John  | AAA001
P4 | 13/08 10:00 | 55 High St | All ok        | S44 | Rob   | ABK754
P3 | 05/09 11:15 | 99 High St | Bathroom leak | S42 | Peter | ATS123

State assumptions used in determining the fds.

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [22/38]
❖ Exercise: Functional Dependencies (2)

Consider this data (possibly from a spreadsheet):

   id   |     name        |   subj   | term |      title      | mark | grade 
--------+-----------------+----------+------+-----------------+------+-------
5123485 | Lillian Seng... | COMP9020 | 17s1 | Foundations ... |      | AF
5123485 | Lillian Seng... | COMP9021 | 17s2 | Principles o... |   46 | FL
5123485 | Lillian Seng... | COMP9020 | 18s1 | Foundations ... |   44 | FL
5123485 | Lillian Seng... | COMP9021 | 18s2 | Principles o... |   46 | FL
5123504 | Yap Huawen      | MATH5960 | 17s1 | Bayesian Inf... |   56 | PS
5123504 | Yap Huawen      | COMP9318 | 17s1 | Data Warehou... |   76 | DN
5123504 | Yap Huawen      | COMP9444 | 17s2 | Neural Netwo... |   53 | PS
5123504 | Yap Huawen      | COMP9814 | 18s1 | Ext Artifici... |      | NF
5123504 | Yap Huawen      | COMP6714 | 18s2 | Info Retriev... |   86 | HD
5123523 | Cindy Nava G... | COMP9313 | 17s1 | Big Data Man... |   55 | PS
5123523 | Cindy Nava G... | COMP9444 | 17s2 | Neural Netwo... |   63 | PS
5123523 | Cindy Nava G... | COMP9417 | 18s1 | Machine Lear... |      | AW
5123523 | Cindy Nava G... | COMP9318 | 19T1 | Data Warehou... |   43 | FL
5123523 | Cindy Nava G... | COMP6714 | 19T3 | Info Retriev... |      | AW
5123523 | Cindy Nava G... | COMP9321 | 20T1 | Data Service... |      | SY

Identify functional dependencies

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [23/38]
❖ Exercise: Functional Dependencies (3)

What functional dependencies exist in the following table:

  A  |  B  |  C  |  D
-----+-----+-----+-----
  1  |  a  |  6  |  x
  2  |  b  |  7  |  y
  3  |  c  |  7  |  z
  4  |  d  |  6  |  x
  5  |  a  |  6  |  y
  6  |  b  |  7  |  z
  7  |  c  |  7  |  x
  8  |  d  |  6  |  y

How is this case different to the previous ones?

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [24/38]
❖ Functional Dependency

Above, we considered dependency within a relation instance r(R).

More important for design is dependency across all possible instances of the relation (i.e. a schema-based dependency).

This is a simple generalisation of the previous definition:

Such dependencies capture semantics of problem domain.

E.g. course enrolment example

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [25/38]
❖ Functional Dependency (cont)

Can we generalise some ideas about functional dependency?


E.g. are there dependencies that hold for any  relation?


E.g. do some dependencies suggest the existence of others?
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [26/38]
❖ Inference Rules

Armstrong's rules are general rules of inference on fds.

F1. Reflexivity   e.g.   X → X

F2. Augmentation   e.g.   X → Y  ⇒  XZ → YZ

F3. Transitivity   e.g.   X → Y, Y → Z  ⇒  X → Z

F4. Additivity   e.g.   X → Y, X → Z   ⇒   X → YZ

F5. Projectivity   e.g.   X → YZ   ⇒   X → Y, X → Z

F6. Pseudotransitivity   e.g.   X → Y, YZ → W   ⇒   XZ → W

F1 - F3 are essential; the rest are useful to have

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [27/38]
❖ Inference Rules (cont)

Example: determining validity of AB → GH, given:

R = ABCDEFGHIJ

F = { AB → E,   AG → J,   BE → I,   E → G,   GI → H }

Derivation:
1.   AB → E (given)
2.   E → G (given)
3.   AB → G (using F3 on 1,2)
4.   AB → AB (using F1)
5.   AB → B (using F5 on 4)
6.   AB → BE (using F4 on 1,5)
7.   BE → I (given)
8.   AB → I (using F3 on 6,7)
9.   AB → GI (using F4 on 3,8)
10. GI → H (given)
11. AB → H (using F3 on 9,10)
12. AB → GH (using F4 on 3,11)
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [28/38]
❖ Closures


Given a set F of fds, how many new fds can we derive?

For a finite set of attributes, there must be a finite set of derivable fds.

Closure = F+  = largest collection of dependencies derivable from F

Closures allow us to answer two interesting questions:

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [29/38]
❖ Closures (cont)

For the question "is X → Y derivable from F?" ...


For the question "are F and G equivalent?" ...
Unfortunately, closures can be very large, e.g.

R = ABC,     F = { AB → C,   C → B }
F+ = { A → A,   AB → A,   AC → A,   AB → B,   BC → B,   ABC → B,  
            C → C,   AC → C,   BC → C,   ABC → C,   AB → AB,   . . . . . . ,
            AB → ABC,   AB → ABC,   C → B,   C → BC,   AC → B,   AC → AB }

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [30/38]
❖ Closures (cont)

Algorithms based on F+ rapidly become infeasible.

To solve this problem ...

Given a set X of attributes and a set F of fds, the closure of X (denoted X+) is

We can prove (using additivity) that    (X → Y) ∈ F+   iff   Y ⊂ X+.

For computation purposes, |X+| is bounded by the number of attributes.

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [31/38]
❖ Closures (cont)

For the question "is X → Y derivable from F?" ...


For the question "are F and G equivalent?" ...
For the question "what are the keys of R implied by F?" ...
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [32/38]
❖ Exercise: Determining Keys

Determine possible primary keys for each of the following:

  1. FD  =  { A → B, C → D, E → FG }
  2. FD  =  { A → B, B → C, C → D }
  3. FD  =  { A → B, B → C, C → A }
  4. FD  =  { AB → C,   C → B }
  5. FD  =  { MN → P, NP → Q, Q → R }
  6. FD  =  { ABH → C, A → DE, BGH → F, F → ADH, BH → GE }
  7. FD  =  { ABH → C, A → D, C → E F → A, E → F, BGH → E }
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [33/38]
❖ Normalisation


Normalisation: branch of relational theory providing design insights.

Makes use of schema normal forms

And normalisation algorithms which

Normalisation draws heavily on the theory of functional dependencies.

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [34/38]
❖ Normal Forms

Normalisation theory defines six normal forms (NFs).

NF hierarachy:   5NF 4NF BCNF 3NF 2NF 1NF

1NF allows most redundancy;   5NF allows least redundancy.

We say that "a schema is in xNF", if all of its tables are xNF

For most practical purposes, BCNF (or 3NF) are acceptable NFs.

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [35/38]
❖ Normal Forms (cont)


Normal forms:

1NF all attributes have atomic values
we assume this as part of relational model,
so every relation schema is in 1NF
2NF all non-key attributes fully depend on key
(i.e. no partial dependencies)
avoids much redundancy
3NF
BCNF
no attributes dependent on non-key attrs
(i.e. no transitive dependencies)
avoids most remaining redundancy

COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [36/38]
❖ Normal Forms (cont)


Boyce-Codd Normal Form (BCNF):


Third Normal Form (3NF):
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [37/38]
❖ Normal Forms (cont)

Normalisation aims to puts a schema into xNF


How normalisation works ...
COMP3311 23T3 ♢ Week 7 Wednesday Lecture ♢ [38/38]


Produced: 25 Oct 2023