❖ Week 07 Wednesday |
❖ Assignment 2 Database (cont) |
SQL Schema:
create domain TermCode as char(4) check (value ~'[12][01239]T[0123]'); create domain SubjectCode as char(8) check (value ~ '[A-Z]{4}[0-9]{4}'); create domain StreamCode as char(6) check (value ~ '[A-Z]{4}[A-Z-][A-Z123-]'); create domain ProgramCode as char(4) check (value ~ '[0-9]{4}'); create domain ZidValue as integer check (value between 1000000 and 9999999); create type ResidentType as enum ('AUS','AUSPR','INTL','NZ'); create type UnitType as enum ('UNSW','faculty','school','department'); create type CareerType as enum ('UGRD','PGRD','RSCH'); create type GradeType as enum ('A', 'A+', 'A-', 'AF', 'AS', 'AW', 'B', 'B+', 'B-', 'C', 'C+', 'C-', 'CR', 'D', 'D+', 'D-', 'DN', 'E', 'E+', 'E-', 'EC', 'EM', 'F', 'FL', 'HD', 'NA', 'NC', 'NF', 'PE', 'PS', 'PW', 'RC', 'RD', 'RS', 'SY', 'UF', 'WD', 'WJ', 'XE'); create type ReqType as enum ('core','elective','free','gened','stream','uoc');
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Terms ( e.g. Terms( id integer, 5239, code TermCode not null, '23T3', starting date, '2023-09-11', ending date, '2023-11-17', description text, 'Term 3 2023' primary key (id) ) ); create table Countries ( e.g. Countries( id integer, 10, code char(3) not null, 'ARG', name text not null, 'Argentina' primary key (id) ) );
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table People ( id integer, zid ZidValue not null, family_name text, given_names text, full_name text, origin integer references Countries(id), primary key (id) ); create table Students ( id integer references People(id), status ResidentType, primary key (id) ); create table Staff ( id integer references People(id), primary key (id) );
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Orgunits ( id integer, code varchar(10) not null, name text not null, utype UnitType not null, parent integer references Orgunits(id), primary key (id) ); E.g. Orgunits(112,'faculty','Faculty of Engineering','ENG',0) Orgunits(52,'faculty','Faculty of Science','SCI',0) Orgunits(71,'school','School of Chemistry','CHEM',52) OrgUnits(217,'school','School of Physics','PHYS',52)
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Subjects ( id integer, code SubjectCode not null, title text not null, uoc integer not null check (uoc > 0), career CareerType, owner integer not null references Orgunits(id), primary key (id) ); create table Courses ( id integer, subject integer not null references Subjects(id), term integer not null references Terms(id), convenor integer references Staff(id), satisfact integer check (satisfact between 0 and 100), nresponses integer, primary key (id) );
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Streams ( id integer, code StreamCode, name text not null, primary key (id) ); create table Programs ( id integer, code ProgramCode, name text not null, primary key (id) ); E.g. Subjects(1234,'COMP3311','Database Systems',6,'UGRD',89) Courses(4567,1234,5239,1625(9300035),95,400) Streams(4342,'SENGAH','Software Engineering') Programs(5432,'3707','BE(Hons)')
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Program_enrolments ( id integer, student integer references Students(id), term integer references Terms(id), program integer references Programs(id), primary key (id) ); create table Stream_enrolments ( part_of integer references Program_enrolments(id), stream integer references Streams(id), primary key (part_of,stream) );
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Course_enrolments ( student integer references Students(id), course integer references Courses(id), mark integer check (mark between 0 and 100), grade GradeType, primary key (student,course) ); E.g. Course_enrolments(62534(5312345),4567,90,'HD') Course_enrolments(15243(5265432),4567,50,'PS')
❖ Assignment 2 Database (cont) |
SQL Schema cont:
create table Requirements ( id integer, name text not null, rtype ReqType not null, min_req integer check (min_req > 0), max_req integer check (max_req > 0), acadobjs text, for_stream integer references Streams(id), for_program integer references Programs(id), constraint ProgOrStream check ( for_stream is NULL and for_program is not null or for_stream is not null and for_program is null ), primary key (id) );
❖ Assignment 2 Database (cont) |
Examples of requirements:
Requirements(id,name,type,min,max,acadObjects,stream/program) Requirements(123,'Comp Sci core','core',42,42, 'COMP1511,COMP1521,COMP1531,...,{COMP3121;COMP3821},...', 1234(COMPA1),NULL) Requirements(345,'Gen Ed','gened',12,12,'########', NULL,25252(3778)) Requirements(678,"Comp Sci electives','elective',30,NULL, 'ENGG2600,ENG3600,ENGG4600,COMP2041,COMP3###,COMP4###,COMP6###,COMP9###', 1234(COMPA1),NULL) Requirements(789,'CS Majors','stream',1,1, 'COMPA1,COMPD1,COMPI1,COMPJ1,COMPN1,COMPS1,COMPY1', NULL,25252(3778)) Requirements(789,'Level 1 Limit','uoc',-,60,'####1###', NULL,25252(3778))
❖ Assignment 2 Scripts |
Write Python/Psycopg2 scripts to ...
q1.py
q2.py
q3.py
q4.py
q5.py
q6.py
❖ Assignment 2 Scripts (cont) |
Sample output:
$ python3 q3.py COMPA1 COMPA1 Computer Science - offered by School of Computer Science and Engineering Academic Requirements: all courses from Comp Sci core - COMP1511 Programming Fundamentals - COMP1521 Computer Systems Fundamentals - COMP1531 Software Eng Fundamentals - COMP2511 O-O Design & Programming - COMP2521 Data Structures and Algorithms - MATH1081 Discrete Mathematics - MATH1131 Mathematics 1A or MATH1141 Higher Mathematics 1A - MATH1231 Mathematics 1B or MATH1241 Higher Mathematics 1B - COMP3121 Algorithms & Programming Tech or COMP3821 Ext Algorithms&Prog Techniques - COMP3900 Computer Science Project - COMP4920 Management and Ethics at least 30 UOC courses from Comp Sci electives - courses matching ENGG2600,ENG3600,ENGG4600,COMP3###,COMP4###,COMP6###,COMP9### at least 36 UOC of Free Electives
❖ Assignment 2 Scripts (cont) |
Sample output (not from this term's database):
$ python q3.py 5124159 5124159 Magia, Tristan COMP9318 17s1 Data Warehousing & Data Mining 65 CR 6uoc COMP9319 17s1 Web Data Compression & Search 86 HD 6uoc COMP9417 17s1 Machine Learning and Data M... 78 DN 6uoc COMP6714 17s2 Information Retrieval and W... 69 CR 6uoc COMP9313 17s2 Big Data Management 77 DN 6uoc COMP9444 17s2 Neural Networks and Deep Le... 88 HD 6uoc UOC = 36, WAM = 77.2
WAM = sum(uoci * marki) / sum(uoci) = 2778/36 = 77.2
❖ Relational Design Theory (recap) |
Schemas with redundancy are prone to update anomalies
ER → SQL mapping tends to give non-redundant schemas
❖ Functional Dependency |
A relation instance r(R) satisfies a dependency X → Y if
We say that "Y is functionally dependent on X".
Attribute sets X and Y may overlap; trivially true that X → X.
Notes:
❖ Identifying Functional Dependencies |
Consider an instance r(R) of the relation schema R(ABCDE):
A | B | C | D | E |
a1 | b1 | c1 | d1 | e1 |
a2 | b1 | c2 | d2 | e1 |
a3 | b2 | c1 | d1 | e1 |
a4 | b2 | c2 | d2 | e1 |
a5 | b3 | c3 | d1 | e1 |
What dependencies can we observe among its attributes?
❖ Identifying Functional Dependencies (cont) |
Since A values are unique, the definition of fd gives:
❖ Identifying Functional Dependencies (cont) |
Other observations:
In practice, choose a minimal set of fds (basis)
❖ Exercise: Functional Dependencies (1) |
Real estate agents conduct visits to rental properties
❖ Exercise: Functional Dependencies (1) (cont) |
Describe any functional dependencies that exist in this data:
P# | When | Address | Notes | S# | Name | CarReg ----+-------------+------------+---------------+------+-------+------- P4 | 03/06 15:15 | 55 High St | Bathroom leak | S44 | Rob | ABK754 P1 | 04/06 11:10 | 47 High St | All ok | S44 | Rob | ABK754 P4 | 03/07 12:30 | 55 High St | All ok | S43 | Dave | ATS123 P1 | 05/07 15:00 | 47 High St | Broken window | S44 | Rob | ABK754 P1 | 05/07 15:00 | 47 High St | Leaking tap | S44 | Rob | ABK754 P2 | 13/07 12:00 | 12 High St | All ok | S42 | Peter | ATS123 P1 | 10/08 09:00 | 47 High St | Window fixed | S42 | Peter | ATS123 P3 | 11/08 14:00 | 99 High St | All ok | S41 | John | AAA001 P4 | 13/08 10:00 | 55 High St | All ok | S44 | Rob | ABK754 P3 | 05/09 11:15 | 99 High St | Bathroom leak | S42 | Peter | ATS123
State assumptions used in determining the fds.
❖ Exercise: Functional Dependencies (2) |
Consider this data (possibly from a spreadsheet):
id | name | subj | term | title | mark | grade --------+-----------------+----------+------+-----------------+------+------- 5123485 | Lillian Seng... | COMP9020 | 17s1 | Foundations ... | | AF 5123485 | Lillian Seng... | COMP9021 | 17s2 | Principles o... | 46 | FL 5123485 | Lillian Seng... | COMP9020 | 18s1 | Foundations ... | 44 | FL 5123485 | Lillian Seng... | COMP9021 | 18s2 | Principles o... | 46 | FL 5123504 | Yap Huawen | MATH5960 | 17s1 | Bayesian Inf... | 56 | PS 5123504 | Yap Huawen | COMP9318 | 17s1 | Data Warehou... | 76 | DN 5123504 | Yap Huawen | COMP9444 | 17s2 | Neural Netwo... | 53 | PS 5123504 | Yap Huawen | COMP9814 | 18s1 | Ext Artifici... | | NF 5123504 | Yap Huawen | COMP6714 | 18s2 | Info Retriev... | 86 | HD 5123523 | Cindy Nava G... | COMP9313 | 17s1 | Big Data Man... | 55 | PS 5123523 | Cindy Nava G... | COMP9444 | 17s2 | Neural Netwo... | 63 | PS 5123523 | Cindy Nava G... | COMP9417 | 18s1 | Machine Lear... | | AW 5123523 | Cindy Nava G... | COMP9318 | 19T1 | Data Warehou... | 43 | FL 5123523 | Cindy Nava G... | COMP6714 | 19T3 | Info Retriev... | | AW 5123523 | Cindy Nava G... | COMP9321 | 20T1 | Data Service... | | SY
Identify functional dependencies
❖ Exercise: Functional Dependencies (3) |
What functional dependencies exist in the following table:
A | B | C | D -----+-----+-----+----- 1 | a | 6 | x 2 | b | 7 | y 3 | c | 7 | z 4 | d | 6 | x 5 | a | 6 | y 6 | b | 7 | z 7 | c | 7 | x 8 | d | 6 | y
How is this case different to the previous ones?
❖ Functional Dependency |
Above, we considered dependency within a relation instance r(R).
More important for design is dependency across all possible instances of the relation (i.e. a schema-based dependency).
This is a simple generalisation of the previous definition:
E.g. course enrolment example
❖ Functional Dependency (cont) |
Can we generalise some ideas about functional dependency?
E.g. are there dependencies that hold for any relation?
❖ Inference Rules |
Armstrong's rules are general rules of inference on fds.
F1. Reflexivity e.g. X → X
F2. Augmentation e.g. X → Y ⇒ XZ → YZ
F3. Transitivity e.g. X → Y, Y → Z ⇒ X → Z
F4. Additivity e.g. X → Y, X → Z ⇒ X → YZ
F5. Projectivity e.g. X → YZ ⇒ X → Y, X → Z
F6. Pseudotransitivity e.g. X → Y, YZ → W ⇒ XZ → W
F1 - F3 are essential; the rest are useful to have
❖ Inference Rules (cont) |
Example: determining validity of AB → GH, given:
F = { AB → E, AG → J, BE → I, E → G, GI → H }
AB → E | (given) | |||
E → G | (given) | |||
AB → G | (using F3 on 1,2) | |||
AB → AB | (using F1) | |||
AB → B | (using F5 on 4) | |||
AB → BE | (using F4 on 1,5) | |||
BE → I | (given) | |||
AB → I | (using F3 on 6,7) | |||
AB → GI | (using F4 on 3,8) | |||
GI → H | (given) | |||
AB → H | (using F3 on 9,10) | |||
AB → GH | (using F4 on 3,11) |
❖ Closures |
Given a set F of fds, how many new fds can we derive?
For a finite set of attributes, there must be a finite set of derivable fds.
Closure = F+ = largest collection of dependencies derivable from F
Closures allow us to answer two interesting questions:
❖ Closures (cont) |
For the question "is X → Y derivable from F?" ...
R = ABC, F = { AB → C, C → B }
F+ = { A → A, AB → A,
AC → A, AB → B,
BC → B, ABC → B,
C → C, AC → C,
BC → C, ABC → C,
AB → AB, . . . . . . ,
AB → ABC, AB → ABC,
C → B, C → BC,
AC → B, AC → AB }
❖ Closures (cont) |
Algorithms based on F+ rapidly become infeasible.
To solve this problem ...
Given a set X of attributes and a set F of fds, the closure of X (denoted X+) is
We can prove (using additivity) that (X → Y) ∈ F+ iff Y ⊂ X+.
For computation purposes, |X+| is bounded by the number of attributes.
❖ Closures (cont) |
For the question "is X → Y derivable from F?" ...
❖ Exercise: Determining Keys |
Determine possible primary keys for each of the following:
❖ Normalisation |
Normalisation: branch of relational theory providing design insights.
Makes use of schema normal forms
Normalisation draws heavily on the theory of functional dependencies.
❖ Normal Forms |
Normalisation theory defines six normal forms (NFs).
1NF allows most redundancy; 5NF allows least redundancy.
We say that "a schema is in xNF", if all of its tables are xNF
For most practical purposes, BCNF (or 3NF) are acceptable NFs.
❖ Normal Forms (cont) |
Normal forms:
all attributes have atomic values we assume this as part of relational model, so every relation schema is in 1NF |
||
all non-key attributes fully depend on key (i.e. no partial dependencies) avoids much redundancy |
||
BCNF |
no attributes dependent on non-key attrs (i.e. no transitive dependencies) avoids most remaining redundancy |
❖ Normal Forms (cont) |
Boyce-Codd Normal Form (BCNF):
(which isn't necessarily a problem, unless FDs capture important semantics)
❖ Normal Forms (cont) |
Normalisation aims to puts a schema into xNF
Produced: 25 Oct 2023