COMP3311 Week 7 Monday Lecture

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [0/28]
❖ Week 07 Monday


In today's lecture ...

Things to do ...

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [1/28]
❖ Rest of Term


Plan for remainder of term:

Week 7  Python, Psycopg2, PostgreSQL
Week 7/8  Relational Design, Normalization
Week 8/9  Relational Algebra, Query Execution
Week 9/10  Transactions, Concurrency Control
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [2/28]
❖ Assignment 2


Database about UNSW, called MyMyUNSW ...

Similar to SiMS, the database behind myUNSW

Goal: build a progression checker (no more waiting for the Nucleus)

Database schema info will eventually be in .../assignments/ass2/schema.php

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [3/28]
❖ Assignment 2 (cont)

Partial ER diagram of the MyMyUNSW database:

[Diagram:Pics/assignments/mymy-er.png]

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [4/28]
❖ Assignment 2 (cont)


Differences/examples from diagram:

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [5/28]
❖ Assignment 2 (cont)


How we are building the database ...

Database dump is > 20MB   (don't copy it to Vlab)

Database under pgsql/data/base directory is > 200 MB

Your quota on /localstorage is not infinite;  manage space carefully

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [6/28]
❖ Assignment 2 (cont)


What you get (as templates) and what you submit:

Put these under your VLab directory, not under /localstorage

Submit via give or Webcms3 before 11:59 on Friday 10 November

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [7/28]
❖ Keep in mind for Assignment 2


If 100's of people are finishing the assignment last minute ...



Beware:  /localstorage  is not backed up ... put only your DB there
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [8/28]
❖ Psycopg2 recap

Psycopg2 is a library for Python ↔ PostgreSQL interaction.

Where psycopg2 fits in the PL/DB architecture


[Diagram:Pics/pldb/pypsy-arch.png]

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [9/28]
❖ Psycopg2 recap (cont)

Common psycopg2 operations:

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [10/28]
❖ Psycopg2 recap (cont)

Typical database access pattern in Psycopg2

qry = 'select a,b,c from R where d = %s'
try:
   db = psycopg2.connect('dbname=mydb')
   cur = db.cursor()
   cur.execute(qry, [6])
   for tup in cur.fetchall():
      x,y,z = tup  # extract a,b,c values into x,y,z
      ... do something with x, y, z ...

except Exception as err:
  print("DB error: ", err)

finally:
  if db:
    db.close()

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [11/28]
❖ Psycopg2 Examples

Consider (a variation of) the MyMyUNSW database ...

People(id, zid, family, given, fullname, origin)
Students(id)
Staff(id, office, phone, employed, supervisor)

Orgunits(id, utype, name, unswid)
Terms(id, code, name, starting, ending)

Subjects(id, code, title, uoc, career, owner)
Courses(id, subject, term, homepage)
Streams(id, code, name, owner, stype)
Programs(id, code, name, uoc, owner, career, duration)

Program_enrolments(id, student, term, program, ...))
Stream_enrolments(part_of_prog_enr, stream)
Course_enrolments(student, course, mark, grade)

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [12/28]
❖ Exercise: Python/psycopg2 Scripts


Write a Python script that, given a partial student name

Usage examples:  ./stu 'smith',     ./stu 'nobody'


Write a Python script that, given a term code

Usage examples:  ./crs 17s2,     ./crs 19T1
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [13/28]
❖ Exercise: Python/psycopg2 Scripts (cont)


Write a Python script that, given a student ID

Usage examples:  ./progs 5124862,     ./progs 1234567


Write a Python script that, given a term code

Usage examples:  ./enrs 17s2,     ./enrs 19T1
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [14/28]
❖ Relational Design Theory


We know how to express ER data models and SQL schemas

But how do we know that our models/schemas are "good"?

Properties of "good" models/schemas

Relational design theory addresses the last issue

Relies on the notion of functional dependency

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [15/28]
❖ Relational Design Theory (cont)


Functional dependencies

What we study here: The aim of studying this:
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [16/28]
❖ Relational Design and Redundancy


In database design, redundancy is generally a "bad thing":

Our goal is to reduce redundancy in stored data
But ... redundancy may have some advantages
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [17/28]
❖ Relational Design and Redundancy (cont)

Consider the following relation defining bank accounts/branches:

accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Horseneck 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Minus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000

Careless updating of this data may introduce inconsistencies.

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [18/28]
❖ Exercise: Update Anomalies


Show what happens when the following changes occur:


How to ensure that the database is updated appropriately when a new account is added as above?

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [19/28]
❖ Update Anomalies

Insertion anomaly:

Update anomaly: Deletion anomaly:
Insertion/update anomalies are avoidable, but need extra DBMS work
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [20/28]
❖ Avoiding Update Anomalies

Redundancy in stored data can lead to update anomalies

Functional dependencies allow us to

Normalization methods allow us to
Note: most algorithms for normalization are non-deterministic
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [21/28]
❖ Notation/Terminology

Most texts adopt the following terminology:

Relation schemas upper-case letters, denoting set of all attributes (e.g. R, S, P, Q )
Relation instances lower-case letter corresponding to schema (e.g. r(R), s(S), p(P), q(Q) )
Tuples lower-case letters   (e.g. t, t', t1, u, v )
Attributes upper-case letters from start of alphabet (e.g. A, B, C, D )
Sets of attributes simple concatenation of attribute names (e.g. X=ABCD, Y=EFG )
Attributes in tuples tuple[attrSet] (e.g. t[ABCD], t[X])

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [22/28]
❖ Database Design (revisited)

To avoid update anomalies, normalization should give:

Normalization typically involves
Finally, each Ri  contains info about one entity (e.g. branch, customer, ...)
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [23/28]
❖ Database Design (revisited) (cont)

But we already have a design procedure (ER-then-relational)

Why do we need another design procedure?
  1. ER → Relational does not guarantee no redundancy
    • dependency theory allows us to check designs for residual problems
  2. the new procedure gives (semi)automated design
    • determine all of the attributes in the problem domain
    • collect them all together in a "universal relation"
    • provide information about how attributes are related
    • apply normalisation to decompose into non-redundant relations
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [24/28]
❖ Functional Dependency

A relation instance r(R) satisfies a dependency X → Y if

In other words, if two tuples in R agree in their values for the set of attributes X, then they must also agree in their values for the set of attributes Y.

We say that "Y is functionally dependent on X".

Attribute sets X and Y may overlap; trivially true that X → X.

Notes:

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [25/28]
❖ Identifying Functional Dependencies


Consider an instance r(R) of the relation schema R(ABCDE):

A       B       C       D       E      
a1 b1 c1 d1 e1
a2 b1 c2 d2 e1
a3 b2 c1 d1 e1
a4 b2 c2 d2 e1
a5 b3 c3 d1 e1

What dependencies can we observe among its attributes?

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [26/28]
❖ Identifying Functional Dependencies (cont)


Since A values are unique, the definition of fd gives:

Since all E values are the same, it follows that:
COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [27/28]
❖ Identifying Functional Dependencies (cont)


Other observations:

We could derive many other dependencies, e.g.   AE → BC, ...

In practice, choose a minimal set of fds (basis)

COMP3311 23T3 ♢ Week 7 Monday Lecture ♢ [28/28]


Produced: 24 Oct 2023