COMP3311 Week 10 Monday Lecture
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [0/43]
In today's lecture ...
- Concurency, Database History, Course Review, Exam Preview
Things to do ...
- Assignment 2 now due midday Thursday (Nov 16)
(and no fiurther extensions based on you overloading vxdb2)
- Quiz 6 due by 23:59 Saturday (Nov 18)
- MyExperience course evaluation now open (7% response rate so far)
Coming Up ...
- no lecture on Wednesday
- will schedule pre-exam Help Sessions for coming weeks
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [1/43]
How to allocate courses to requirements:
- overall order: stream core, prog core, stream elective, prog elective, gened, free
- within groups, consider in order of Requirements.id
- does it appear in a core requirement, add to that requirement
- does it appear in an elective requirement (either directly or via a pattern)
and are the UOC already allocated < min_req, add to that requirement
- is the allocated gened UOC < max_req, add to the gened requirement
- if UOC already allocated to free < min_req, add to free requirement
- otherwise, cannot be allocated
Make a single pass through the courses in the transcript; allocate greedily
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [2/43]
Side-effects of this one-pass greedy allocation approach
- courses which could potentially be allocated to free, aren't
- some GEN##### course may end up in free
First problem could be alleviated by changing the allocation to
free
- allocate to free even if UOC for free ≥ min_req
Second problem is difficult to fix
- don't want to hold off allocating non-GEN courses to gened
- just about any course can be a gened these days
DO NOT implement these fixes for the assignment
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [3/43]
Transaction are application atomic operations, involing multiple DB ops
A transaction must always terminate, either:
- successfully (
COMMIT
), with all changes preserved
- unsuccessfully (
ABORT
), with database unchanged
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [4/43]
❖ Concurrency Control in SQL | |
Transactions in SQL are specified by
-
BEGIN
... start a transaction
-
COMMIT
... successfully complete a transaction
-
ROLLBACK
... undo changes made by transaction + abort
In PostgreSQL, some actions cause implicit rollback:
-
raise exception
during execution of a function
- returning null from a
before
trigger
PostgreSQL also provdes
ABORT
as a synonym for SQL standard
ROLLBACK
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [5/43]
❖ Concurrency Control in SQL (cont) | |
Concurrent access can be controlled via SQL:
- table-level locking: apply lock to entire table
- row-level locking: apply lock to just some rows
LOCK TABLE
explicitly acquires lock on an entire table.
Some SQL commands implicitly acquire appropriate locks, e.g.
-
ALTER TABLE
acquires an exclusive lock on table
-
UPDATE
, DELETE
acquire locks on affected rows
All locks are released at end of transaction
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [6/43]
Some locking in PG is implicit (e.g. changing schema)
Explicit locks are available:
lock table TableName [ in LockMode mode ]
Some possible LockModes:
-
SHARE
... simply reading the table
-
SHARE ROW EXCLUSIVE
... intend to update table
No
UNLOCK
... all locks are released at end of transaction
Row-level locking: lock all selected rows for writing
select * from Table where Condition for update
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [7/43]
❖ Locking in PostgreSQL (cont) | |
Two examples of lock usage in PostgreSQL:
begin;
lock table films in share mode;
select id into _id_ from films
where name = 'Star Wars: Episode I - The Phantom Menace';
-- Do rollback if no record was returned
insert into films_user_comments values
(_id_, 'GREAT! I was waiting for it for so long!');
commit;
begin;
lock table films in share row exclusive mode;
delete from films_user_comments where id in
(select id from films where rating < 5);
delete from films where rating < 5;
commit;
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [8/43]
Apply appropriate locking in the bank transfer transaction.
create or replace function
transfer(N integer, Src text, Dest text)
returns integer
declare
sID integer; dID integer; avail integer;
begin
select id,balance into sID,avail
from Accounts where name=Src;
...
return nextval('tx_id_seq');
end;
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [9/43]
❖ Locking and Performance | |
Locking reduces concurrency ⇒ lower throughput.
Granularity of locking can impact performance:
+
lock a small item ⇒ more of database accessible
+
lock a small item ⇒ quick update ⇒ quick lock release
-
lock small items ⇒ more locks ⇒ more lock management
Granularity levels: field, row (tuple), table, whole database
Many DBMSs support multiple lock-granularities.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [10/43]
❖ Multi-version Concurrency Control | |
One approach to reducing the requirement for locks is to
- provide multiple (consistent) versions of the database
- give each transaction access to an "appropriate" version
(i.e. a version that maintains the serializability of the transaction)
This approach is called
Multi-Version Concurrency Control.
Differences between MVCC and standard locking models:
- writing never blocks reading (make new version of tuple)
- reading never blocks writing (read old version of tuple)
PostgreSQL pioneered MVCC as a concurrency control mechanism
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [11/43]
❖ Multi-version Concurrency Control (cont) | |
PostgreSQL MVCC ...
- each tuple is tagged with (time of) tx that created/deleted it
- each tuple is linked to the next newer version of same tuple
Access to a tuple requires
- check most recent version which existed when tx started; use that version
Periodic
vacuum
process deletes tuples that
- are not accessible to any currently executing transactions
Time/space overheads in implementing MVCC
- are justified by reduced requirement for locking (⇒ more concurrency)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [12/43]
❖ A Brief History of Databases | |
Some important moments in the development of Databases ...
- 1960's ... large data stores based on linked network of nodes
- 1970 ... Ted Codd invents relational model
- 1975 ... System R from IBM ... first relational DBMS (incl.SQL)
- 1979 ... Oracle ships first version of its DBMS
- 1980's ... Stonebraker makes Postgres (some OO features)
- 1990's ... Object-oriented DBMSs, Access (DBMS?), PostgreSQL
- 1998 ... IBM releases SQL Server
- 2000's ... XML databases, then NoSQL databases
- 2010's ... DBMSs become more "distributed", also MongoDB
- recently ... eventual consistency, column stores, graph databases
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [13/43]
Core "database" goals:
- deal with very large amounts of data (terabytes, petabyes, ...)
- very-high-level languages (deal with big data in uniform ways)
- query optimisation (if evaluation too slow ⇒ useless)
At the moment
(and for the last 40 years) RDBMSs dominate ...
- simple, clean data model, backed up by theory
- high-level language for accessing data
- 50 years development work on RDB engine technology
RDBMSs work well in domains with uniform, structured data.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [14/43]
❖ Future of Database (cont) | |
Limitations/pitfalls of RDBMSs:
- NULL is ambiguous: unknown, not applicable, not supplied
- "limited" support for constraints/integrity and rules
- no support for uncertainty (data represents the state-of-the-world)
- data model too simple (e.g. limited support for complex objects, but PG)
- query model too rigid (e.g. no approximate match, but PG regexp)
- cotinually changing data sources not well-handled
- data must be "molded" to fit a single rigid schema
- database systems must be manually "tuned"
- do not scale well to some data sets (e.g. Google, Telco's)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [15/43]
❖ Future of Database (cont) | |
How to overcome (some of) these limitations?
Extend the relational model ...
- add new data types and query ops for new applications
- non-1NF data e.g. arrays, JSON ... both supported by PostgreSQL
- deal with uncertainty/inaccuracy/approximation in data
Replace the relational model ...
- object-oriented DBMS ... OO programming with persistent objects
- XML DBMS ... all data stored as XML documents, new query model
- application-effective data model (e.g. (key,value) pairs)
Performance: DBMSs that "tune" themselves ...
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [16/43]
Some modern applications have massive data sets (e.g. Google)
- far too large to store on a single machine/RDBMS
- query demands far too high even if could store in DBMS
Approach to dealing with such data
- distribute data over large collection of nodes (redundancy)
- provide computational mechanisms for distributing computation
Often this data does not need full relational selection
- represent data via (key,value) pairs
- unique key values can be used for addressing data
- values can be large objects (e.g. web pages, images, ...)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [17/43]
Popular computational approach to Big Data: map/reduce
- suitable for widely-distributed, very-large data
- allows parallel computation on such data to be easily specified
- distribute (map) parts of computation across network
- compute in parallel (possibly with further map'ing)
- merge (reduce) multiple results for delivery to requestor
Some Big Data proponents see no future need for SQL/relational ...
- depends on application (e.g. hard integrity vs eventual consistency)
Humour:
Parody of noSQL fans
(strong language warning)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [18/43]
❖ Object-relational Mapping | |
Pure OO databases came and went in the 90's (similarly XML databases)
A compromise:
- data stored in relational format
- "wrapper" gives the apperance of objects
Programmer works purely with objects; wrapper converts to SQL
Problems:
- OO query mechanisms are no better than SQL
- translation may lead to inefficient SQL
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [19/43]
DBMSs generally do precise matching (although like
/regexps)
Information retrieval systems do approximate matching.
E.g. documents containing these words (Google, etc.)
Also, introduce notion of "quality" of matching
(e.g. tuple T1 is a better match than tuple T2)
Quality also implies ranking of results.
Much activity in incorporating IR ideas into DBMS context.
Goal: support database exploration better.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [20/43]
Data which does not fit the "tabular model":
- image, video, music, text, ... (and combinations of these)
Research problems:
- how to specify queries on such data?
(image1 ≅ image2)
- how to "display" results?
(synchronize components)
Solutions to the first problem typically:
- extend notions of "matching"/indexes for querying
- require sophisticated methods for capturing data features
Sample query: find other songs
like this one?
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [21/43]
Multimedia/IR introduces approximate matching.
In some contexts, we have approximate/uncertain data.
E.g. witness statements in a crime-fighting database
"I think the getaway car was red ... or maybe orange ..."
"I am 75% sure that John carried out the crime"
Work by Jennifer Widom at Stanford on the Trio system
- extends the relational model (ULDB)
- extends the query language (TriQL)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [22/43]
❖ Stream Management Systems | |
Makes one addition to the relational model
- stream = infinite sequence of tuples, arriving one-at-a-time
Applications:
news feeds, telecomms, monitoring web usage, ...
RDBMSs: run a variety of queries on (relatively) fixed data
StreamDBs: run fixed queries on changing data (stream)
Approaches:
- window = relation formed from a stream via a rule
- stream data type = build new stream-specific operations
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [23/43]
Use graphs rather than tables as basic data structure tool.
Applications: complex data representation, via "flexible" objects
Implementing graphs in RDBMSs is possible, but often inefficient.
Graph nature of data changes query model considerably.
- different kinds of queries: kNN, neighbourhood, skyline
Research problem: query processing for large graphs
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [24/43]
Characteristics of dispersed databases:
- very large numbers of small processing nodes
- data is distributed/shared among nodes
Applications:
environmental monitoring devices, "intelligent dust", ...
Research issues:
- query/search strategies
(how to organise query processing)
- distribution of data
(trade-off between centralised and diffused)
Less extreme versions of this already exist:
- grid and cloud computing
- database management for mobile devices
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [25/43]
Disk-based data is slow to access, and page-at-a-time
- RBDMS techniques aim to minimise page read/writes
- methods: buffering, indexes, join algotithms
Non-volatile non-disk storage is becoming larger and cheaper
- many databases would fit in memory ... disk as a backup/archive
Research problems:
- how to structure data on SSD-like devices
- how to process queries in such an environment
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [26/43]
Open problems:
- database integration/federation
- making a collection of heterogeneous databases work together
- schema reconciliation is a difficult problem (vocab, schemas)
- replication
- maintain consistent copies of one database across multiple hosts
- how to propagate changes systematically?
- column-based storage
- row = collection of values for a given attribute
- column = collection of attributes for tuple
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [27/43]
COMP9312 Data Analytics for Graphs
- manipulating large-scale graph structures
COMP9313 Big Data Management
- handling data that won't fit in a DBMS on one machine
COMP9315 Database Systems Implementation
- comprehensive study of DBMS internals
COMP9319 Web Data Compression and Search
- compression and searching algorithms, and XML
COMP6714 Information Retrieval and Web Search
- finding information in unstructured text
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [28/43]
At the end of this course you should be able to:
- develop accurate, non-redundant data models;
- realise data models as relational database schemas;
- formulate queries via the full range of SQL constructs;
- use stored procedures and triggers to extend DBMS capabilities;
- write applications in Python that interact effectively with databases;
- understand the overall architecture of relational DBMSs;
- analyze performance issues in relational database applications;
- understand how queries are evaluated via relational algebra operations;
- understand the concepts behind transactions and concurrency control
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [29/43]
- Data modelling and database design
- Entity-relationship (ER) design, relational data model
- Relational theory
(algebra, dependencies, normalisation)
- Database application development
- SQL for querying, data definition and modification
(PostgreSQL's version)
- extending SQL
Queries, Functions, Aggregates, Triggers
- PostgreSQL,
psql (an SQL shell),
PLpgSQL (procedural SQL)
- SQLite,
sqlite3 (an SQL shell)
- Python3,
Psycopg2,
accessing data programmatically
- DBMS theory/technology
- relational algebra,
functional dependencies,
normalization
- performance tuning,
catalogues,
access control
- DBMS architecture,
query processing,
transaction processing
Things in
gray will definitely
not be examined.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [30/43]
Your final mark/grade will be determined as follows:
quizzes = mark for on-line quizzes (out of 12)
ass1 = mark for assignment 1 (out of 12)
ass2 = mark for assignment 2 (out of 16)
exam = mark for final exam (out of 60)
okExam = exam >= 25 (after scaling)
mark = quizzes + ass1 + ass2 + exam
grade = HD|DN|CR|PS if mark >= 50 && okExam
= FL if mark < 50 && okExam
= UF if !okExam
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [31/43]
Mon 4 Dec, 3-hour exam, morning and afternoon sessions
Morning: arrive by 10:00am, Afternoon: arrive by 1:20pm
Exam environment:
- invigilated, in the CSE labs, on CSE workstations
- closed exam environment, no Webcms3, no email, no web
- provided: questions, documentation, course notes
- no BYO laptop
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [32/43]
Questions ...
- 5 practical exercises, using PostgreSQL/Python3/Psycopg2
- 4 theory exercises, typed answers, like tute questions
What you have access to ...
- the exam paper (html), templates for all questions
- checking scripts for programming questions (like
autotest
)
- course notes, documentation for PostgreSQL/Python/Psycopg2
What you
do not have access to
- the class directory, your home directory
- Webcms3, email, messaging, etc.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [33/43]
When you login (zID/zPass) into the exam environment ...
- we create some working directories
- including templates and testing scripts
- we create a local PostgreSQL server for you
- we load the exam database into your PG server
- we start a web browser, pointing at the exam entry page
PostgreSQL server runs on your workstation
Database: will not be huge, will not be overly complex (e.g. < 15 tables)
You do not have access to VLab or /localstorage or /home/YOU
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [34/43]
For the prac questions ...
- files are contained in a
q?
directory
- including a template, test cases and a
check
script
For the theory questions ...
- each question has a text file template
q?.txt
General suggestions ...
- submit your answers using a
submit
script (e.g. submit q2
)
- give up on questions that are taking too long (e.g. > 30 mins for prac)
- submit questions as you complete them; can submit multiple times
- don't leave all submissions to the last minute; if late, not marked
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [35/43]
Want to know what are the questions on the exam ... ?
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [36/43]
Questions are as follows:
- an SQL query
- another SQL query
- a PLpgSQL function
- a Python3/Psycopg2 script
- another Python3/Psycopg2 script
- an analysis/synthesis question
- another analysis/synthesis question
- etc. etc. etc.
Note that the database for Q1-5 will be available in advance.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [37/43]
Sources for revision material:
- Lecture Material, Prac/Tute Exercises, Assignments
- Fundamentals of Database Systems,
Elmasri/Navathe
- Database System Concepts,
Silberschatz/Korth/Sudarshan
- Database Management Systems,
Ramakrishnan/Gehrke
- Database Systems: Complete Book,
Garcia-Molina/Ullman/Widom
- Database Systems: App-oriented,
Kifer/Bernstein/Lewis
- PostgreSQL/Python3/Psycopg2 Documentation
(to some extent)
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [38/43]
Supplementary Exams are only available to people who
- are absent from the Final Exam with good reason
(good = documented, serious, clearly affects ability to do exam)
- have performed well during the rest of the semester
If you are awarded a Supp Exam ...
- to be held in O-week of 24T1 (Mon 5 Feb)
- you must make yourself available for it
- non-attendance at the Supp ⇒ mark of 0 for the exam
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [39/43]
Assessment is about determining how well you understand the syllabus.
If you can't demonstrate your understanding, you won't pass.
In particular, I don't pass people just because ...
- please, please, ... my parents will be ashamed of me
- please, please, ... I tried really hard in this course
- please, please, ... this is my final course to graduate
- please, please, ... I'll be excluded if I fail COMP3311
- please, please, ... if I fail this, I can't do COMP93xx
- etc. etc. etc.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [40/43]
Of course, assessment isn't a "one-way street" ...
- we get to assess you in the final exam
- you get to assess us in the Course Evaluation
MyExperience evaluations are online (via MyUNSW) NOW
Several evaluations: course, lecturer, tutor
Telling us good things is ok.
Telling us things to improve is very useful.
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [41/43]
You need to learn for life, not just the exam.
In particular, learn to find answers for yourself.
Typically, no single correct answer.
(Solutions range from poor to excellent)
Take pride in your work.
(Aim for quality, not just correctness)
VScode and autotest will generally not be available in the workplace
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [42/43]
Good Luck with the Exams ... and Life ...
COMP3311 23T3 ♢ Week 10 Monday Lecture ♢ [43/43]
Produced: 13 Nov 2023