COMP9315 Introduction
COMP9315 DBMS Implementation | 1/129 |
Lecturer: John Shepherd
Web Site: http://www.cse.unsw.edu.au/~cs9315/
(If WebCMS unavailable, use http://www.cse.unsw.edu.au/~
Lecturer | 2/129 |
John Shepherd | ||
K17-410 (turn right from lift) | ||
jas@cse.unsw.edu.au | ||
still working out the details | ||
Information Extraction/Integration Information Retrieval/Web Search e-Learning Technologies Multimedia Databases Query Processing |
Course Admin | 3/129 |
cs9315@cse.unsw.edu.au | ||
Enrolment problems Special consideration Detailed assignment questions Technical issues |
What this Course is NOT | 4/129 |
Official course title: Database Systems Implementation
More accurate: Implementation of Database Engines
This is a course about
Course Goals | 5/129 |
Introduce you to:
... Course Goals | 6/129 |
A major course goal is to give you exposure to:
PostgreSQL is a good vehicle for this purpose, because:
... Course Goals | 7/129 |
At the end of this course you should understand:
Pre-requisites | 8/129 |
We assume that you are already familiar with
open, close, lseek, read, write, flock
... Pre-requisites | 9/129 |
How you might meet the pre-reqs ...
... via courses at CSE.
Syllabus Overview | 10/129 |
Textbooks | 11/129 |
No official text book; several are suitable ...
Teaching/Learning | 12/129 |
What's available for you:
Note: Lecture slides, exercises and videos will be available only after the lecture.
... Teaching/Learning | 13/129 |
Things that you need to do:
... Teaching/Learning | 14/129 |
Scheduled classes?
... Teaching/Learning | 15/129 |
The course web site site is where you can:
http://www.cse.unsw.edu.au/~cs9315/
If WebCMS is ever down, most material is accessible via:
http://www.cse.unsw.edu.au/~cs9315/22T1/index.php
Prac Work | 16/129 |
Prac Work requires you to compile PostgreSQL from source code
Sort out any problems ASAP (preferably at a consultation).
You can do prac work in groups, if you wish.
Assignments | 17/129 |
Schedule of assignment work:
Description | Due | Marks | ||||
Storage Management | Week 5 | 15% | ||||
Query Processing | Week 9 | 20% |
Assignments will be carried out individually
Ultimately, submission is via CSE's give
Will spend some time in lectures reviewing assignments.
Assignments will require up-front code-reading (see Pracs).
... Assignments | 18/129 |
More comments on assignments ...
You are responsible for managing your own time, and for committing enough time to complete your work in this course.
"Work pressure" is not an acceptable excuse for late assignments.
Plagiarism will be checked for and punished.
Slacking off and letting your partner do the work is unhelpful.
The exam will contain questions related to the assignment work.
Quizzes | 19/129 |
Over the course of the semester ...
But they contribute 15% of your overall mark for the course.
Exam | 20/129 |
There will be a three-hour exam in the May exam period.
Exam is held in CSE Labs (learn the environment, VLab) ... I wish
Exam is held online, as for last few terms.
Things that we can't reasonably test in the exam:
The exam will be a mixture of descriptive questions, quantitative analysis, and small programming exercises.
The exam contributes 50% of the overall mark for this course.
Supplementary Assessment Policy | 21/129 |
Everyone gets exactly one chance to pass the Exam.
If you attend the Exam
... Supplementary Assessment Policy | 22/129 |
All Special Consideration requests:
Passing this Course | 23/129 |
There is only one way to pass this course:
Pleading for a pass on compassionate grounds won't work.
Assessment Summary | 24/129 |
Your final mark/grade will be computed according to the following:
ass1 = mark for assignment 1 (out of 15) ass2 = mark for assignment 2 (out of 20) quiz = mark for on-line quizzes (out of 15) exam = mark for final exam (out of 50) okExam = exam > 20/50 (after scaling) mark = ass1 + ass2 + quiz + exam grade = HD|DN|CR|PS, if mark ≥ 50 && okExam = FL, if mark < 50 && okExam = UF, if !okExam
Reading Material | 25/129 |
All of these textbooks have relevant material (to varying depths):
... Reading Material | 26/129 |
Useful material can also be found in:
PostgreSQL | 27/129 |
In this course, we will be using PostgreSQL v14.1 (compulsory)
PostgreSQL tarball is available for copying from the course web site.
(Don't waste your IP quota downloading it from www.postgresql.org)
Install/modify your own PostgreSQL server at CSE (from source code).
(This is not the same setup as for COMP3311; COMP3311 used a shared binary)
Working on a home PC is simple if you run Linux or Mac OSX.
Alternatively, use VLab to connect to CSE from home.
However, you can do it all on Windows (see Chap.17 PostgreSQL manual)
... PostgreSQL | 28/129 |
PostgreSQL is a large software system:
You will need to learn to navigate this code effectively.
Will discuss relevant parts in lectures to help with this.
... PostgreSQL | 29/129 |
Some comments on PostgreSQL books:
Relational Database Revision |
Relational Databases | 31/129 |
Relational databases build on relational theory
... Relational Databases | 32/129 |
We begin by looking at relational databases top-down
Database Management Systems | 33/129 |
Relational DBMSs provide critical infrastructure for modern computing
DBMS History | 34/129 |
Files, Hierachical and network databases | ||
Relational data model (Ted Codd) | ||
First RDBMS and SQL (IBM Almaden) | ||
First version of Oracle | ||
Refinement of technology, distributed systems, new data types (objects, Prolog) | ||
Object-relational DBMSs, OLAP, data mining, data warehousing, multimedia data, SQL standards | ||
Databases for XML, bioinformatics, telecomms, SQL3 | ||
also, very-large, distributed, relaxed-consistency storage |
DBMS Functionality | 35/129 |
DBMSs provide a variety of functionalities:
DBMS for Data Definition | 36/129 |
Critical function of DBMS: defining relational data (DDL sub-language)
Relational data: relations/tables, tuples, values, types, constraints.
E.g.
create domain WAMvalue float check (value between 0.0 and 100.0); create table Students ( id integer, -- e.g. 3123456 familyName text, -- e.g. 'Smith' givenName text, -- e.g. 'John' birthDate date, -- e.g. '1-Mar-1984' wam WAMvalue, -- e.g. 85.4 primary key (id) );
Executing the above adds meta-data to the database.
DBMSs typically store meta-data as special tables (catalog).
... DBMS for Data Definition | 37/129 |
Input: DDL statements
Result: meta-data in catalog is modified
... DBMS for Data Definition | 38/129 |
Specifying constraints is an important aspect of data definition:
create table Employee ( id integer primary key, name varchar(40), salary real, age integer check (age > 15), worksIn integer references Department(id), constraint PayOk check (salary > age*1000) );
DBMS for Data Modification | 39/129 |
Critical function of DBMS: manipulating data (DML sub-language)
insert
delete
update
insert into Enrolments(student,course,mark) values (3312345, 5542, 75); update Enrolments set mark = 77 where student = 3354321 and course = 5542; delete Enrolments where student = 331122333;
... DBMS for Data Modification | 40/129 |
Input: DML statements
Result: tuples are added, removed or modified
... DBMS for Data Modification | 41/129 |
Most DBMSs also provide bulk download/upload mechanisms:
dump
load
pg_dump
copy
DBMS as Query Evaluator | 42/129 |
Most common function of relational DBMSs
select s.id, c.code, e.mark from Students s join Enrolments e on s.id = e.student join Courses c on e.course = c.id;
... DBMS as Query Evaluator | 43/129 |
Input: SQL query
Output: table (displayed as text)
DBMS Architecture | 44/129 |
The aim of this course is to
Practical reason:
... DBMS Architecture | 45/129 |
Fundamental tenets of DBMS architecture:
Modern DBMSs interact with storage via the O/S file-system.
** SSDs change things a little, but most high volume bulk storage still on disks
... DBMS Architecture | 46/129 |
Implementation of DBMS operations is complicated by
In practice, may need new "concurrency-tolerant" data structures.
Transactions/reliability require some form of logging.
... DBMS Architecture | 47/129 |
Path of a query through a typical DBMS:
... DBMS Architecture | 48/129 |
Accoring to Silberschatz/Korth/Sudarshan (SKS) ...
... DBMS Architecture | 49/129 |
Accoring to Elmasri/Navathe (EN) ...
... DBMS Architecture | 50/129 |
According to Ramakrishnan/Gerhke (RG) ...
... DBMS Architecture | 51/129 |
translates queries into efficient sequence of relational ops | ||
controls execution of sequence of relational ops | ||
basis for implementation of relational operations | ||
manages data transfer between disk and main memory | ||
manages allocation of disk space and data structures | ||
controls concurrent access to database | ||
ensures consistent database state after system failures | ||
verifies integrity constraints and user privileges |
Database Engine Operations | 52/129 |
DB engine = "relational algebra virtual machine":
projection (π) | join (⋈) | |||
intersection (∩) | difference (-) | |||
group | aggregate |
For each of these operations:
... Database Engine Operations | 53/129 |
Different implementations of Selection:
select * from Students where id = 3312345;
where id
select * from Employees where age > 55;
Relational Algebra | 54/129 |
Relational algebra (RA) can be viewed as ...
... Relational Algebra | 55/129 |
Select, project, join provide a powerful set of operations for constructing relations and extracting relevant data from them.
Adding set operations and renaming makes RA complete.
Notation | 56/129 |
Standard treatments of relational algebra use Greek symbols.
We use the following notation (because it is easier to reproduce):
Standard Notation |
Our Notation |
|||
σexpr(Rel) | Sel[expr](Rel) | |||
πA,B,C(Rel) | Proj[A,B,C](Rel) | |||
Rel1 ⋈expr Rel2 | Rel1 Join[expr] Rel2 | |||
schemaRel | Rename[schema](Rel) |
For other operations (e.g. set operations) we adopt the standard notation.
... Notation | 57/129 |
We define the semantics of RA operations using
... Notation | 58/129 |
All RA operators return a result relation (no DB updates).
For convenience, we can name a result and use it later.
E.g.
Temp = R op1 S op2 T Res = Temp op3 Z-- which is equivalent to Res = (R op1 S op2 T) op3 Z
Each "intermediate result" has a well-defined schema.
Sample Relations | 59/129 |
Example database #1 to demonstrate RA operators:
... Sample Relations | 60/129 |
Example database #2 to demonstrate RA operators:
Selection | 61/129 |
Selection returns a subset of the tuples in a relation r that satisfy a specified condition C.
C is a boolean expression on attributes in R.
Result size: |σC(r)| ≤ |r|
Result schema: same as the schema of r (i.e. R)
Computational view:
result = {} for each tuple t in relation r if (C(t)) { result = result ∪ {t} }
... Selection | 62/129 |
Example selections:
... Selection | 63/129 |
Example queries:
Projection | 64/129 |
Projection returns a set of tuples containing a subset of the attributes in the original relation.
X specifies a subset of the attributes of R.
Note that removing key attributes can produce duplicates.
In RA, duplicates are removed from the result set.
(In many RDBMS's, duplicates are retained (i.e. they use bag, not set, semantics))
Result size: |πX(r)| ≤ |r| Result schema: R'(X)
Computational view:
result = {} for each tuple t in relation r result = result ∪ {t[X]}
... Projection | 65/129 |
Example projections:
... Projection | 66/129 |
Example queries:
Union | 67/129 |
Union combines two compatible relations into a single relation via set union of sets of tuples.
Compatibility = both relations have the same schema
Result size: |r1 ∪ r2| ≤ |r1| + |r2| Result schema: R
Computational view:
result = r1 for each tuple t in relation r2 result = result ∪ {t}
... Union | 68/129 |
Example queries:
Intersection | 69/129 |
Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.
Uses same notion of relation compatibility as union.
Result size: |r1 ∪ r2| ≤ min(|r1|,|r2|) Result schema: R
Computational view:
result = {} for each tuple t in relation r1 if (t ∈ r2) { result = result ∪ {t} }
... Intersection | 70/129 |
Example queries:
Difference | 71/129 |
Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
Uses same notion of relation compatibility as union.
Note: tuples in r2 but not r1 do not appear in the result
i.e. set difference != complement of set intersection
Computational view:
result = {} for each tuple t in relation r1 if (!(t ∈ r2)) { result = result ∪ {t} }
... Difference | 72/129 |
Example difference:
s1 = Sel [B = 1] (r1)
s2 = Sel [C = x] (r1)
s1 - s2
s2 - s1
... Difference | 73/129 |
Example queries:
Natural Join | 74/129 |
Natural join is a specialised product:
The natural join of relations r(R) and s(S) is defined as:
where match = t1[K] = t2[K] ∧ t1[L] = t2[L] ∧ t1[M] = t2[M]
Computational view:
result = {} for each tuple t1 in relation r for each tuple t2 in relation s if (matches(t1,t2)) result = result ∪ {combine(t1,t2)}
... Natural Join | 75/129 |
Natural join can also be defined in terms of other relational algebra operations:
We assume that the union on attributes eliminates duplicates.
If we wish to join relations, where the common attributes have different names, we rename the attributes first.
E.g. R(ABC) and S(DEF) can be joined by
Note: |r ⋈ s| ≪ |r × s|, so join not implemented via product.
... Natural Join | 76/129 |
Example natural join:
... Natural Join | 77/129 |
Example queries:
Theta Join | 78/129 |
The theta join is a specialised product containing only pairs that match on a supplied condition C.
Examples: (r1 Join[B>E] r2) ... (r1 Join[E<D ∧ C=G] r2)
Can be defined in terms of other RA operations:
Unlike natural join, "duplicate" attributes are not removed.
Note that r ⋈true s = r × s.
... Theta Join | 79/129 |
Example theta join:
... Theta Join | 80/129 |
Comparison between join operations:
Outer Join | 81/129 |
r Join s eliminates all s tuples that do not match some r tuple.
Sometimes, we wish to keep this information, so outer join
... Outer Join | 82/129 |
Example outer join:
Contrast this to the result for theta-join presented earlier.
... Outer Join | 83/129 |
There are three variations of outer join R OuterJoin S:
If we want to know about all Branches, regardless of whether they have Customers as their homeBranch:
... Outer Join | 84/129 |
Computational description of r(R) LeftOuterJoin s(S):
result = {} for each tuple t1 in relation r nmatches = 0 for each tuple t2 in relation s if (matches(t1,t2)) result = result ∪ {combine(t1,t2)} nmatches++ if (nmatches == 0) result = result ∪ {combine(t1,Snull)}
where Snull is a tuple from S with all atributes set to NULL.
Aggregation | 85/129 |
Two types of aggregation are common in database queries:
Generalised Projection | 86/129 |
In standard projection, we select values of specified attributes.
In generalised projection we perform some computation on the attribute value before placing it in the result tuple.
Examples:
PostgreSQL |
PostgreSQL | 88/129 |
PostgreSQL is a full-featured open-source (O)RDBMS.
Brief History of PostgreSQL | 89/129 |
Ingres (Stonebraker) research prototype → Relational Technologies → bought by Computer Associates |
||
Postgres (Stonebraker) research prototype → Illustra → bought by Informix |
||
Postgres95 (Chen, Yu) added SQL, spawned PostgreSQL |
||
PostgreSQL (Momjian,Lane,...) open-source DBMS with Oracle-level functionality, platform for experiments with new DBMS implementation ideas |
PostgreSQL Online | 90/129 |
Web site: www.postgresql.org
Key developers: Bruce Momjian, Tom Lane, Marc Fournier, ...
Full list of developers: www.postgresql.org/developer/bios
Local copy of source code:
/home/cs9315/web/20T1/postgresql/src.tar.bz2
Documentation is available via WebCMS menu.
User View of PostgreSQL | 91/129 |
Users interact via SQL in a client process, e.g.
$ psql webcms SET psql (11.3) Type "help" for help. webcms2=# select * from calendar; id | course | evdate | event ----+--------+------------+--------------------------- 1 | 4 | 2001-08-09 | Project Proposals due 10 | 3 | 2001-08-01 | Tute/Lab Enrolments Close 12 | 3 | 2001-09-07 | Assignment #1 Due (10pm) ...
or
$dbconn = pg_connect("dbname=webcms"); $result = pg_query($dbconn,"select * from calendar"); while ($tuple = pg_fetch_array($result)) { ... $tuple["event"] ... }
PostgreSQL Functionality | 92/129 |
PostgreSQL systems deal with various kinds of entities:
... PostgreSQL Functionality | 93/129 |
PostgreSQL's dialect of SQL is mostly standard (but with extensions).
Differences visible at the user-level:
create view myview as select * from mytab;-- is implemented as create type as myview (same fields as mytab ); create rule myview as on select to myview do instead select * from mytab;
... PostgreSQL Functionality | 94/129 |
PostgreSQL stored procedures differ from SQL standard:
void
create or replace function barsIn(suburb text) returns setof Bars as $$ declare r record; begin for r in select * from Bars where location = suburb loop return next r; end loop; end; $$ language plpgsql;used as e.g. select * from barsIn('Randwick');
... PostgreSQL Functionality | 95/129 |
Concurrency is handled via multi-version concurrency control (MVCC)
vacuum
... PostgreSQL Functionality | 96/129 |
Allows transactions to specify a consistency level for concurrency
... PostgreSQL Functionality | 97/129 |
PostgreSQL has a well-defined and open extensibility model:
... PostgreSQL Functionality | 98/129 |
Because of its extensibility, PostgreSQL has extra data types:
Installing PostgreSQL | 99/129 |
PostgreSQL is available via the COMP9315 web site.
Provided as tarball and zip in ~cs9315/web/20T1/postgresql/
Brief summary of installation:
$ configure --prefix=~/your/pgsql/directory $ make $ make install $ source ~/your/environment/file# set up environment variables $ initdb# set up postgresql configuration $ pg_ctl start# do some work with PostgreSQL databases $ pg_ctl stop
... Installing PostgreSQL | 100/129 |
Simplified version of running the server:
$ source ~/your/environment/file# set up environment variables $ pgs setup# runs initdb and fixes configuration $ pgs start# do some work with PostgreSQL databases $ pgs stop# stops server $ pgs cleanup# removes all data files
pgs
/home/cs9315/bin
PostgreSQL Configuration | 101/129 |
PostgreSQL configuration parameters (some important ones):
PGHOME
configure --prefix=$PGHOME
PGBIN
PGDATA
PGHOST
PGPORT
PGHOST
... PostgreSQL Configuration | 102/129 |
A typical envrionment setup for COMP9315:
# Set up environment for running PostgreSQL # Must be "source"d from sh, bash, ksh, ... PGHOME=/home/jas/srvr/pgsql export PGDATA=$PGHOME/data export PGHOST=$PGDATA export PGPORT=5432 export PATH=$PGHOME/bin:$PATH export PGDATA PGHOST PATH alias p0="$D/bin/pg_ctl stop" alias p1="$D/bin/pg_ctl -l $PGDATA/log start"
... PostgreSQL Configuration | 103/129 |
Other configuration files live in $PGDATA
postgresql.conf
unix_socket_directory
$PGHOST
max_connections
pg_hba.conf
Using PostgreSQL for Assignments | 104/129 |
You will need to modify then re-start the server:
# edit source code to make changes $ pg_ctl stop $ make $ make install# restore postgresql configuration $ pg_ctl start# run tests and analyse results
Assumes no changes that affect storage structures.
I.e. existing databases will continue to work ok.
... Using PostgreSQL for Assignments | 105/129 |
If you change storage structures ...
initdb
# edit source code to make changes $ pg_dump testdb > testdb.dump $ make $ pg_ctl stop $ rm -fr /your/pgsql/directory/data $ make install $ initdb# restore postgresql configuration $ pg_ctl start $ createdb testdb $ psql testdb -f testdb.dump# run tests and analyse results
Need to save a copy of postgresql.conf
PostgreSQL Architecture | 106/129 |
Client/server architecture:
Note: nowadays the postmaster
postgres
... PostgreSQL Architecture | 107/129 |
Notes:
... PostgreSQL Architecture | 108/129 |
Memory/storage architecture:
... PostgreSQL Architecture | 109/129 |
Notes:
... PostgreSQL Architecture | 110/129 |
File-system architecture:
... PostgreSQL Architecture | 111/129 |
Interesting files in $PGDATA:
which server version made this directory | ||
who can access which databases from where | ||
server parameters (e.g. max connections) | ||
how was current postmaster invoked | ||
process id of current postmaster |
PostgreSQL Source Code |
PostgreSQL Source Code | 113/129 |
Top-level of PostgreSQL distribution contains:
... PostgreSQL Source Code | 114/129 |
How to get started understanding the workings of PostgreSQL:
... PostgreSQL Source Code | 115/129 |
The source code directory (src) contains:
... PostgreSQL Source Code | 116/129 |
We introduce the code
src/tools/backend/index.html
Life-cycle of a PostgreSQL query | 117/129 |
How a PostgreSQL query is executed:
... Life-cycle of a PostgreSQL query | 118/129 |
PostgreSQL server | 119/129 |
PostgresMain(int argc, char *argv[], ...)
src/backend/tcop/postgres.c
postgres
Q
X
... PostgreSQL server | 120/129 |
As well as handling SQL queries, PostgresqlMain
CREATE TABLE
CREATE
vacuum
COPY
COPY
PostgreSQL Data Types | 121/129 |
Data types defined in *.h
src/include/
Two important data types: Node
List
Node
src/include/nodes/nodes.h
src/include/nodes/*.h
src/backend/nodes/*.c
Node
List
src/include/nodes/pg_list.h
src/backend/nodes/list.c
PostgreSQL Query Evaluation | 122/129 |
exec_simple_query(const char *query_string)
src/backend/tcop/postgres.c
query_string
... PostgreSQL Query Evaluation | 123/129 |
pg_parse_query(char *sqlStatements)
src/backend/tcop/postgres.c
pg_analyze_and_rewrite(Node *parsetree, ...)
src/backend/tcop/postgres.c
... PostgreSQL Query Evaluation | 124/129 |
Each query is represented by a Query
src/include/nodes/parsenodes.h
TargetEntry
RangeTblEntry
where
FromExpr
SortGroupClause
... PostgreSQL Query Evaluation | 125/129 |
pg_plan_queries(querytree_list, ...)
src/backend/tcop/postgres.c
pg_plan_query()
Query
src/backend/tcop/postgres.c
planner()
optimizer/plan/planner.c
... PostgreSQL Query Evaluation | 126/129 |
Each executable query is represented by a PlannedStmt
src/include/nodes/plannodes.h
Plan
Plan
SeqScan
IndexScan
HashJoin
Sort
Plan
... PostgreSQL Query Evaluation | 127/129 |
PlannedStmt *planner(Query *parse, ...)
optimizer/plan/planner.c
subquery_planner()
... PostgreSQL Query Evaluation | 128/129 |
Queries run in a Portal
Plan
Plan
QueryDesc
TupleDesc
atStart
Portal
src/include/utils/portal.h
PortalRun()
... PostgreSQL Query Evaluation | 129/129 |
How query evaluation happens in exec_simple_query()
PlannedStmt
PlannedStmt
Portal
PlannedStmt
portal
CommandDest
PortalRun(portal,...,dest,...)
PortalRun...()
ProcessQuery(plan,...)
ProcessQuery()
QueryDesc
plan
ExecutorRun(qdesc,...)
ExecutorRun()
ExecutePlan()
Produced: 5 Feb 2022