COMP3311 Week 5 Monday Lecture

∧ >>

❖ Week 05 Monday

In this week's lectures ...

PLpgSQL, Aggregates, Triggers, DBMS↔PL

Things to do ...

Assignment 1 due before Friday 11:59pm
(60 submissions so far; 500 more to go)
Set up your PostgreSQL server
(550 students have /localstorage, X have PostgreSQL servers)
Check your assignment on vxdb2 before submitting

<< ∧ >>

❖ Submitting/Marking Assignment 1

Test on vxdb2 before submitting

Submit via: give cs3311 ass1 ass1.sql

Will be tested via:

dropdb TempDB1;  createdb TempDB1;  psql TempDB1 -f ass1.dump
psql TempDB1 -f ass1.sql # your submission
dropdb TempDB2;  createdb TempDB2;  psql TempDB2 -f ass1a.dump
psql TempDB2 -f ass1.sql # your submission
# run tests on TempDB1, then on TempDB2
# save testing output for automarking
dropdb TempDB1
dropdb TempDB2

<< ∧ >>

❖ Submitting/Marking Assignment 1 (cont)

Tutors will look at: ass1.sql, output from two test runs

Up to 2 marks for style; ugly, inconsistent code gets 0

Consistency of style is important; use SQL from lectures as guide

If view fails auto-test

can still get some, minimal marks for that question
no marks if the style is bad and too hard to read

<< ∧ >>

❖ Assignment Hints

Some ideas ...

hints: string_agg(), E'\n'
look up things in PostgreSQL documentation for more detail
write helper views that are useful in several questions
run extra queries to verify parts of your query or helper views
use raise notice for debugging PLpgSQL functions

"Unseen" database for auto-marking has same schema, different data

aims to check whether your solutions work in the general case

<< ∧ >>

❖ Assignment Hints (cont)

If you get this error ...

nw-syd-vxdb2 % psql ass1 -f ass1.sql
psql: error: ass1.sql: No such file or directory

you're in the wrong directory.

Try cd until ls shows ass1.sql

If you get this error ...

ass1=#
ERROR: already defined V

change create view to create or replace view

<< ∧ >>

❖ Developing Complex SQL Queries

Work backwards from the final query

if I had a table like R, the final query would be easy
(e.g. could use a pattern like select x from R where y = (select min(y) from R))
write a view to produce R
if I had table S, producing R would be easy
write a view to produce S
etc. etc.

Try to minmise the complexity of individual views

Use attribute names in the create or replace view statement
(easier to keep track of what results the view is producing for use in other views)

Cannot change number/types of view attributes without dropping view

<< ∧ >>

❖ Developing PLpgSQL Functions

Determine function signature

what args are passed in, what result(s) returned

For function producing a single result

develop a query to extract the data you need
combine the data to produce result value/tuple

For function producing a set of results

develop a query to extract useful data
iterate through tuples (for Rec in Query loop ...)
incrementally build the result set

<< ∧ >>

❖ Aggregates

Aggregates reduce a collection of values to a single value

Example:

select avg(mark) from Enrolments where course='COMP3311'

How to achieve this? ... Maintain state, update value-by-value

State = initial state
for each tuple T in query {
    # update State to include T
    State = updateState(State, T)
}
return makeFinal(State)

<< ∧ >>

❖ Aggregates (cont)

New aggregates are defined using CREATE AGGREGATE statement:

CREATE AGGREGATE AggName(BaseType) (
    stype     = StateType,
    initcond  = InitialValue,
    sfunc     = UpdateStateFunction,
    finalfunc = MakeFinalFunction
);

initcond (type StateType) is optional; defaults to NULL
finalfunc is optional; defaults to identity function

<< ∧ >>

❖ Exercise: User-defined Aggregates

make a product aggregate
```
select product(a) from R  →  6
```
simple string concatentation (comma-separated)
```
select cat(b) from R  →  'this,is,fun'
```

make your own count aggregate

select mycount(a) from R  →  3
select mycount(b) from R  →  3

Assuming

create table R (a integer, b text);
insert into R values (1,'this'), (2,'is'), (3,'fun'), (4,'eh?');

<< ∧ >>

❖ Global Constraints

Column and table constraints ensure validity of one table

Ref. integrity constraints ensure connections between tables are valid

Global constraints may involve conditions over many tables

Simple example (from banking domain):

-- accounts are held at branches
-- assets of branch is sum of balances of its accounts

for all Branches b
   b.assets == (select sum(acct.balance)
                from   Accounts acct
                where  acct.branch = b.location)

<< ∧ >>

❖ Global Constraints (cont)

SQL implementation of global constraints is ASSERTION

Example: #students in any UNSW course must be < 10000

create assertion ClassSizeConstraint check (
   not exists (
      select c.id
      from   Courses c
             join Enrolments e on (c.id = e.course)
      group  by c.id
      having count(e.student) > 9999
   )
);

Must be checked after every change to either Courses or Enrolments

Too expensive, so DBMSs provide triggers to do targetted checking.

<< ∧ >>

❖ Triggers

Triggers are

procedures stored in the database
invoked in response to database events (e.g. insert/update/delete)

Examples of uses for triggers:

maintaining summary data
checking schema-level constraints (assertions) on update
performing multi-table updates (to maintain assertions)

Triggers provide event-condition-action programming

<< ∧ >>

❖ Triggers (cont)

Sequence of activities during database update:

Reminder: BEFORE and UPDATE triggers can modify value of new tuple

<< ∧ >>

❖ Triggers in PostgreSQL

PostgreSQL triggers provide a mechanism for

INSERT, DELETE or UPDATE events
to automatically activate PLpgSQL functions

Syntax for PostgreSQL trigger definition:

CREATE TRIGGER TriggerName
{AFTER|BEFORE}  Event1 [OR Event2 ...]
ON TableName
[ WHEN ( Condition ) ]
FOR EACH {ROW|STATEMENT}
EXECUTE PROCEDURE FunctionName(args...);

BEFORE triggers must return old or new
any exception during a trigger forces roll-back

<< ∧ >>

❖ Exercise: Trigger Example (i)

Consider two tables

create table R (id integer, val text);
create table S (r integer references R(id), value text);

Write a trigger to check that the foreign key value is valid

when you insert a new S tuple
when you update an existing S tuple

<< ∧ >>

❖ Exercise: Trigger Example (ii)

Class enrolments:

Classes(id,name,room,day,start,end,quota,nstu)
ClassEnrolments(student,class)

Define triggers to maintain nstu for

insert into ClassEnrolments values (5012345, 6732);

insert into ClassEnrolments values (9999999, 99999);

delete from ClassEnrolments
where  student = 5012345 and class = 6732;

<< ∧ >>

❖ Programming with Databases

So far, we have seen ...

accessing data via SQL queries
packaging SQL queries as views/functions
building functions to return tables
implementing assertions via triggers

All of the above programming

is very close to the data
takes place inside the DBMS

For applications need a PL interacting with the DBMS.

<< ∧ >>

❖ Programming with Databases (cont)

Programming Language / DBMS archtecture:

<< ∧ >>

❖ Programming with Databases (cont)

Consider this (imaginary) PL/DBMS access method:

--  establish connection to DBMS
db = dbAccess("DB");
query = "select a,b from R,S where ... ";
--  invoke query and get handle to result set
results = dbQuery(db, query);
--  for each tuple in result set
while (tuple = dbNext(results)) {
    --  process next tuple
    process(tuple['a'], tuple['b']);
}

Estimated costs: dbAccess = 500ms, dbQuery = 100ms, dbNext < 1ms

<< ∧ >>

❖ Iteration over Data in PLs

Example: find mature-age students (e.g. 10000 students, 500 over 40)

# assume we have a DB handle
query = "select * from Student";
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
    if (tuple['age'] > 40) {
        --  process mature-age student
    }
}

We transfer 10000 tuples from DB, 9500 are irrelevant

Cost = 1 query + 10000 tuple fetches = 1*100 + 10000*1 = 10100 ms

<< ∧ >>

❖ Iteration over Data in PLs (cont)

Should be implemented as:

# assume we have a DB handle
query = "select * from Student where age >= 40";
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
    --  process mature-age student
}

Transfers only the 500 tuples that are needed.

Cost = 1 query + 500 tuple fetches = 1*100 + 500*1 = 600 ms

<< ∧ >>

❖ Iteration over Data in PLs (cont)

Example: find info about all marks for all students

# assume we have a DB handle
query1 = "select id,name from Student order by id";
res1 = dbQuery(db, query1);
while (tuple1 = dbNext(res1)) {
    query2 = "select course,mark from Marks"
             + " where student = " + tuple1['id'];
    res2 = dbQuery(db,query2);
    while (tuple2 = dbNext(res2)) {
        --  process student/course/mark info
    }
}

E.g. 10000 students, each with 8 marks, ⇒ run 10001 queries

Cost = 10001 queries + 80000 tuple fetches = 100*10001 + 1*80000 = 1080100 ms

<< ∧

❖ Iteration over Data in PLs (cont)

Should be implemented as:

# assume we have a DB handle
query = "select id,name,course,mark"
        + " from Student s join Marks m "
        + " on (s.id=m.student)"
        + " order by s.id"
results = dbQuery(db, query);
while (tuple = dbNext(results)) {
    --  process student/course/mark info
}

We invoke 1 query, and transfer same number of tuples.

Cost = 1 query + 80000 tuple fetches = 1*100 + 80000*1 = 80100 ms