COMP3311 24T2 Week 07
Constraints, Triggers, and Aggregates
Database Systems
  1. Consider a schema for an organisation

    Employee(id:integer, name:text, works_in:integer, salary:integer, ...)
    Department(id:integer, name:text, manager:integer, ...)
    

    where works_in is a foreign key indicating which Department an Employee works in, and manager is a foreign key indicating which Employee is the manager of a Department.

    A manager must work in the Department they manage. Can this be checked by standard SQL table constraints? If not, why not?

    If it cannot be checked by standard SQL constraints, write an assertion to ensure that each manager also works in the department they manage. Define using a standard SQL assertion statement like:

    create assertion manager_works_in_department
    check  ...
    

    Answer:

    Except for foreign key constraints, standard SQL table constraints cannot express conditions that involve multiple tables. SQL assertions allow this, but are not typically implemented because of the overhead of checking them on each change to the relevant tables. Triggers provide a way of implementing this kind of checking, and also allow updates to ensure that the assertion conditions hold.

    The following would be a suitable assertion:

    create assertion manager_works_in_department
    check  ( not exists (
               select *
               from   Employee e
                      join Department d on (d.manager = e.id)
               where  e.works_in <> d.id
             )
           );
    
  2. Using the same schema from the previous question, write an assertion to ensure that no employee in a department earns more than the manager of their department. Define using a standard SQL assertion statement like:

    create assertion employee_manager_salary
    check  ...
    

    Answer:

    The following is suitable:

    create assertion employee_manager_salary
    check  ( not exists (
               select *
               from   Employee emp
                      join Department dept on (dept.id = emp.works_in)
                      join Employee mgr on (dept.manager = mgr.id)
               where  emp.salary > mgr.salary
             )
           );
    
  3. What is the SQL command used in PostgreSQL to define a trigger? And what is the SQL command to remove it?

    Answer:

    Trigger functions are defined using the SQL CREATE FUNCTION command. A trigger function should be defined as accepting no arguments and returning a value of the special TRIGGER data type. The syntax for defining a PLpgSQL trigger function is:

    CREATE FUNCTION function_name () RETURNS trigger
    AS $$
       DECLARE
          declarations;
          [...]
       BEGIN
          statements;
          [...]
       END;
    $$ LANGUAGE plpgsql;
    

    A trigger is defined with the SQL CREATE TRIGGER command with syntax as:

    CREATE TRIGGER trigger_name
                   BEFORE operation
                   ON table_name FOR EACH ROW
                   EXECUTE PROCEDURE function_name();
    
    or
    CREATE TRIGGER trigger_name
                   AFTER operation
                   ON table_name FOR EACH ROW
                   EXECUTE PROCEDURE function_name();
    

    where operation is one of INSERT, DELETE, or UPDATE

    You can assign one trigger to several operations via

    (BEFORE|AFTER) op1 OR op2 OR ...
    

    Triggers are removed via the SQL statement:

    DROP TRIGGER trigger_name ON table_name;
    

    Note that this does not remove any past effects caused by the trigger.

  4. Triggers can be defined as BEFORE or AFTER. What exactly are they before or after?

    Answer:

    Changes are made to the database as follows:

    • fire all BEFORE triggers, possibly modifying any new tuple
    • do all standard SQL constraint checking (e.g. FKs, domains)
    • fire all AFTER triggers, possibly updating other tables

    Note that trigger functions can raise exceptions, which would cause the change to be aborted and rolled back.
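
    The firing order can be observed directly. The following sketch uses a hypothetical table Account (names are illustrative, not from any question schema): the BEFORE trigger repairs the new tuple before the NOT NULL check runs, and the AFTER trigger runs once the row is in place.

    create table Account (id integer primary key, balance integer not null);

    create function before_fn() returns trigger as $$
    begin
       raise notice 'BEFORE: NEW can still be modified';
       new.balance := coalesce(new.balance, 0);  -- repair before NOT NULL check
       return new;
    end; $$ language plpgsql;

    create function after_fn() returns trigger as $$
    begin
       raise notice 'AFTER: row already inserted and checked';
       return null;  -- return value is ignored for AFTER triggers
    end; $$ language plpgsql;

    create trigger t1 before insert on Account
       for each row execute procedure before_fn();
    create trigger t2 after insert on Account
       for each row execute procedure after_fn();

    -- insert into Account values (1, null);
    -- raises the BEFORE notice, passes the NOT NULL check (balance was
    -- repaired to 0), then raises the AFTER notice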

  5. Give examples of when you might use a trigger BEFORE and AFTER ...

    1. an insert operation

      Answer:

      • a trigger before an insert might check for valid values of the fields (e.g. referential integrity checks), or perhaps generate additional values, such as timestamps, to be included in the newly-inserted tuple
      • a trigger after an insert might perform additional database updates to ensure semantic consistency of the database, such as enforcing inter-table dependencies (e.g. maintaining, in one relation, a count of the tuples in another)
    2. an update operation

      Answer:

      • a trigger before an update might check for valid values of the modified fields, or generate a new timestamp to be included in the modified tuple
      • a trigger after an update might do similar maintenance of database consistencies as an insert trigger
    3. a delete operation

      Answer:

      • a trigger before a delete might check referential integrity constraints (e.g. can't delete a tuple because it has tuples in other relations referring to it)
      • a trigger after a delete might do similar maintenance of database consistencies as an insert/update trigger
  6. Consider the following relational schema:

    create table R(a int, b int, c text, primary key(a,b));
    create table S(x int primary key, y int);
    create table T(j int primary key, k int references S(x));
    

    State how you could use triggers to implement the following constraint checking (hint: revise the material on Constraint Checking from the Relational Data Model and ER-Relational Mapping extended notes)

    1. primary key constraint on relation R

      Answer:

      Note: in the solution below, TG_OP is a special variable that tells the trigger function which operation caused it to be invoked. This is useful when a trigger is defined to act on more than one type of operation (as in the triggers below).

      create function R_pk_check() returns trigger
      as $$
      begin
      	if (new.a is null or new.b is null) then
      		raise exception 'Partially specified primary key for R';
      	end if;
      	if (TG_OP = 'UPDATE' and old.a=new.a and old.b=new.b) then
      		return new;  -- key unchanged, nothing more to check
      	end if;
      	-- INSERT, or UPDATE that changes the key
      	perform 1 from R where a=new.a and b=new.b;
      	if (found) then
      		raise exception 'Duplicate primary key for R';
      	end if;
      	return new;
      end;
      $$ language plpgsql;
      
      create trigger R_pk_check before insert or update on R
      for each row execute procedure R_pk_check();
      
    2. foreign key constraint between T.j and S.x

      Answer:

      create function T_fk_check() returns trigger
      as $$
      begin
      	perform 1 from S where x=new.k;
      	if (not found) then
      		raise exception 'Non-existent S.x key in T';
      	end if;
      	return new;
      end;
      $$ language plpgsql;
      
      create trigger T_fk_check before insert or update on T
      for each row execute procedure T_fk_check();
      
      -- assuming that we do *not* want "on delete cascade" semantics
      
      create function S_refs_check() returns trigger
      as $$
      begin
      	if (TG_OP = 'UPDATE' and old.x=new.x) then
      		return new;
      	end if;
      	perform 1 from T where k=old.x;
      	if (found) then
      		raise exception 'References to S.x from T';
      	end if;
      	if (TG_OP = 'DELETE') then
      		return old;
      	end if;
      	return new;
      end;
      $$ language plpgsql;
      
      create trigger S_refs_check before delete or update on S
      for each row execute procedure S_refs_check();
      
  7. Explain the difference between these triggers

       create trigger updateS1 after update on S
       for each row execute procedure updateS();
    
       create trigger updateS2 after update on S
       for each statement execute procedure updateS();
    

    when executed with the following statements. Assume that S contains primary keys (1,2,3,4,5,6,7,8,9).

    1. update S set y = y + 1 where x = 5;

      Answer:

      There is no effective difference in the action of the two triggers for this case. The update affects only one tuple, so each trigger is activated once for that tuple.

    2. update S set y = y + 1 where x > 5;

      Answer:

      Trigger updateS1 will cause updateS() to be executed on each of the affected tuples (the ones with primary key values (6,7,8,9)).

      Trigger updateS2 will cause updateS() to be executed once, after all of the affected tuples have been modified, but before the updates have been committed.

      In both cases, if the function fails, none of the updates will take place (i.e. they are not committed).
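
      One way to see the difference is to make updateS() do nothing but report that it fired (a sketch; the body of updateS() is not given in the question):

      create or replace function updateS() returns trigger as $$
      begin
         raise notice 'updateS fired (op=%, level=%)', TG_OP, TG_LEVEL;
         return null;   -- return value is ignored for AFTER triggers
      end;
      $$ language plpgsql;

      -- update S set y = y + 1 where x > 5;
      -- updateS1 (for each row) produces four notices, one per modified tuple;
      -- updateS2 (for each statement) produces exactly one notice, and would
      -- do so even if the statement modified no tuples at all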

  8. What problems might be caused by the following pair of triggers?

    create trigger T1 after insert on Table1
    for each row execute procedure T1trigger();
    
    create trigger T2 after update on Table2
    for each row execute procedure T2trigger();
    
    create function T1trigger() returns trigger
    as $$
    begin
    update Table2 set Attr1 = ...;
    end; $$ language plpgsql;
    
    create function T2trigger() returns trigger
    as $$
    begin
    insert into Table1 values (...);
    end; $$ language plpgsql;
    

    Answer:

    The problem with these triggers occurs on either an insert on Table1 or an update on Table2. When either trigger is activated, it unconditionally executes an SQL statement that will activate the other trigger, leading to an infinite sequence of trigger activations.
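
    PostgreSQL does not detect the cycle; the mutually-triggered statements simply recurse until the nesting limit is exceeded and the transaction is aborted. If the two updates are genuinely needed, one common guard is the built-in pg_trigger_depth() function in a WHEN clause, so that each trigger fires only for top-level statements (a sketch against the definitions above):

    create trigger T1 after insert on Table1
    for each row
    when (pg_trigger_depth() < 1)  -- skip inserts made from inside a trigger
    execute procedure T1trigger();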
  9. Given a table:

     Emp(empname:text, salary:integer, last_date:timestamp, last_user:text)
    

    Define a trigger that ensures that any time a row is inserted or updated in the table, the current user name and time are stamped into the row. The trigger should also ensure that an employee's name is given and that the salary has a positive value.

    The PostgreSQL builtins current_user and now() will provide the values that you need for the stamp.

    Answer:

    drop table if exists emp;
    create table emp (
        empname text primary key,
        salary integer,
        last_date timestamp,
        last_user text
    );
    
    create or replace function emp_stamp () returns trigger
    as $$
    begin
        -- Check that empname and salary are given
        if new.empname is null then
            raise exception 'empname cannot be NULL value';
        end if;
        if new.salary is null then
            raise exception '% cannot have NULL salary', new.empname;
        end if;
    
        -- Who would work if they had to pay to do it?
        if new.salary < 0 then
            raise exception '% cannot have a negative salary', new.empname;
        end if;
    
        -- Remember who changed the payroll when
        new.last_date := now();
        new.last_user := current_user;
        return new;
    end;
    $$ language plpgsql;
    
    create trigger emp_stamp before insert or update on emp
        for each row execute procedure emp_stamp();
    

    If these were used in a sample database called mydb, with an initially empty emp table, the effect would appear like this:

    mydb=> insert into emp values ('John',50000);
    INSERT 478005 1
    mydb=> select * from emp; 
     empname | salary |           last_date           | last_user 
    ---------+--------+-------------------------------+-----------
     John    |  50000 | 2002-09-16 17:30:38.613018+10 | jas
    (1 row)
    
    mydb=> update emp set salary=60000 where empname='John';
    UPDATE 1
    mydb=> select * from emp; 
     empname | salary |           last_date           | last_user 
    ---------+--------+-------------------------------+-----------
     John    |  60000 | 2002-09-16 17:31:33.141904+10 | jas
    (1 row)
    
  10. Consider the following relational schema:

       Enrolment(course:char(8), sid:integer, mark:integer)
       Course(code:char(8), lic:text, quota:integer, numStudes:integer);
    

    Define triggers which keep the numStudes field in the Course table consistent with the number of enrolment records for that course in the Enrolment table, and also ensure that new enrolment records are rejected if they would cause the quota to be exceeded.

    Answer:

    create or replace function ins_stu() returns trigger as $$
    begin
    	update course set numStudes=numStudes+1 where code=new.course;
    	return new; -- return value is ignored for AFTER triggers
    end;
    $$ language plpgsql;
    
    create or replace function del_stu() returns trigger as $$
    begin
       	update course set numStudes=numStudes-1 where code=old.course;  
    	return old; -- return value is ignored for AFTER triggers
    end;
    $$ language plpgsql;
    
    create or replace function upd_stu() returns trigger as $$
    begin
    	update course set numStudes=numStudes+1 where code=new.course;
    	update course set numStudes=numStudes-1 where code=old.course; 
    	return new; -- return value is ignored for AFTER triggers
    end;
    $$ language plpgsql;
    
    create or replace function chk_quo() returns trigger as $$
    declare
    	quota_filled boolean;
    begin
    	-- an update that doesn't change the course cannot break the quota
    	if (TG_OP = 'UPDATE' and old.course = new.course) then
    		return new;
    	end if;
    	select into quota_filled (numStudes >= quota)
    	from   Course where code = new.course;
    	if (quota_filled) then
    		raise exception 'Class % is full', new.course;
    	end if;
    	return new;
    end;
    $$ language plpgsql;
    
    create trigger ins_stu after insert on enrolment
        for each row execute procedure ins_stu();
    
    create trigger del_stu after delete on enrolment
        for each row execute procedure del_stu();
    
    create trigger upd_stu after update on enrolment
        for each row execute procedure upd_stu();
    
    create trigger chk_quo before insert or update on enrolment
        for each row execute procedure chk_quo();
    
  11. Consider the following (partial) relational schema:

       Shipments(id:integer, customer:integer, isbn:text, ship_date:timestamp)
       Editions(isbn:text, title:text, publisher:integer, published:date, ...)
       Stock(isbn:text, numInStock:integer, numSold:integer)
       Customer(id:integer, name:text, ...)
    

    Define a PLpgSQL trigger function new_shipment() which is called after each INSERT or UPDATE operation is performed on the Shipments table.

    The new_shipment() function should check to make sure that each added shipment contains a valid customer ID number and a valid ISBN. It should then update the stock information:

    • for an INSERT, subtract one from the total amount of stock and add one to the number sold
    • for an UPDATE, if the change involves the book, then restore the Stock entry for the old book and update the Stock entry for the new book

    It should also calculate a new shipment ID (one higher than the previous highest) and ensure that it is placed in the shipment_id field of the new tuple. It should also set the time-stamp for the new tuple to the current time.

    Under this scheme, tuples would be inserted into the Shipments table as:

    insert into Shipments(customer,isbn) values (9300035,'0-8053-1755-4');
    

    Answer:

    create function new_shipment() returns trigger as $$
    declare
        cust_id     integer;    -- customer ID
        book_isbn   text;       -- isbn of new book
        shipment_id integer;    -- shipment ID number
        right_now   timestamp;  -- current time
    begin
        -- If there is no matching customer id number, raise an exception.
        select into cust_id id from Customer where id = new.customer;
        if not found then
          raise exception 'Invalid customer ID number.';
        end if;
        
        -- If there is no matching ISBN, raise an exception.
        select into book_isbn isbn from Editions where isbn = new.isbn;
        if not found then
          raise exception 'Invalid ISBN.';
        end if;
        
        -- If the previous checks succeeded, adjust the stock and sold
        -- counts: for an INSERT, and for an UPDATE that changes the book.
        if TG_OP = 'INSERT' then
           update Stock set numInStock = numInStock-1, numSold = numSold+1
           where  isbn = new.isbn;
        elsif new.isbn <> old.isbn then
           -- restore the Stock entry for the old book ...
           update Stock set numInStock = numInStock+1, numSold = numSold-1
           where  isbn = old.isbn;
           -- ... and adjust the Stock entry for the new book
           update Stock set numInStock = numInStock-1, numSold = numSold+1
           where  isbn = new.isbn;
        end if;
        
        -- Set the current time variable to current time.
        right_now := now();
        
        -- Generate a new shipment id, one higher than the current highest
        -- (coalesce handles the empty-table case)
        select into shipment_id coalesce(max(id), 0) from Shipments;
        shipment_id := shipment_id + 1;
    
        -- Set the values in the new tuple
        new.id := shipment_id;
        new.ship_date := right_now;
        
        return new;
    end;
    $$ language plpgsql;
    
    create trigger check_shipment before insert or update
    on Shipments for each row execute procedure new_shipment();
    
  12. Suggest a PostgreSQL CREATE TABLE definition that would ensure that all of the effects in the previous question happened automatically.

    Answer:

    The above triggers implement referential integrity checks and introduce default values. The same effect could be achieved via:

    create table Shipments (
        id        serial    primary key,
        -- or id  integer   default nextval('Shipments_id_seq') primary key
        customer  integer   references Customer(id),
        isbn      text      references Editions(isbn),
        ship_date timestamp default now()
    );
    
  13. Consider the following typical components of a PostgreSQL user-defined aggregate definition:

    CREATE AGGREGATE AggName(BaseType) (
        stype     = ...,
        initcond  = ...,
        sfunc     = ...,
        finalfunc = ...,
    );
    

    Explain what each one does, and how they work together to produce the aggregation result.

    Answer:

    An aggregate maps a column of values of BaseType to a single value of type ResultType. The ResultType is often the same as the BaseType, but this is not required.

    Aggregates work using a state data structure which carries through the entire calculation. The state could be a single atomic value, or could be a tuple. The stype parameter gives the precise data type of the state.

    The initcond component gives the initial value of the state variable as a string. The sfunc function takes a state and a value, and returns a new state, based on incorporating the value into the state. The finalfunc function takes a state and returns the final return value of the aggregate. Note that finalfunc is optional; if it is not supplied the final result is just the final state.

    Type signatures for the above:

    aggregate Agg : setof BaseType → ResultType
    sfunc : StateType,BaseType → StateType
    finalfunc : StateType → ResultType
    

    How they work together:

    S : StateType
    S = initcond
    for each value V in column A of relation R {
       S = sfunc(S, V)
    }
    return finalfunc(S)
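
    As a concrete illustration, a bare-bones version of count (called mycount here; the name is invented for this example) needs only stype, initcond, and sfunc. Since no finalfunc is supplied, the result is just the final state:

    create function inc(state bigint, v anyelement) returns bigint
    as $$
    begin
       return state + 1;   -- incorporate each value by counting it
    end;
    $$ language plpgsql;

    create aggregate mycount(anyelement) (
        stype    = bigint,
        initcond = '0',
        sfunc    = inc
        -- no finalfunc: the aggregate result is the final state itself
    );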
    
  14. Imagine that PostgreSQL did not have an avg() aggregate to compute the mean of a column of numeric values. How could you implement it using a PostgreSQL user-defined aggregation definition (called mean)? Assume that it ignores null values. If the column is empty (has no values) return null.

    Answer:

    create type StateType as ( sum numeric, count numeric );
    
    create function include(s StateType, v numeric) returns StateType
    as $$
    begin
       if (v is not NULL) then
          s.sum := s.sum + v;
          s.count := s.count + 1;
       end if;
       return s;
    end;
    $$ language plpgsql;
    
    create or replace function compute(s StateType) returns numeric
    as $$
    begin
       if (s.count = 0) then
          return null;
       else
          return s.sum::numeric / s.count;
       end if;
    end;
    $$ language plpgsql;
    
    create aggregate mean(numeric) (
        stype     = StateType,
        initcond  = '(0,0)',
        sfunc     = include,
        finalfunc = compute
    );
    
  15. How could you get the mean of a column of values without having to write your own aggregation operation? Assume a simple table with schema: R(...,a:integer,...).

    Answer:

    Using existing aggregates:

    select sum(a)::numeric/count(a) from R;
    

    You need to cast to numeric in case the column contains integer values; if you don't it will do integer division with truncation. Note that sum(a) and count(a) correctly ignore null values in the column. If you used count(*), you would count all tuples in the table and produce an average lower than it should be; you would effectively be treating the null values as 0.
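
    A quick demonstration of why the cast matters, using an illustrative table and values:

    create table R (a integer);
    insert into R values (1), (2), (null);

    select sum(a) / count(a)          from R;  -- 1   (integer division of 3/2)
    select sum(a)::numeric / count(a) from R;  -- 1.5 (the correct mean)
    select sum(a)::numeric / count(*) from R;  -- 1.0 (null wrongly counted as a value)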