COMP9315 Week 09 Monday Lecture

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [0/57]
❖ Things To Note

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [1/57]
❖ Assignment 2

Files involved:

You need to modify (at least):   query.c,   reln.c,   tuple.c

Can add helper functions private to ADTs.

Do NOT change ADT interfaces (i.e. the *.h files)

For testing, we use our versions of the commands

Submit all of the files that you changed.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [2/57]
❖ Assignment 2 (cont)

How do I know it's correct?

Task 1: print hash for each attr, manually determine MA hash

Task 2: grep in data file, run select, compare results

Task 3: use stats to check that the file has expanded appropriately

Debugging?

Try to avoid memory leaks
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [3/57]
❖ EXPLAIN Examples

Database


people(id, family, given, title, name, ..., birthday)
courses(id, subject, semester, homepage) 
course_enrolments(student, course, mark, grade, ...) 
subjects(id, code, name, longname, uoc, offeredby, ...)
...

where


       table_name          | n_records 
---------------------------+-----------
 people                    |     55767
 courses                   |     73220
 course_enrolments         |    525688
 subjects                  |     18525
...

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [4/57]
❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute


uni=# explain
uni=# select * from Students where stype='local';
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
   Filter: ((stype)::text = 'local'::text)

where

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [5/57]
❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute


uni=# explain analyze
uni=# select * from Students where stype='local';
                       QUERY PLAN
----------------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
             (actual time=0.011..4.704 rows=23551 loops=1)
   Filter: ((stype)::text = 'local'::text)
   Rows Removed by Filter: 7810
 Planning time: 0.054 ms
 Execution time: 5.875 ms

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [6/57]
❖ EXPLAIN Examples (cont)

Example: Select on indexed, unique attribute


uni=# explain analyze
uni-# select * from Students where id=1216988;
                       QUERY PLAN
-------------------------------------------------------
 Index Scan using students_pkey on students
                  (cost=0.29..8.30 rows=1 width=9)
                  (actual time=0.011..0.012 rows=1 loops=1)
   Index Cond: (id = 1216988)
 Planning time: 0.066 ms
 Execution time: 0.062 ms

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [7/57]
❖ EXPLAIN Examples (cont)

Example: Join on a primary key (indexed) attribute


uni=# explain analyze
uni-# select s.id,p.name
uni-# from Students s, People p where s.id=p.id;
                      QUERY PLAN
----------------------------------------------------------
Merge Join  (cost=0.58..2829.25 rows=31361 width=18)
            (actual time=0.044..25.883 rows=31361 loops=1)
  Merge Cond: (s.id = p.id)
  ->  Index Only Scan using students_pkey on students s
            (cost=0.29..995.70 rows=31361 width=4)
            (actual time=0.033..6.195 rows=31361 loops=1)
        Heap Fetches: 31361
  ->  Index Scan using people_pkey on people p
            (cost=0.29..2434.49 rows=55767 width=18)
            (actual time=0.006..6.662 rows=31361 loops=1)
Planning time: 0.259 ms
Execution time: 27.327 ms

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [8/57]
❖ EXPLAIN Examples (cont)

Example: Join on a non-indexed attribute


uni=# explain analyze
uni=# select s1.code, s2.code
uni-# from Subjects s1, Subjects s2
uni-# where s1.offeredBy = s2.offeredBy and s1.code < s2.code;
                        QUERY PLAN
---------------------------------------------------------------
Hash Join  (cost=1286.03..126135.12 rows=2371100 width=18)
           (actual time=7.356..6806.042 rows=3655437 loops=1)
  Hash Cond: (s1.offeredby = s2.offeredby)
  Join Filter: (s1.code < s2.code)
  Rows Removed by Join Filter: 3673157
  ->  Seq Scan on subjects s1 
          (cost=0.00..1063.79 rows=17779 width=13)
          (actual time=0.009..4.602 rows=17779 loops=1)
  ->  Hash  (cost=1063.79..1063.79 rows=17779 width=13)
            (actual time=7.301..7.301 rows=17720 loops=1)
        Buckets: 32768  Batches: 1  Memory Usage: 1087kB
        ->  Seq Scan on subjects s2
                (cost=0.00..1063.79 rows=17779 width=13)
                (actual time=0.005..4.452 rows=17779 loops=1)
Planning time: 0.159 ms
Execution time: 6949.167 ms

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [9/57]
❖ Exercise: EXPLAIN examples

Using the database described earlier ...

Course_enrolments(student, course, mark, grade, ...)
Courses(id, subject, semester, homepage)
People(id, family, given, title, name, ..., birthday)
Program_enrolments(id, student, semester, program, wam, ...)
Students(id, stype)
Subjects(id, code, name, longname, uoc, offeredby, ...)

create view EnrolmentCounts as
 select s.code, t.year, t.term, count(e.student) as nstudes
   from Courses c join Subjects s on c.subject=s.id
        join Course_enrolments e on e.course = c.id
        join Semesters t on c.semester = t.id
  group by s.code, t.year, t.term

predict how each of the following queries will be executed ...

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [10/57]
❖ Exercise: EXPLAIN examples (cont)

Check your prediction using the EXPLAIN ANALYZE command.

  1. select max(birthday) from People
  2. select max(id) from People
  3. select family from People order by family
  4. select distinct p.id, p.name
    from People p, Course_enrolments e
    where p.id=e.student and e.grade='FL'
  5. select * from EnrolmentCounts where code='COMP9315'
Examine the effect of adding ORDER BY and DISTINCT.

Add indexes to improve the speed of slow queries.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [11/57]
❖ Using EXPLAIN


For more information on reading plans from EXPLAIN

Can get EXPLAIN output in different formats: General PostgreSQL performance tuning
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [12/57]
❖ Transaction Processing

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [13/57]
❖ Transaction Processing

A transaction (tx) is ...


A transaction effects a state change on the DB

[Diagram:Pics/txproc/tx-exec1.png]

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [14/57]
❖ Transaction Processing (cont)

Transaction states:

[Diagram:Pics/txproc/tx-states1.png]

COMMIT all changes preserved,   ABORT database unchanged

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [15/57]
❖ Transaction Processing (cont)

Concurrent transactions are

To ensure problem-free concurrent transactions:
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [16/57]
❖ Transaction Processing (cont)

Transaction processing:


Consistency is the property: Ensuring this must be left to application programmers.


Our discussion focusses on: Atomicity, Durability, Isolation

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [17/57]
❖ Transaction Processing (cont)

Atomicity is handled by the commit and abort mechanisms

Durability is handled by implementing stable storage, via

Isolation is handled by concurrency control mechanisms
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [18/57]
❖ Transaction Processing (cont)

Where transaction processing fits in the DBMS:

[Diagram:Pics/txproc/dbmsarch.png]

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [19/57]
❖ Transaction Terminology

To describe transaction effects, we consider:

Relationship between the above operations and SQL:
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [20/57]
❖ Transaction Terminology (cont)

More on transactions and SQL


In PostgreSQL, tx's cannot be defined inside functions (e.g. PLpgSQL)

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [21/57]
❖ Transaction Terminology (cont)

The READ, WRITE, ABORT, COMMIT operations:

The operations are typically denoted as:

RT(X) read item X in transaction T
WT(X) write item X in transaction T
AT abort transaction T
CT commit transaction T

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [22/57]
❖ Schedules

A schedule gives the sequence of operations from ≥ 1 tx

Serial schedule for a set of tx's T1 .. Tn

E.g.   RT1(A)   WT1(A)   RT2(B)   RT2(A)   WT3(C)   WT3(B)  

Concurrent schedule for a set of tx's T1 .. Tn

E.g.   RT1(A)   RT2(B)   WT1(A)   WT3(C)   WT3(B)   RT2(A)  
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [23/57]
❖ Schedules (cont)

Serial schedules guarantee database consistency

Concurrent schedules interleave tx operations arbitrarily
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [24/57]
❖ Transaction Anomalies

What problems can occur with (uncontrolled) concurrent tx's?

The set of phenomena can be characterised broadly under:

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [25/57]
❖ Exercise: Update Anomaly

Consider the following transaction (expressed in pseudo-code):

-- Accounts(id,owner,balance,...)
transfer(src id, dest id, amount int)
{
   -- R(X)
   select balance from Accounts where id = src;
   if (balance >= amount) {
      -- R(X),W(X)
      update Accounts set balance = balance-amount
      where id = src;
      -- R(Y),W(Y)
      update Accounts set balance = balance+amount
      where id = dest;
}  }

If two transfers occur on this account simultaneously,
give a schedule that illustrates the "dirty read" phenomenon.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [26/57]
❖ Exercise: How many Schedules?

In the previous exercise, we looked at several schedules

For a given set of tx's T1 ... Tn ...

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [27/57]
❖ Schedule Properties

If a concurrent schedule on a set of tx's TT ...

Primary goal of isolation mechanisms (see later) is Serializability is one property of a schedule, focusing on isolation

Other properties of schedules focus on recovering from failures

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [28/57]
❖ Transaction Failure

So far, have implicitly assumed that all transactions commit.

Additional problems can arise when transactions abort.

Consider the following schedule where transaction T1 fails:

T1: R(X) W(X) A
T2:             R(X) W(X) C

Abort will rollback the changes to X, but ...

Consider three places where the rollback might occur:

T1: R(X) W(X) A [1]     [2]        [3]
T2:                 R(X)    W(X) C

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [29/57]
❖ Transaction Failure (cont)

Abort / rollback scenarios:

T1: R(X) W(X) A [1]     [2]        [3]
T2:                 R(X)    W(X) C

Case [1] is ok

Case [2] is problematic Case [3] is also problematic
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [30/57]
❖ Recoverability

Consider the serializable schedule:

T1:        R(X)  W(Y)  C
T2:  W(X)                 A

(where the final value of Y is dependent on the X value)

Notes:

Produces an invalid database state, even though serializable.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [31/57]
❖ Recoverability (cont)

Recoverable schedules avoid these kinds of problems.

For a schedule to be recoverable, we require additional constraints

and this property must hold for all transactions Tj

Note that recoverability does not prevent "dirty reads".

In order to make schedules recoverable in the presence of dirty reads and aborts, may need to abort multiple transactions.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [32/57]
❖ Exercise: Recoverability/Serializability

Recoverability and Serializability are orthogonal, i.e.

Consider the two transactions:

T1:  W(A)  W(B)  C
T2:  W(A)  R(B)  C

Give examples of schedules on T1 and T2 that are

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [33/57]
❖ Cascading Aborts

Recall the earlier non-recoverable schedule:

T1:        R(X)  W(Y)  C
T2:  W(X)                 A

To make it recoverable requires:

T1:        R(X)  W(Y) ...   C? A!
T2:  W(X)                 A

Known as cascading aborts (or cascading rollback).

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [34/57]
❖ Cascading Aborts (cont)

Example: T3 aborts, causing T2 to abort, causing T1 to abort

T1:                    R(Y)  W(Z)        A
T2:        R(X)  W(Y)                 A
T3:  W(X)                          A

Even though T1 has no direct connection with T3
(i.e. no shared data).

This kind of problem ...

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [35/57]
❖ Cascading Aborts (cont)

Cascading aborts can be avoided if

Effectively: eliminate the possibility of reading dirty data.

Downside: reduces opportunity for concurrency.

These are called ACR (avoid cascading rollback) schedules.

All ACR schedules are also recoverable.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [36/57]
❖ Strictness

Strict schedules also eliminate the chance of writing  dirty data.

A schedule is strict if

Strict schedules simplify the task of rolling back after aborts.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [37/57]
❖ Strictness (cont)

Example: non-strict schedule

T1:  W(X)        A
T2:        W(X)     A

Problems with handling rollback after aborts:

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [38/57]
❖ Classes of Schedules

Relationship between various classes of schedules:

[Diagram:Pics/txproc/schedules.png]

Schedules ought to be serializable and strict.

But more serializable/strict less concurrency.

DBMSs allow users to trade off "safety" against performance.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [39/57]
❖ Transaction Isolation

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [40/57]
❖ Transaction Isolation

Simplest form of isolation: serial execution   (T1 ; T2 ; T3 ; ...)

Problem: serial execution yields poor throughput.

Concurrency control schemes (CCSs) aim for "safe" concurrency

Abstract view of DBMS concurrency mechanisms:

[Diagram:Pics/txproc/txproc1.png]

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [41/57]
❖ Serializability

Consider two schedules S1 and S2 produced by

S1  and S2  are equivalent if StateAfter(S1)  =  StateAfter(S2)
S  is a serializable schedule (for a set of concurrent tx's T1 ..Tn) if Under these circumstances, consistency is guaranteed
(assuming no aborted transactions and no system failures)
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [42/57]
❖ Serializability (cont)

Two formulations of serializability:

View serializability is strictly weaker than conflict serializability.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [43/57]
❖ Exercise: Serializability Checking

Is the following schedule view/conflict serializable?

T1:        W(B)  W(A)
T2:  R(B)                    W(A)
T3:                    R(A)        W(A)

Is the following schedule view/conflict serializable?

T1:        W(B)  W(A)
T2:  R(B)              W(A)
T3:                          R(A)  W(A)

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [44/57]
❖ Transaction Isolation Levels

SQL programmers' concurrency control mechanism ...

set transaction
    read only  -- so weaker isolation may be ok
    read write -- suggests stronger isolation needed
isolation level
    -- weakest isolation, maximum concurrency
    read uncommitted
    read committed
    repeatable read
    serializable
    -- strongest isolation, minimum concurrency

Applies to current tx only; affects how scheduler treats this tx.

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [45/57]
❖ Transaction Isolation Levels (cont)

Implication of transaction isolation levels:

Isolation
Level
Dirty
Read
Nonrepeatable
Read
Phantom
Read
Read
Uncommitted
Possible Possible Possible
Read
Committed
Not Possible Possible Possible
Repeatable
Read
Not Possible Not Possible Possible
Serializable Not Possible Not Possible Not Possible
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [46/57]
❖ Transaction Isolation Levels (cont)

For transaction isolation, PostgreSQL


Note: cannot implement read uncommitted because of MVCC


For more details, see PostgreSQL Documentation section 13.2

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [47/57]
❖ Transaction Isolation Levels (cont)

A PostgreSQL tx consists of a sequence of SQL statements:

BEGIN S1; S2; ... Sn; COMMIT;

Isolation levels affect view of DB provided to each Si:

Transactions fail if the system detects violation of isolation level.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [48/57]
❖ Implementing Concurrency Control

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [49/57]
❖ Concurrency Control

Approaches to concurrency control:

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [50/57]
❖ Lock-based Concurrency Control

Locks introduce additional mechanisms in DBMS:

[Diagram:Pics/txproc/txproc2.png]

The Lock Manager

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [51/57]
❖ Lock-based Concurrency Control (cont)

Lock table entries contain:

Lock and unlock operations must be atomic.

Lock upgrade:

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [52/57]
❖ Lock-based Concurrency Control (cont)

Synchronise access to shared data items via following rules:

These rules alone do not guarantee serializability.
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [53/57]
❖ Lock-based Concurrency Control (cont)

Consider the following schedule, using locks:

T1(a): Lr(Y)     R(Y)           continued
T2(a):      Lr(X)    R(X) U(X)  continued

T1(b):      U(Y)         Lw(X) W(X) U(X)
T2(b): Lw(Y)....W(Y) U(Y)

(where Lr = read-lock, Lw = write-lock, U = unlock)

Locks correctly ensure controlled access to X and Y.

Despite this, the schedule is not serializable.  (Ex: prove this)

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [54/57]
❖ Two-Phase Locking

To guarantee serializability, we require an additional constraint:

Each transaction is then structured as:
Clearly, this reduces potential concurrency ...
COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [55/57]
❖ Problems with Locking

Appropriate locking can guarantee correctness.

However, it also introduces potential undesirable effects:

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [56/57]
❖ Deadlock

Deadlock occurs when two transactions are waiting for a lock on an item held by the other.

Example:

T1: Lw(A) R(A)            Lw(B) ......
T2:            Lw(B) R(B)       Lw(A) .....

How to deal with deadlock?

COMP9315 24T1 ♢ Week 9 Monday Lecture ♢ [57/57]


Produced: 9 Apr 2024