❖ Things To Note |
❖ Assignment 2 |
Files involved:
Can add helper functions private to ADTs.
Do NOT change ADT interfaces (i.e. the *.h
For testing, we use our versions of the commands
*.h
❖ Assignment 2 (cont) |
How do I know it's correct?
Task 1: print hash for each attr, manually determine MA hash
Task 2: grep
Task 3: use stats
Debugging?
❖ EXPLAIN Examples |
Database
people(id, family, given, title, name, ..., birthday) courses(id, subject, semester, homepage) course_enrolments(student, course, mark, grade, ...) subjects(id, code, name, longname, uoc, offeredby, ...) ...
where
table_name | n_records ---------------------------+----------- people | 55767 courses | 73220 course_enrolments | 525688 subjects | 18525 ...
❖ EXPLAIN Examples (cont) |
Example: Select on non-indexed attribute
uni=# explain uni=# select * from Students where stype='local'; QUERY PLAN ---------------------------------------------------- Seq Scan on students (cost=0.00..562.01 rows=23543 width=9) Filter: ((stype)::text = 'local'::text)
where
Seq Scan
cost=
..
rows=
width=
❖ EXPLAIN Examples (cont) |
Example: Select on non-indexed attribute
uni=# explain analyze uni=# select * from Students where stype='local'; QUERY PLAN ---------------------------------------------------------- Seq Scan on students (cost=0.00..562.01 rows=23543 width=9) (actual time=0.011..4.704 rows=23551 loops=1) Filter: ((stype)::text = 'local'::text) Rows Removed by Filter: 7810 Planning time: 0.054 ms Execution time: 5.875 ms
❖ EXPLAIN Examples (cont) |
Example: Select on indexed, unique attribute
uni=# explain analyze uni-# select * from Students where id=1216988; QUERY PLAN ------------------------------------------------------- Index Scan using students_pkey on students (cost=0.29..8.30 rows=1 width=9) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (id = 1216988) Planning time: 0.066 ms Execution time: 0.062 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a primary key (indexed) attribute
uni=# explain analyze uni-# select s.id,p.name uni-# from Students s, People p where s.id=p.id; QUERY PLAN ---------------------------------------------------------- Merge Join (cost=0.58..2829.25 rows=31361 width=18) (actual time=0.044..25.883 rows=31361 loops=1) Merge Cond: (s.id = p.id) -> Index Only Scan using students_pkey on students s (cost=0.29..995.70 rows=31361 width=4) (actual time=0.033..6.195 rows=31361 loops=1) Heap Fetches: 31361 -> Index Scan using people_pkey on people p (cost=0.29..2434.49 rows=55767 width=18) (actual time=0.006..6.662 rows=31361 loops=1) Planning time: 0.259 ms Execution time: 27.327 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a non-indexed attribute
uni=# explain analyze uni=# select s1.code, s2.code uni-# from Subjects s1, Subjects s2 uni-# where s1.offeredBy = s2.offeredBy and s1.code < s2.code; QUERY PLAN --------------------------------------------------------------- Hash Join (cost=1286.03..126135.12 rows=2371100 width=18) (actual time=7.356..6806.042 rows=3655437 loops=1) Hash Cond: (s1.offeredby = s2.offeredby) Join Filter: (s1.code < s2.code) Rows Removed by Join Filter: 3673157 -> Seq Scan on subjects s1 (cost=0.00..1063.79 rows=17779 width=13) (actual time=0.009..4.602 rows=17779 loops=1) -> Hash (cost=1063.79..1063.79 rows=17779 width=13) (actual time=7.301..7.301 rows=17720 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 1087kB -> Seq Scan on subjects s2 (cost=0.00..1063.79 rows=17779 width=13) (actual time=0.005..4.452 rows=17779 loops=1) Planning time: 0.159 ms Execution time: 6949.167 ms
❖ Exercise: EXPLAIN examples |
Using the database described earlier ...
Course_enrolments(student, course, mark, grade, ...) Courses(id, subject, semester, homepage) People(id, family, given, title, name, ..., birthday) Program_enrolments(id, student, semester, program, wam, ...) Students(id, stype) Subjects(id, code, name, longname, uoc, offeredby, ...) create view EnrolmentCounts as select s.code, t.year, t.term, count(e.student) as nstudes from Courses c join Subjects s on c.subject=s.id join Course_enrolments e on e.course = c.id join Semesters t on c.semester = t.id group by s.code, t.year, t.term
predict how each of the following queries will be executed ...
❖ Exercise: EXPLAIN examples (cont) |
Check your prediction using the EXPLAIN ANALYZE
select max(birthday) from People
select max(id) from People
select family from People order by family
select distinct p.id, p.name
from People p, Course_enrolments e
where p.id=e.student and e.grade='FL'
select * from EnrolmentCounts where code='COMP9315'
ORDER BY
DISTINCT
Add indexes to improve the speed of slow queries.
❖ Using EXPLAIN |
For more information on reading plans from EXPLAIN
EXPLAIN
FORMAT { TEXT | XML | JSON | YAML }
EXPLAIN
❖ Transaction Processing |
A transaction (tx) is ...
❖ Transaction Processing (cont) |
Transaction states:
COMMIT
ABORT
❖ Transaction Processing (cont) |
Concurrent transactions are
❖ Transaction Processing (cont) |
Transaction processing:
Our discussion focusses on:
Atomicity, Durability, Isolation
❖ Transaction Processing (cont) |
Atomicity is handled by the commit and abort mechanisms
Durability is handled by implementing stable storage, via
❖ Transaction Terminology |
To describe transaction effects, we consider:
READ
WRITE
ABORT
COMMIT
SELECT
READ
UPDATE
DELETE
READ
WRITE
INSERT
WRITE
❖ Transaction Terminology (cont) |
More on transactions and SQL
BEGIN
begin
COMMIT
END
end
ROLLBACK
ABORT
In PostgreSQL, tx's cannot be defined inside functions (e.g. PLpgSQL)
❖ Transaction Terminology (cont) |
The READ
WRITE
ABORT
COMMIT
read item X in transaction T | ||
write item X in transaction T | ||
abort transaction T | ||
commit transaction T |
❖ Schedules |
A schedule gives the sequence of operations from ≥ 1 tx
Serial schedule for a set of tx's T1 .. Tn
Concurrent schedule for a set of tx's T1 .. Tn
❖ Schedules (cont) |
Serial schedules guarantee database consistency
❖ Transaction Anomalies |
What problems can occur with (uncontrolled) concurrent tx's?
The set of phenomena can be characterised broadly under:
❖ Exercise: Update Anomaly |
Consider the following transaction (expressed in pseudo-code):
-- Accounts(id,owner,balance,...) transfer(src id, dest id, amount int) { -- R(X) select balance from Accounts where id = src; if (balance >= amount) { -- R(X),W(X) update Accounts set balance = balance-amount where id = src; -- R(Y),W(Y) update Accounts set balance = balance+amount where id = dest; } }
If two transfers occur on this account simultaneously,
give a schedule that illustrates the "dirty read" phenomenon.
❖ Exercise: How many Schedules? |
In the previous exercise, we looked at several schedules
For a given set of tx's T1 ... Tn ...
❖ Schedule Properties |
If a concurrent schedule on a set of tx's TT ...
Other properties of schedules focus on recovering from failures
❖ Transaction Failure |
So far, have implicitly assumed that all transactions commit.
Additional problems can arise when transactions abort.
Consider the following schedule where transaction T1 fails:
T1: R(X) W(X) A T2: R(X) W(X) C
Abort will rollback the changes to X
Consider three places where the rollback might occur:
T1: R(X) W(X) A [1] [2] [3] T2: R(X) W(X) C
❖ Transaction Failure (cont) |
Abort / rollback scenarios:
T1: R(X) W(X) A [1] [2] [3] T2: R(X) W(X) C
Case [1] is ok
❖ Recoverability |
Consider the serializable schedule:
T1: R(X) W(Y) C T2: W(X) A
(where the final value of Y
X
Notes:
❖ Recoverability (cont) |
Recoverable schedules avoid these kinds of problems.
For a schedule to be recoverable, we require additional constraints
Note that recoverability does not prevent "dirty reads".
In order to make schedules recoverable in the presence of dirty reads and aborts, may need to abort multiple transactions.
❖ Exercise: Recoverability/Serializability |
Recoverability and Serializability are orthogonal, i.e.
T1: W(A) W(B) C T2: W(A) R(B) C
Give examples of schedules on T1 and T2 that are
❖ Cascading Aborts |
Recall the earlier non-recoverable schedule:
T1: R(X) W(Y) C T2: W(X) A
To make it recoverable requires:
T1: R(X) W(Y) ... C? A! T2: W(X) A
Known as cascading aborts (or cascading rollback).
❖ Cascading Aborts (cont) |
Example: T3 aborts, causing T2 to abort, causing T1 to abort
T1: R(Y) W(Z) A T2: R(X) W(Y) A T3: W(X) A
Even though T1 has no direct connection with T3
(i.e. no shared data).
This kind of problem ...
❖ Cascading Aborts (cont) |
Cascading aborts can be avoided if
(alternative formulation: no tx can read data items written by an uncommitted tx)
Downside: reduces opportunity for concurrency.
These are called ACR (avoid cascading rollback) schedules.
All ACR schedules are also recoverable.
❖ Strictness |
Strict schedules also eliminate the chance of writing dirty data.
A schedule is strict if
❖ Strictness (cont) |
Example: non-strict schedule
T1: W(X) A T2: W(X) A
Problems with handling rollback after aborts:
❖ Classes of Schedules |
Relationship between various classes of schedules:
Schedules ought to be serializable and strict.
But more serializable/strict ⇒ less concurrency.
DBMSs allow users to trade off "safety" against performance.
❖ Transaction Isolation |
Simplest form of isolation: serial execution (T1 ; T2 ; T3 ; ...)
Problem: serial execution yields poor throughput.
Concurrency control schemes (CCSs) aim for "safe" concurrency
Abstract view of DBMS concurrency mechanisms:
❖ Serializability |
Consider two schedules S1 and S2 produced by
❖ Serializability (cont) |
Two formulations of serializability:
❖ Exercise: Serializability Checking |
Is the following schedule view/conflict serializable?
T1: W(B) W(A) T2: R(B) W(A) T3: R(A) W(A)
Is the following schedule view/conflict serializable?
T1: W(B) W(A) T2: R(B) W(A) T3: R(A) W(A)
❖ Transaction Isolation Levels |
SQL programmers' concurrency control mechanism ...
set transaction read only -- so weaker isolation may be ok read write -- suggests stronger isolation needed isolation level -- weakest isolation, maximum concurrency read uncommitted read committed repeatable read serializable -- strongest isolation, minimum concurrency
Applies to current tx only; affects how scheduler treats this tx.
❖ Transaction Isolation Levels (cont) |
Implication of transaction isolation levels:
Isolation Level |
Dirty Read |
Nonrepeatable Read |
Phantom Read |
Read Uncommitted |
Possible | Possible | Possible |
Read Committed |
Not Possible | Possible | Possible |
Repeatable Read |
Not Possible | Not Possible | Possible |
Serializable | Not Possible | Not Possible | Not Possible |
❖ Transaction Isolation Levels (cont) |
For transaction isolation, PostgreSQL
For more details, see PostgreSQL Documentation section 13.2
UPDATE
INSERT
DELETE
❖ Transaction Isolation Levels (cont) |
A PostgreSQL tx consists of a sequence of SQL statements:
BEGIN S1; S2; ... Sn; COMMIT;
Isolation levels affect view of DB provided to each Si:
❖ Concurrency Control |
Approaches to concurrency control:
❖ Lock-based Concurrency Control |
Locks introduce additional mechanisms in DBMS:
The Lock Manager
❖ Lock-based Concurrency Control (cont) |
Lock table entries contain:
Lock upgrade:
❖ Lock-based Concurrency Control (cont) |
Synchronise access to shared data items via following rules:
❖ Lock-based Concurrency Control (cont) |
Consider the following schedule, using locks:
T1(a): Lr(Y) R(Y) continued T2(a): Lr(X) R(X) U(X) continued T1(b): U(Y) Lw(X) W(X) U(X) T2(b): Lw(Y)....W(Y) U(Y)
(where Lr = read-lock, Lw = write-lock, U = unlock)
Locks correctly ensure controlled access to X
Y
Despite this, the schedule is not serializable. (Ex: prove this)
❖ Two-Phase Locking |
To guarantee serializability, we require an additional constraint:
❖ Problems with Locking |
Appropriate locking can guarantee correctness.
However, it also introduces potential undesirable effects:
❖ Deadlock |
Deadlock occurs when two transactions are waiting for a lock on an item held by the other.
Example:
T1: Lw(A) R(A) Lw(B) ...... T2: Lw(B) R(B) Lw(A) .....
How to deal with deadlock?
Produced: 9 Apr 2024