❖ Week 09 Wednesday |
❖ PostgreSQL Query Costs |
PostgreSQL provides the explain
EXPLAIN [ANALYZE] Query
Without ANALYZE
EXPLAIN
With ANALYZE
EXPLAIN
Note that runtimes may show considerable variation due to buffering.
If simply want to know the runtime is ok, maybe \timing
❖ EXPLAIN Examples |
Note that PostgreSQL builds a query evaluation tree, rather than a linear plan, e.g.
EXPLAIN
❖ EXPLAIN Examples (cont) |
Example: Select on indexed attribute
db=# explain select * from Students where id=123475; QUERY PLAN ------------------------------------------------------------ Index Scan using students_pkey on students (cost=0.28..8.30 rows=1 width=8) Index Cond: (id = 123475) db=# explain analyze select * from Students where id=123475; QUERY PLAN ------------------------------------------------------------ Total runtime: 3.252 ms Index Scan using students_pkey on students (cost=0.28..8.30 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (id = 123475) Planning Time: 0.058 ms Execution Time: 0.025 ms
❖ EXPLAIN Examples (cont) |
Example: Select on non-indexed attribute
db=# explain select * from people where origin=13; QUERY PLAN ------------------------------------------------------------ Seq Scan on people (cost=0.00..178.35 rows=1323 width=37) Filter: (origin = 13) db=# explain analyze select * from people where origin=13; QUERY PLAN ---------------------------------------------------------- Seq Scan on people (cost=0.00..178.35 rows=1323 width=37) (actual time=0.173..1.214 rows=1323 loops=1) Filter: (origin = 13) Rows Removed by Filter: 7185 Planning Time: 0.058 ms Execution Time: 1.299 ms
❖ EXPLAIN Examples (cont) |
Example: aggregate on an indexed attribute
db=# explain analyze select max(id) from People; QUERY PLAN -------------------------------------------------------------------- Result (cost=0.31..0.32 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1) InitPlan 1 (returns $0) -> Limit (cost=0.29..0.31 rows=1 width=4) (actual time=0.017..0.017 rows=1 loops=1) -> Index Only Scan Backward using people_pkey on people (cost=0.29..253. 17 rows=8508 width=4) (actual time=0.016..0.016 rows=1 loops=1) Index Cond: (id IS NOT NULL) Heap Fetches: 0 Planning Time: 0.090 ms Execution Time: 0.039 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a primary key (indexed) attribute
db=# explain analyze select s.id, p.zid, p.full_name db=# from Students s join People p on s.id=p.id; QUERY PLAN --------------------------------------------------------------- Hash Join (cost=163.91..343.34 rows=6085 width=20) (actual time=1.959..5.190 rows=6 085 loops=1) Hash Cond: (p.id = s.id) -> Seq Scan on people p (cost=0.00..157.08 rows=8508 width=20) (actual time=0.00 4..0.910 rows=8508 loops=1) -> Hash (cost=87.85..87.85 rows=6085 width=4) (actual time=1.942..1.943 rows=608 5 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 278kB -> Seq Scan on students s (cost=0.00..87.85 rows=6085 width=4) (actual time=0.004..0.864 rows=6085 loops=1) Planning Time: 0.178 ms Execution Time: 5.569 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a non-indexed attribute
db=# explain analyze db=# select s1.code, s2.code db=# from Subjects s1 join Subjects s2 on s1.owner=s2.owner; QUERY PLAN ---------------------------------------------------------------- Hash Join (cost=152.58..10257.41 rows=785511 width=18) (actual time=1.851..125.883 rows=785511 loops=1) Hash Cond: (s1.owner = s2.owner) -> Seq Scan on subjects s1 (cost=0.00..95.59 rows=4559 width=13) (actual time=0. 006..0.563 rows=4559 loops=1) -> Hash (cost=95.59..95.59 rows=4559 width=13) (actual time=1.826..1.827 rows=45 59 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 278kB -> Seq Scan on subjects s2 (cost=0.00..95.59 rows=4559 width=13) (actual t ime=0.004..0.851 rows=4559 loops=1) Planning Time: 0.112 ms Execution Time: 168.613 ms
❖ Exercise: RA Tree to RA Plan |
Re-write the above tree-based plans as sequences of RA ops
Example:
Index Scan using students_pkey on students (cost=0.28..8.30 rows=1 width=8) Index Cond: (id = 123475)
is simply
Res = Sel[id=123475]Students
❖ Transactions, Concurrency |
DBMSs provide valuable information resources in an environment that is:
❖ Transactions, Concurrency (cont) |
Transaction processing
❖ Transactions |
A transaction is
To maintain integrity of data, transactions must be:
❖ Example Transaction |
Bank funds transfer
Accounts(id,name,balance,heldAt, ...)
Branches(id,name,address,assets, ...)
Branches.assets
❖ Example Transaction (cont) |
Example function to implement bank transfer ...
create or replace function transfer(N integer, Src text, Dest text) returns integer declare sID integer; dID integer; avail integer; begin select id,balance into sID,avail from Accounts where name=Src; if (sID is null) then raise exception 'Invalid source account %',Src; end if; select id into dID from Accounts where name=Dest; if (dID is null) then raise exception 'Invalid dest account %',Dest; end if; ...
❖ Example Transaction (cont) |
Example function to implement bank transfer
... if (avail < N) then raise exception 'Insufficient funds in %',Src; end if; -- total funds in system = NNNN update Accounts set balance = balance-N where id = sID; -- funds temporarily "lost" from system update Accounts set balance = balance+N where id = dID; -- funds restored to system; total funds = NNNN return nextval('tx_id_seq'); end;
❖ Exercise: Transaction Failures |
Describe any problems from the transaction failing after:
❖ Transaction Concepts |
A transaction must always terminate, either:
COMMIT
ABORT
❖ Transaction Concepts (cont) |
To describe transaction effects, we consider:
READ
WRITE
ABORT
COMMIT
R(X)
W(X)
A
C
SELECT
READ
INSERT
WRITE
UPDATE
DELETE
READ
WRITE
❖ Transaction Consistency |
Transactions typically have intermediate states that are invalid.
However, states before and after transaction must be valid.
Valid = consistent = satisfying all specified constraints on the data
❖ Transaction Consistency (cont) |
Transaction descriptions can be abstracted
❖ Serial Schedules |
Serial execution:
T1
T2
T2
T1
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
or
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
Serial execution guarantees a consistent final state if
T1
T2
❖ Concurrent Schedules |
Concurrent schedules interleave T1
T2
Some concurrent schedules are ok, e.g.
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
Other concurrent schedules cause anomalies, e.g.
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
Want the system to ensure that only valid schedules occur.
❖ Serializability |
Serializable schedule:
Abstracting this needs a notion of schedule equivalence.
Two common formulations of serializability:
❖ Conflict Serializability |
A characterization of serializability based on conflicting operations
Consider two transactions T1 and T2 acting on data item X.
Possible orders for read/write operations by T1 and T2:
T2 first | Equiv? | |||||||
R2(X) R1(X) | yes | |||||||
W2(X) R1(X) | no | |||||||
R2(X) W1(X) | no | |||||||
W2(X) W1(X) | no |
Note: if T1 and T2 act on different data items (e.g. R(X), W(Y)), order does not matter.
❖ Conflict Serializability (cont) |
Two transactions have a potential conflict if
If no conflict, can swap order without affecting the result.
If we can transform a schedule
❖ Conflict Serializability (cont) |
Example: transform a concurrent schedule to serial schedule
T1: R(A) W(A) R(B) W(B) T2: R(A) W(A) R(B) W(B) swap T1: R(A) W(A) R(B) W(B) T2: R(A) W(A) R(B) W(B) swap T1: R(A) W(A) R(B) W(B) T2: R(A) W(A) R(B) W(B) swap T1: R(A) W(A) R(B) W(B) T2: R(A) W(A) R(B) W(B)
❖ Conflict Serializability (cont) |
Checking for conflict-serializability:
❖ Example Precedence Graph |
Example schedule which is not conflict serializable:
T1: R(X) R(Y) W(X) W(Y) T2: R(X) W(X) T3: R(X) W(X)
Precendence graph for the above schedule:
❖ Exercise: Precedence Graphs |
Build precedence graphs for the following schedules:
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
❖ View Serializability |
View Serializability is
❖ View Serializability (cont) |
Two schedules S and S' on T1 .. Tn are view equivalent iff
❖ Exercise: Checking Serializability |
Check whether each schedule is conflict/view serializable:
T1: R(X) W(X) T2: R(X) W(X)
T1: W(X) R(Y) T2: R(Y) R(X)
T1: R(X) W(X) T2: W(X) T3: W(X)
❖ Concurrency Control |
Serializability tests are useful theoretically ...
But don't provide a mechanism for organising schedules
❖ Concurrency Control (cont) |
Approaches to ensuring ACID transactions:
Synchronise transaction execution via locks on some portion of the database.
Allow multiple consistent versions of the data to exist, and allow each transaction exclusive access to one version.
Organise transaction execution in advance by assigning timestamps to operations.
Exploit typical execution-sequence properties of transactions to determine safety dynamically.
❖ Lock-based Concurrency Control |
Synchronise access to shared data items via following rules:
❖ Lock-based Concurrency Control (cont) |
Examples of locking:
Schedule 1 T1: Lx(X) R(X) W(X) U(X) T2: Lx(Y) R(Y) W(Y) U(Y) Schedule 2 T1: Lx(X) R(X) W(X) U(X) T2: Lx(X) .............. R(X) W(X) U(X)
New operations: Lx()
Ls()
U()
Note: in Schedule 2, locking effectively forces serial execution
This is not generally the case; there may be some concurrency
In general, locking reduces concurrency, but gains safety
Produced: 10 Nov 2023