Transaction Processing
| Transaction Processing | 1/153 |
Where transaction processing fits in the DBMS:
| Transactions | 2/153 |
A transaction is:
| ... Transactions | 3/153 |
A transaction
A transaction can end in two possible ways:
| ... Transactions | 4/153 |
A transaction is a DB state-change operation.
Assume that the code of the transaction
| ... Transactions | 5/153 |
Transactions execute on a collection of data that is
| ... Transactions | 6/153 |
If a transaction commits, must ensure that
| Concurrency in DBMSs | 7/153 |
Concurrency in a multi-user DBMS like PostgreSQL:
| ACID Properties | 8/153 |
Data integrity is assured if transactions satisfy the following:
Atomicity
Consistency
Isolation
Durability
| ... ACID Properties | 9/153 |
Atomicity can be represented by state-transitions:
(state-transition diagram: transitions via COMMIT and ABORT)
| ... ACID Properties | 10/153 |
Transaction processing:
Our discussion thus focusses on: Atomicity, Durability, Isolation
| ... ACID Properties | 11/153 |
Atomicity is handled by the commit and abort mechanisms
Durability is handled by implementing stable storage, via
| Review of Transaction Terminology | 12/153 |
To describe transaction effects, we consider:
READ, WRITE, ABORT, COMMIT
(in SQL terms: SELECT ⇒ READ; UPDATE, DELETE ⇒ READ, WRITE; INSERT ⇒ WRITE)
| ... Review of Transaction Terminology | 13/153 |
More on transactions and SQL
A transaction starts with BEGIN (or begin); COMMIT (or END/end) ends it successfully; ROLLBACK (or ABORT) aborts it.
| ... Review of Transaction Terminology | 14/153 |
The READ, WRITE, ABORT, COMMIT operations are denoted:
| R(X) | read item X in transaction T |
| W(X) | write item X in transaction T |
| A | abort transaction T |
| C | commit transaction T |
| Schedules | 15/153 |
A schedule gives the sequence of operations that occur
For a single tx, there is a single schedule (its operations in order).
For a group of tx's, there are very many possible schedules.
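How many schedules "very many" is can be made concrete: interleaving two transactions while preserving each one's internal order is a binomial choice. A small Python sketch (the `interleavings` helper is ours, not from the slides):

```python
from math import comb

def interleavings(ops1, ops2):
    """Count schedules interleaving two transactions' operation
    sequences while preserving each tx's own operation order."""
    # Choosing which of the len(ops1)+len(ops2) schedule slots
    # belong to tx1 determines the whole schedule.
    return comb(len(ops1) + len(ops2), len(ops1))

# e.g. T1 with 4 operations, T2 with 2 (as in later R/W examples)
print(interleavings("RWRW", "RW"))   # 15 possible schedules
```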
| ... Schedules | 16/153 |
Consider a simple banking transaction, expressed in "PLpgSQL":
create function withdraw(Acct integer, Required float) returns text
as $$
declare
   Bal float;
begin
   select balance into Bal
   from Accounts where id=Acct;          -- R
   Bal := Bal - Required;
   update Accounts set balance=Bal
   where id=Acct;                         -- W
   if (Bal < 0) then
      rollback;                           -- A
      return 'Insufficient funds';
   else
      commit;                             -- C
      return 'New balance: '||Bal::text;
   end if;
end;
$$ language plpgsql;
Notes:
begin
| ... Schedules | 17/153 |
If tx T = withdraw(,)
If tx T = withdraw(,)withdraw(,)
some possible schedules are:
| ... Schedules | 18/153 |
Serial schedules have no interleave of operations from different tx's.
Why serial schedules are good:
| ... Schedules | 19/153 |
With different-ordered serial executions, tx's may get different results.
I.e. StateAfter(T1;T2) = StateAfter(T2;T1) is not generally true.
Consider the following two transactions:
T1 : select sum(salary)
from Employee where dept='Sales'
T2 : insert into Employee
values (....,'Sales',...)
If we execute T1 then T2 we get a smaller salary total than if we execute T2 then T1.
In both cases, however, the salary total is consistent with the state of the database at the time the transaction is executed.
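The order-dependence of T1 and T2 can be replayed on a toy in-memory table (a Python sketch; the names and salary figures are invented for illustration):

```python
# T1 sums Sales salaries; T2 inserts a new Sales employee.
employees = [("Alice", "Sales", 5000), ("Bob", "Admin", 4000)]

def t1_sum_sales(db):
    return sum(sal for (_, dept, sal) in db if dept == "Sales")

def t2_insert(db):
    db.append(("Carol", "Sales", 6000))

# Serial order T1;T2
db = list(employees)
first = t1_sum_sales(db); t2_insert(db)

# Serial order T2;T1
db = list(employees)
t2_insert(db); second = t1_sum_sales(db)

print(first, second)   # 5000 11000: different, but each consistent
```

Each result is consistent with the database state at the time T1 ran, which is the point of the slide.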
| ... Schedules | 20/153 |
A serial execution of consistent transactions is always consistent.
If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.
In the worst case, the effect of executing the transactions ...
| ... Schedules | 21/153 |
Not all concurrent executions cause problems.
For example, the schedules
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
or
T1: R(X) W(X) R(Y) W(Y) T2: R(X) W(X)
or ...
leave the database in a consistent state.
| Example Transaction #1 | 22/153 |
Problem: Allocate a seat on a plane flight
Implement as a function returning success status:
function allocSeat(paxID integer,
flightID integer,
seatNum string)
returning boolean
{
check whether seat currently occupied
if (already occupied)
tell pax seat taken; return !ok
else
assign pax to seat; return ok
}
Assume schema:
Flight(flightID, flightNum, flightTime, ...) SeatingAlloc(flightID, seatNum, paxID, ...)
| ... Example Transaction #1 | 23/153 |
PLpgSQL implementation for seat allocation:
create or replace function
allocSeat(paxID int, fltID int, seat text) returns boolean
as $$
declare
pid int;
begin
select paxID into pid from SeatingAlloc
where flightID = fltID and seatNum = seat;
if (pid is not null) then
return false; -- someone else already has seat
else
update SeatingAlloc set paxID = allocSeat.paxID
where flightID = fltID and seatNum = seat;
commit;
return true;
end if;
end;
$$ language plpgsql;
| ... Example Transaction #1 | 24/153 |
If customer Cust1 executes allocSeat(Cust1,f,23B)
| ... Example Transaction #1 | 25/153 |
Consider two customers trying allocSeat(?,f,23B) simultaneously.
A possible order of operations ...
Serial execution (e.g. enforced by locks) could solve these kinds of problem.
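The problematic interleaving can be simulated on a toy seat map (a Python sketch; the `check`/`assign` helpers stand in for the SQL inside allocSeat):

```python
seats = {}  # (flight, seat) -> passenger

def check(flight, seat):
    return (flight, seat) in seats          # seat occupied?

def assign(flight, seat, pax):
    seats[(flight, seat)] = pax

# Interleaved schedule: both customers check before either assigns
c1_free = not check("f", "23B")
c2_free = not check("f", "23B")
if c1_free: assign("f", "23B", "Cust1")
if c2_free: assign("f", "23B", "Cust2")   # overwrites Cust1's booking

print(seats[("f", "23B")])   # Cust2: Cust1's allocation is lost
```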
| Example Transaction #2 | 26/153 |
Problem: transfer funds between two accounts in same bank.
Implement as a function returning success status:
function transfer(sourceAcct integer,
destAcct integer,
amount real)
returning boolean
{
check whether sourceAcct is valid
check whether destAcct is valid
check whether sourceAcct has
sufficient funds (>= amount)
if (all ok) {
withdraw money from sourceAcct
deposit money into destAcct
} }
| ... Example Transaction #2 | 27/153 |
PLpgSQL for funds transfer between accounts:
create or replace function
transfer(source int, dest int, amount float)
returns boolean
as $$
declare
ok boolean := true;
acct Accounts%rowtype;
begin
select * into acct
from Accounts where id=source;
if (not found) then
raise warning 'Invalid Withdrawal Account';
ok := false;
end if;
perform * from Accounts where id=dest;  -- just sets FOUND
if (not found) then
raise warning 'Invalid Deposit Account';
ok := false;
end if;
if (acct.balance < amount) then
raise warning 'Insufficient funds';
ok := false;
end if;
if (not ok) then rollback; return false; end if;
update Accounts
set balance = balance - amount
where id = source;
update Accounts
set balance = balance + amount
where id = dest;
commit;
return true;
end;
$$ language plpgsql;
| ... Example Transaction #2 | 28/153 |
If customer transfers $1000 from Acct1 to Acct2
Similar to earlier problem; could be fixed by serialization.
| ... Example Transaction #2 | 29/153 |
Consider customer transfers $1000 from Acct1 to Acct2
| Transaction Anomalies | 30/153 |
What problems can occur with concurrent transactions?
The set of phenomena can be characterised broadly under:
| ... Transaction Anomalies | 31/153 |
Dirty read: a transaction reads data written by a concurrent uncommitted transaction
Example:
Transaction T1 Transaction T2
(1) select a into X
from R where id=1
(2) select a into Y
from R where id=1
(3) update R set a=X+1
where id=1
(4) commit
(5) update R set a=Y+1
where id=1
(6) commit
Effect: T1's update on R.a is lost, overwritten by T2's write (which was based on the value read before T1's update).
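The schedule above can be replayed on a one-cell toy database (a Python sketch):

```python
# Both transactions read a, then both write read-value + 1.
R = {"a": 1}

X = R["a"]          # (1) T1 reads a into X
Y = R["a"]          # (2) T2 reads a into Y
R["a"] = X + 1      # (3)-(4) T1 writes a = X+1, commits
R["a"] = Y + 1      # (5)-(6) T2 writes a = Y+1, commits

print(R["a"])       # 2, not 3: T1's increment has vanished
```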
| ... Transaction Anomalies | 32/153 |
Nonrepeatable read: a transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read)
Example:
Transaction T1 Transaction T2
(1) select * from R
where id=5
(2) update R set a=8
where id=5
(3) commit
(4) select * from R
where id=5
Effect: T1 runs same query twice; sees different data
| ... Transaction Anomalies | 33/153 |
Phantom read: a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction
Example:
Transaction T1 Transaction T2
(1) select count(*)
from R where a=5
(2) insert into R(id,a,b)
values (2,5,8)
(3) commit
(4) select count(*)
from R where a=5
Effect: T1 runs same query twice; sees different result set
| Example of Transaction Failure | 34/153 |
The above examples generally assumed that transactions committed.
Additional problems can arise when transactions abort.
We give examples using the following two transactions:
T1: read(X) T2: read(X)
X := X + N X := X + M
write(X) write(X)
read(Y)
Y := Y - N
write(Y)
and initial DB state X=100, Y=50, N=5, M=8
| ... Example of Transaction Failure | 35/153 |
Consider the following schedule where one transaction fails:
T1: R(X) W(X) A T2: R(X) W(X)
Transaction T1 aborts after writing X.
The abort will roll back the changes to X.
Consider three places where rollback might occur:
T1: R(X) W(X) A [1] [2] [3] T2: R(X) W(X)
| Transaction Failure - Case 1 | 36/153 |
This scenario is ok: T1's rollback completes before T2 reads X, so T2 sees only committed data.
Database Transaction T1 Transaction T2
--------- ------------------ --------------
X Y X Y X
100 50 ? ? ?
read(X) 100
X:=X+N 105
105 write(X)
abort
100 rollback
read(X) 100
X:=X+M 108
108 write(X)
---------
108 50
| Transaction Failure - Case 2 | 37/153 |
In this scenario, some of T1's uncommitted changes are read by T2 before the rollback, so T2's write is based on dirty data.
Database Transaction T1 Transaction T2
--------- ------------------ --------------
X Y X Y X
100 50 ? ? ?
read(X) 100
X:=X+N 105
105 write(X)
abort
read(X) 105
X:=X+M 113
100 rollback
113 write(X)
---------
113 50
| Transaction Failure - Case 3 | 38/153 |
In this scenario, T2's update is lost: T1's rollback occurs after T2 writes X, wiping out T2's change.
Database Transaction T1 Transaction T2
--------- ------------------ --------------
X Y X Y X
100 50 ? ? ?
read(X) 100
X:=X+N 105
105 write(X)
abort
read(X) 105
X:=X+M 113
113 write(X)
100 rollback
---------
100 50
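The three cases can be replayed in a few lines (a Python sketch; `run`'s `rollback_point` selects positions [1]-[3] from the earlier schedule):

```python
def run(rollback_point, X=100, N=5, M=8):
    original = X
    X = X + N                      # T1: read X, add N, write X
    if rollback_point == 1:        # [1] rollback before T2 reads
        X = original
    t2_val = X + M                 # T2: read X, add M
    if rollback_point == 2:        # [2] rollback before T2 writes
        X = original
    X = t2_val                     # T2: write X
    if rollback_point == 3:        # [3] rollback after T2 writes
        X = original
    return X

print(run(1), run(2), run(3))   # 108 113 100, matching cases 1-3
```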
| Transaction Isolation |
| Transaction Isolation | 40/153 |
If transactions always run "single-user":
| ... Transaction Isolation | 41/153 |
In practice, serial execution gives poor performance.
We need approaches that allow "safe" concurrency
| DBMS Transaction Management | 42/153 |
Abstract view of DBMS concurrency mechanisms:
The Scheduler
| Serializability | 43/153 |
Consider two schedules S1 and S2 produced by
| ... Serializability | 44/153 |
Two formulations of serializability:
i.e. there are VS schedules that are not CS, but not vice versa
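Conflict serializability can be tested mechanically via a precedence graph (a Python sketch of the standard construction: edge Ti→Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is CS iff the graph is acyclic):

```python
def conflict_serializable(schedule):
    # schedule: list of (tx, op, item), op in {"R", "W"}
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i+1:]:
            # conflict: different txs, same item, at least one write
            if ti != tj and xi == xj and "W" in (oi, oj):
                edges.add((ti, tj))
    nodes = {t for (t, _, _) in schedule}
    def cyclic(n, seen):           # DFS cycle check
        if n in seen: return True
        return any(cyclic(b, seen | {n}) for (a, b) in edges if a == n)
    return not any(cyclic(n, frozenset()) for n in nodes)

good = [("T1","R","X"),("T1","W","X"),("T2","R","X"),("T2","W","X")]
bad  = [("T1","R","X"),("T2","W","X"),("T2","W","Y"),("T1","W","Y")]
print(conflict_serializable(good), conflict_serializable(bad))  # True False
```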
| Isolation and Concurrency Control | 45/153 |
It is not useful to
| Other Properties of Schedules | 46/153 |
Above discussion explicitly assumes that all transactions commit.
What happens if some transaction aborts?
Under serial execution, there is no problem:
| Recoverability | 47/153 |
Consider the serializable schedule:
T1: R(X) W(Y) C T2: W(X) A
(where the final value of Y depends on the value of X read by T1)
Notes:
| ... Recoverability | 48/153 |
Recoverable schedules avoid these kinds of problems.
For a schedule to be recoverable, we require additional constraints
Note that recoverability does not prevent "dirty reads".
In order to make schedules recoverable in the presence of dirty reads and aborts, may need to abort multiple transactions.
| Cascading Aborts | 49/153 |
Recall the earlier non-recoverable schedule:
T1: R(X) W(Y) C T2: W(X) A
To make it recoverable requires:
T1: R(X) W(Y) ... A T2: W(X) A
Known as cascading aborts (or cascading rollback).
| ... Cascading Aborts | 50/153 |
Example: T3 aborts, causing T2 to abort, causing T1 to abort
T1: R(Y) W(Z) A T2: R(X) W(Y) A T3: W(X) A
Even though T1 has no direct connection with T3
(i.e. no shared data).
This kind of problem ...
| ... Cascading Aborts | 51/153 |
Cascading aborts can be avoided if
(alternative formulation: no tx can read data items written by an uncommitted tx)
Downside: reduces opportunity for concurrency.
GUW call these ACR (avoid cascading rollback) schedules.
All ACR schedules are also recoverable.
| Strictness | 52/153 |
Strict schedules also eliminate the chance of writing dirty data.
A schedule is strict if
| ... Strictness | 53/153 |
Example: non-strict schedule
T1: W(X) A T2: W(X) A
Problems with handling rollback after aborts:
| Schedule Properties | 54/153 |
Relationship between various classes of schedules:
Schedules ought to be serializable and strict.
But more serializable/strict ⇒ less concurrency.
DBMSs allow users to trade off "safety" against performance.
| Transaction Isolation Levels | 55/153 |
Previous examples showed:
| ... Transaction Isolation Levels | 56/153 |
In many real applications, there is either
| ... Transaction Isolation Levels | 57/153 |
SQL provides a mechanism for database programmers to specify how much isolation to apply to transactions
set transaction
read only -- so weaker isolation may be ok
read write -- suggests stronger isolation needed
isolation level
-- weakest isolation, maximum concurrency
read uncommitted
read committed
repeatable read
serializable
-- strongest isolation, minimum concurrency
| ... Transaction Isolation Levels | 58/153 |
Meaning of transaction isolation levels:
Isolation Level    Dirty Read    Nonrepeatable Read  Phantom Read
Read uncommitted   Possible      Possible            Possible
Read committed     Not possible  Possible            Possible
Repeatable read    Not possible  Not possible        Possible
Serializable       Not possible  Not possible        Not possible
| ... Transaction Isolation Levels | 59/153 |
For transaction isolation, PostgreSQL
| ... Transaction Isolation Levels | 60/153 |
A PostgreSQL tx consists of a sequence of SQL statements:
BEGIN S1; S2; ... Sn; COMMIT;
Isolation levels affect view of DB provided to each Si:
| ... Transaction Isolation Levels | 61/153 |
Using PostgreSQL's serializable isolation level, a select
| ... Transaction Isolation Levels | 62/153 |
Example of repeatable read vs serializable
| Implementing Concurrency Control |
| Concurrency Control | 64/153 |
Approaches to concurrency control:
Synchronise transaction execution via locks on relevant part of DB.
Allow multiple consistent versions of the data to exist.
Each transaction has exclusive access to one version (the one when tx started).
Execute all transactions; check for validity problems just before commit.
Organise transaction execution via timestamps assigned to actions.
| Lock-based Concurrency Control | 65/153 |
Locks introduce additional mechanisms in DBMS:
The Scheduler
| ... Lock-based Concurrency Control | 66/153 |
Lock table entries contain:
Lock upgrade:
| ... Lock-based Concurrency Control | 67/153 |
Synchronise access to shared data items via following rules:
| ... Lock-based Concurrency Control | 68/153 |
Consider the following schedule, using locks:
T1: Lr(Y) R(Y) U(Y) Lw(X) W(X) U(X) T2: Lr(X) R(X) U(X) Lw(Y)....W(Y) U(Y)
(where Lr = read-lock, Lw = write-lock, U = unlock)
Locks correctly ensure controlled access to the shared objects (X and Y).
Despite this, the schedule is not serializable.
| Two-Phase Locking | 69/153 |
To guarantee serializability, we require an additional constraint on how locks are applied:
| ... Two-Phase Locking | 70/153 |
Consider the following two transactions:
T1: Lw(A) R(A) F1 W(A) Lw(B) U(A) R(B) G1 W(B) U(B) T2: Lw(A) R(A) F2 W(A) Lw(B) U(A) R(B) G2 W(B) U(B)
They follow 2PL protocol, inducing a schedule like:
T1(a): Lw(A) R(A) F1 W(A) Lw(B) U(A) T2(a): Lw(A) ...................... R(A) F2 W(A) T1(b): R(B) G1 W(B) U(B) T2(b): Lw(B) ............ U(A) R(B) G2 W(B) U(B)
| Problems with Locking | 71/153 |
Appropriate locking can guarantee correctness.
However, it also introduces potential undesirable effects:
No transactions can proceed; each waiting on lock held by another.
One transaction is permanently "frozen out" of access to data.
Locking introduces delays while waiting for locks to be released.
| Deadlock | 72/153 |
Deadlock occurs when two transactions are waiting for a lock on an item held by the other.
Example:
T1: Lw(A) R(A) Lw(B) ...... T2: Lw(B) R(B) Lw(A) .....
How to deal with deadlock?
| ... Deadlock | 73/153 |
Handling deadlock involves forcing a transaction to "back off".
| ... Deadlock | 74/153 |
Simple approach: timeout
| ... Deadlock | 75/153 |
A cycle in the waits-for graph indicates a deadlock.
Could prevent deadlock by
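Cycle detection over a waits-for graph is straightforward (a Python sketch; the graph maps each tx to the tx's whose locks it is waiting on):

```python
def has_deadlock(waits_for):
    """True iff some tx can wait, transitively, on itself."""
    def reachable(start, cur, seen):
        for nxt in waits_for.get(cur, []):
            if nxt == start:
                return True
            if nxt not in seen and reachable(start, nxt, seen | {nxt}):
                return True
        return False
    return any(reachable(t, t, {t}) for t in waits_for)

# T1 holds A and waits for B; T2 holds B and waits for A
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"]}))                # False
```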
| ... Deadlock | 76/153 |
Alternative deadlock handling: timestamps.
Each tx is permanently allocated a unique timestamp (e.g. start-time).
When Tj tries to get lock held by Tk
| ... Deadlock | 77/153 |
Tj tries to get lock held by Tk ...
Wait-die scheme:
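The details here are elided, but the standard wait-die rule (an older requester waits; a younger requester "dies", i.e. aborts and restarts with its original timestamp) can be sketched as:

```python
def wait_die(ts_requester, ts_holder):
    """Smaller timestamp = older tx. Older requesters may wait;
    younger requesters abort (die) to avoid deadlock cycles."""
    return "wait" if ts_requester < ts_holder else "die"

print(wait_die(1, 2))   # older tx requests lock: wait
print(wait_die(2, 1))   # younger tx requests lock: die
```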
| ... Deadlock | 78/153 |
Properties of deadlock handling methods:
| Starvation | 79/153 |
Starvation occurs when one transaction waits indefinitely for a lock while other transactions continually acquire it.
Multiple locks ⇒ need to decide which to release first.
Solutions:
| Locking Granularity | 80/153 |
Locking typically reduces concurrency ⇒ reduces throughput.
Granularity of locking can impact performance:
+ lock a small item ⇒ more of database accessible
+ lock a small item ⇒ quick update ⇒ quick lock release
- lock small items ⇒ more locks ⇒ more lock management
Granularity levels: field, row (tuple), table, whole database
| ... Locking Granularity | 81/153 |
Multiple-granularity locking protocol:
| Locking in B-trees | 82/153 |
How to lock a B-tree leaf node?
One possibility:
If for insert/delete, locks would be write locks.
This approach gives poor performance (lock on root is bottleneck).
| ... Locking in B-trees | 83/153 |
Approach for searching (select) ...
| Optimistic Concurrency Control | 84/153 |
Locking is a pessimistic approach to concurrency control:
In systems where read:write ratio is very high
| ... Optimistic Concurrency Control | 85/153 |
Transactions have three distinct phases:
| ... Optimistic Concurrency Control | 86/153 |
Data structures needed for validation:
| ... Optimistic Concurrency Control | 87/153 |
Two-transaction example:
| ... Optimistic Concurrency Control | 88/153 |
Case 1: reading overlaps validation/writing
| ... Optimistic Concurrency Control | 89/153 |
Case 2: reading/validation overlaps validation/writing
| ... Optimistic Concurrency Control | 90/153 |
Validation check for transaction T
What this prevents ...
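The exact conditions are elided above; a textbook-style backward validation check (a sketch of the general idea, not any DBMS's actual code) looks like:

```python
def valid(T_read, T_write, overlapping):
    """Validate T against each overlapping earlier transaction Ti.
    overlapping: list of (Ti_write_set, Ti_finished_before_T_writes)."""
    for ti_write, finished_before_write in overlapping:
        if ti_write & T_read:
            return False   # Ti wrote something T read: possible dirty read
        if not finished_before_write and (ti_write & T_write):
            return False   # write sets clash while both are writing
    return True

print(valid({"X"}, {"Y"}, [({"Z"}, True)]))   # True: no overlap at all
print(valid({"X"}, {"Y"}, [({"X"}, True)]))   # False: T read Ti's write
```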
| ... Optimistic Concurrency Control | 91/153 |
Problems with optimistic concurrency control:
| Multi-version Concurrency Control | 92/153 |
Multi-version concurrency control (MVCC) aims to
| ... Multi-version Concurrency Control | 93/153 |
Each transaction is
Each record in the database is
| ... Multi-version Concurrency Control | 94/153 |
When a reader Ti is accessing the database
| ... Multi-version Concurrency Control | 95/153 |
Advantage of MVCC
| ... Multi-version Concurrency Control | 96/153 |
Removing old versions:
| Concurrency Control in SQL | 97/153 |
Transactions in SQL are specified by
BEGINCOMMITROLLBACK
| ... Concurrency Control in SQL | 98/153 |
More fine-grained control of "undo" via savepoints:
SAVEPOINT, ROLLBACK TO SAVEPOINT, e.g.

begin;
insert into numbersTable values (1);
savepoint my_savepoint;
insert into numbersTable values (2);
rollback to savepoint my_savepoint;
insert into numbersTable values (3);
commit;
will insert 1 and 3 into the table, but not 2.
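The same behaviour can be reproduced with SQLite, which supports the same SAVEPOINT / ROLLBACK TO SAVEPOINT statements, via Python's sqlite3 module:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.isolation_level = None          # manage transactions explicitly
db.execute("create table numbersTable (n int)")
db.execute("begin")
db.execute("insert into numbersTable values (1)")
db.execute("savepoint my_savepoint")
db.execute("insert into numbersTable values (2)")
db.execute("rollback to savepoint my_savepoint")
db.execute("insert into numbersTable values (3)")
db.execute("commit")
rows = [n for (n,) in db.execute("select n from numbersTable order by n")]
print(rows)   # [1, 3]: the insert of 2 was undone
```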
| ... Concurrency Control in SQL | 99/153 |
SQL standard defines four levels of transaction isolation.
PostgreSQL implements: repeatable-read = serializable, read-uncommitted = read-committed
| ... Concurrency Control in SQL | 100/153 |
Using the serializable isolation level, a select
| ... Concurrency Control in SQL | 101/153 |
Explicit control of concurrent access is available, e.g.
Table-level locking: LOCK TABLE
ALTER TABLE
Row-level locking: SELECT FOR UPDATE, DELETE
| Concurrency Control in PostgreSQL | 102/153 |
PostgreSQL uses two styles of concurrency control:
MVCC for most operations (e.g. select); table-level locking for operations needing exclusive access (e.g. create table, explicit LOCK)
| ... Concurrency Control in PostgreSQL | 103/153 |
Implementing MVCC in PostgreSQL requires:
xmin, xmax, xnew
| ... Concurrency Control in PostgreSQL | 104/153 |
Rules for a tuple to be visible to Tj:
based on the tuple's xmin (creating tx) and xmax (deleting tx, if any), and whether those tx's committed before Tj started
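A much-simplified visibility check (ignoring hint bits, in-progress transactions, and the snapshot details of real PostgreSQL) can be sketched as:

```python
def visible(xid, xmin, xmax, committed):
    """A tuple is visible to transaction xid if its creator xmin
    committed no later than xid, and it has not been deleted by
    a committed transaction xmax no later than xid."""
    created = xmin in committed and xmin <= xid
    deleted = xmax is not None and xmax in committed and xmax <= xid
    return created and not deleted

committed = {10, 20}
print(visible(15, 10, None, committed))  # True: created by tx 10
print(visible(15, 20, None, committed))  # False: created "after" tx 15
print(visible(25, 10, 20, committed))    # False: deleted by tx 20
```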
| ... Concurrency Control in PostgreSQL | 105/153 |
Tx's always see a consistent version of the database.
But may not see the "current" version of the database.
E.g. T1 does select, then concurrent T2 deletes some of T1's selected tuples
This is OK unless tx's communicate outside the database system.
E.g. T1 counts tuples; T2 deletes then counts; then counts are compared
Use locks if application needs every tx to see same current version
e.g. LOCK TABLE, SELECT FOR UPDATE
| Implementing Atomicity/Durability |
| Atomicity/Durability | 107/153 |
Reminder:
Transactions are atomic
| Durability | 108/153 |
What kinds of "system failures" do we need to deal with?
postgres
| ... Durability | 109/153 |
Consider following scenario:
Desired behaviour after system restart:
| ... Durability | 110/153 |
Durability begins with a stable disk storage subsystem.
Operations:
putBlock(Buffer *b) ... write the buffer's block to disk
getBlock(Buffer *b) ... read a block from disk into the buffer
| ... Durability | 111/153 |
Implementation of transaction operations R(V) and W(V)
Value R(Object V) {
B = getBuf(blockContaining(V))
return value of V from B
}
void W(Object V, value k) {
B = getBuf(blockContaining(V))
set value of V in B
}
Note:
putBuf()
| ... Durability | 112/153 |
getBuf()putBuf()
Buffer getBuf(BlockAddr A) {
if (!inBufferPool(A)) {
B = availableBuffer(A);
getBlock(B);
}
return bufferContaining(A);
}
void putBuf(BlockAddr A) {
B = bufferContaining(A);
putBlock(B);
}
| Stable Store | 113/153 |
One simple strategy using redundancy: stable store.
Protects against all failures in a single disk sector.
Each logical disk page X is stored twice.
(Obvious disadvantage: disk capacity is halved)
X is stored in sector S on disk L and sector T on disk R
Assume that a sound parity-check is available
(i.e. can always recognise whether data has transferred mem↔disk correctly)
| ... Stable Store | 114/153 |
Low level sector i/o functions:
int writeSector(char *b, Disk d, Sector s) {
int nTries = 0;
do {
nTries++; write(d,s,b);
} while (bad(parity) && nTries < MaxTries)
return nTries;
}
int readSector(char *b, Disk d, Sector s) {
int nTries = 0;
do {
nTries++; read(d,s,b);
} while (bad(parity) && nTries < MaxTries)
return nTries;
}
| ... Stable Store | 115/153 |
Writing data to disk with stable store:
int writeBlock(Buffer *b, Disk d, Sector s) {
   int n, sec;
   for (;;) {
      sec = (s > 0) ? s : getUnusedSector(d);
      n = writeSector(b->data, d, sec);
      if (n == MaxTries)
         { mark sec as unusable; s = -1; }
      else
         return sec;
   }
}
writeBlock
| ... Stable Store | 116/153 |
Reading data from disk with stable store:
int readBlock(Buffer *b) {
   int n = readSector(b->data, b->diskL, b->sectorL);
   if (n == MaxTries) {
      n = readSector(b->data, b->diskR, b->sectorR);
      if (n == MaxTries) return -1; // read failure
   }
   return 0; // successful read
}
| ... Stable Store | 117/153 |
Consider scenario where power fails during write operation:
writeBlock(X,L,S); writeBlock(X,R,T);
| ... Stable Store | 118/153 |
How stable storage handles failure during writing:
| RAID | 119/153 |
RAID gives techniques to achieve
| Stable Storage Subsystem | 120/153 |
We can prevent/minimise loss/corruption of data due to:
| Dealing with Transactions | 121/153 |
The remaining "failure modes" that we need to consider:
ABORT
| Architecture for Atomicity/Durability | 122/153 |
How does a DBMS provide for atomicity/durability?
| Execution of Transactions | 123/153 |
Transactions deal with three address spaces:
| ... Execution of Transactions | 124/153 |
Operations available for data transfer:
INPUT(X) ... transfer the block containing X from disk into a buffer
READ(X,v) ... copy the value of X from its buffer into local variable v
WRITE(X,v) ... copy the value of local variable v to X in its buffer
OUTPUT(X) ... transfer the block containing X from its buffer to disk
READ/WRITE are used by transactions; INPUT/OUTPUT are issued by the buffer manager.
| ... Execution of Transactions | 125/153 |
Example of transaction execution:
-- implements A = A*2; B = B+1;
BEGIN
READ(A,v); v = v*2; WRITE(A,v);
READ(B,v); v = v+1; WRITE(B,v);
COMMIT
A READ may first trigger an INPUT if the block holding the value is not already in the buffer pool.
COMMIT must ensure that all of the transaction's changes become durable.
| ... Execution of Transactions | 126/153 |
States as the transaction executes:
t    Action        v   Buf(A)  Buf(B)  Disk(A)  Disk(B)
-----------------------------------------------------
(0)  BEGIN         .   .       .       8        5
(1)  READ(A,v)     8   8       .       8        5
(2)  v = v*2       16  8       .       8        5
(3)  WRITE(A,v)    16  16      .       8        5
(4)  READ(B,v)     5   16      5       8        5
(5)  v = v+1       6   16      5       8        5
(6)  WRITE(B,v)    6   16      6       8        5
(7)  OUTPUT(A)     6   16      6       16       5
(8)  OUTPUT(B)     6   16      6       16       6
After tx completes, we must have either
Disk(A)=8,Disk(B)=5 or Disk(A)=16,Disk(B)=6
If system crashes before (8), may need to undo disk changes.
If system crashes after (8), may need to redo disk changes.
| Transactions and Buffer Pool | 127/153 |
Two issues arise w.r.t. buffers:
forcing ... should buffers be OUTPUT as soon as a tx WRITEs them (or only at commit)?
stealing ... can a buffer changed by an uncommitted tx be output and reused?
| ... Transactions and Buffer Pool | 128/153 |
Handling stealing:
| ... Transactions and Buffer Pool | 129/153 |
Handling no forcing:
WRITE()
| Logging | 130/153 |
Three "styles" of logging
| Undo Logging | 131/153 |
Simple form of logging which ensures atomicity.
Log file consists of a sequence of small records:
<START T> ... transaction T has started
<COMMIT T> ... transaction T has committed
<ABORT T> ... transaction T has aborted
<T,X,v> ... T changed the value of X, and its old value was v
Log records are produced by WRITE operations, not by OUTPUT.
| ... Undo Logging | 132/153 |
Data must be written to disk in the following order:
first the update record <T,X,v>, then the changed data item X, then the <COMMIT T> record.
| ... Undo Logging | 133/153 |
For the example transaction, we would get:
t    Action        v   B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .   .     .     8     5     <START T>
(1)  READ(A,v)     8   8     .     8     5
(2)  v = v*2       16  8     .     8     5
(3)  WRITE(A,v)    16  16    .     8     5     <T,A,8>
(4)  READ(B,v)     5   16    5     8     5
(5)  v = v+1       6   16    5     8     5
(6)  WRITE(B,v)    6   16    6     8     5     <T,B,5>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16    6     16    5
(10) OUTPUT(B)     6   16    6     16    6
(11) EndCommit                                 <COMMIT T>
(12) FlushLog
Note that T is not regarded as committed until (11).
| ... Undo Logging | 134/153 |
Simplified view of recovery using UNDO logging:
committedTrans = abortedTrans = startedTrans = {}
for each log record from most recent to oldest {
switch (log record) {
<COMMIT T> : add T to committedTrans
<ABORT T> : add T to abortedTrans
<START T> : add T to startedTrans
<T,X,v> : if (T in committedTrans)
// don't undo committed changes
else
{ WRITE(X,v); OUTPUT(X) }
} }
for each T in startedTrans {
if (T in committedTrans) ignore
else if (T in abortedTrans) ignore
else write <ABORT T> to log
}
flush log
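The recovery loop above can be made executable (a Python sketch; the log is given newest-first, with `("U", T, X, v)` standing for an update record `<T,X,v>`):

```python
def undo_recover(log_newest_first, disk):
    committed, started, aborted = set(), set(), set()
    extra = []                          # <ABORT T> records to append
    for rec in log_newest_first:
        if rec[0] == "COMMIT":  committed.add(rec[1])
        elif rec[0] == "ABORT": aborted.add(rec[1])
        elif rec[0] == "START": started.add(rec[1])
        elif rec[0] == "U":
            _, T, X, v = rec
            if T not in committed:      # don't undo committed changes
                disk[X] = v             # WRITE(X,v); OUTPUT(X)
    for T in started - committed - aborted:
        extra.append(("ABORT", T))      # write <ABORT T> to log
    return disk, extra

# Crash between (9) and (11) of the example: A was output, B not,
# and no <COMMIT T> reached the log.
log = [("U","T","B",5), ("U","T","A",8), ("START","T")]
print(undo_recover(log, {"A": 16, "B": 5}))
# ({'A': 8, 'B': 5}, [('ABORT', 'T')])
```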
| ... Undo Logging | 135/153 |
Recall example transaction and consider effects of system crash at the following points:
Before (9) ... disk "restored" (unchanged); <ABORT T> written to log
(9)-(11) ... disk restored to original state; <ABORT T> written to log
After (12) ... A and B left unchanged; T is committed, so nothing to undo
"Disk restored" means
WRITE(B,5); OUTPUT(B); WRITE(A,8); OUTPUT(A);
| Checkpointing | 136/153 |
Previous view of recovery implied reading entire log file.
Since log file grows without bound, this is infeasible.
Eventually we can delete "old" section of log.
| ... Checkpointing | 137/153 |
Problem: many concurrent/overlapping transactions.
How to know that all have finished?
Simplistic approach (quiescent checkpointing):
stop accepting new tx's; wait until all active tx's finish with <COMMIT T> or <ABORT T>; write a <CHKPT> record to the log and flush the log.
| ... Checkpointing | 138/153 |
Obvious problem with quiescent checkpointing
Nonquiescent alternative: write <CHKPT (T1,..,Tk)> naming all active tx's T1,..,Tk; when the last of these finishes, write <ENDCHKPT> to the log.
| ... Checkpointing | 139/153 |
Recovery: scan backwards through log file processing as before.
Determining where to stop depends on ...
If we encounter <ENDCHKPT> first, all incomplete tx's started after the matching <CHKPT...>; we can stop scanning at that <CHKPT...>.
If we encounter <CHKPT (T1,..,Tk)> first, we must scan back as far as the start of the earliest of T1,..,Tk.
| Redo Logging | 140/153 |
Problem with UNDO logging:
| ... Redo Logging | 141/153 |
Requirement for redo logging: write-ahead rule.
Data must be written to disk as follows:
first the update record <T,X,v'> (where v' is the new value of X) and the <COMMIT T> record, then the changed data item X.
| ... Redo Logging | 142/153 |
For the example transaction, we would get:
t    Action        v   B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .   .     .     8     5     <START T>
(1)  READ(A,v)     8   8     .     8     5
(2)  v = v*2       16  8     .     8     5
(3)  WRITE(A,v)    16  16    .     8     5     <T,A,16>
(4)  READ(B,v)     5   16    5     8     5
(5)  v = v+1       6   16    5     8     5
(6)  WRITE(B,v)    6   16    6     8     5     <T,B,6>
(7)  COMMIT                                    <COMMIT T>
(8)  FlushLog
(9)  OUTPUT(A)     6   16    6     16    5
(10) OUTPUT(B)     6   16    6     16    6
Note that T is regarded as committed as soon as (8) completes.
| ... Redo Logging | 143/153 |
Simplified view of recovery using REDO logging:
set committedTrans               // e.g. from tx table
for each log record from oldest to most recent {
   switch (log record) {
   <COMMIT T> : ignore           // know these already
   <ABORT T>  : add T to abortedTrans
   <START T>  : add T to startedTrans
   <T,X,v'>   : if (T in committedTrans)
                   { WRITE(X,v'); OUTPUT(X) }
                // else: nothing to do, no change on disk
   } }
for each T in startedTrans {
   if (T in committedTrans) ignore
   else if (T in abortedTrans) ignore
   else write <ABORT T> to log
}
flush log
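The redo loop can likewise be made executable (a Python sketch; the log is given oldest-first, with `("U", T, X, v')` standing for an update record `<T,X,v'>`):

```python
def redo_recover(log_oldest_first, disk):
    # First pass: find committed transactions.
    committed = {r[1] for r in log_oldest_first if r[0] == "COMMIT"}
    # Second pass: re-apply updates of committed transactions only.
    for rec in log_oldest_first:
        if rec[0] == "U":
            _, T, X, v_new = rec
            if T in committed:
                disk[X] = v_new        # WRITE(X,v'); OUTPUT(X)
    return disk

# Crash after (8) but before (9)/(10): <COMMIT T> is on disk,
# but neither OUTPUT happened yet.
log = [("START","T"), ("U","T","A",16), ("U","T","B",6), ("COMMIT","T")]
print(redo_recover(log, {"A": 8, "B": 5}))   # {'A': 16, 'B': 6}
```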
| ... Redo Logging | 144/153 |
Data required for REDO logging checkpoints:
| ... Redo Logging | 145/153 |
Checkpoints in REDO logging use (as before) <CHKPT (T1,..,Tk)> and <ENDCHKPT> records:
write <CHKPT (T1,..,Tk)> naming all active tx's; output to disk all data items changed by tx's that committed before the <CHKPT...>; then write <ENDCHKPT>.
| ... Redo Logging | 146/153 |
Recovery with checkpointed REDO log.
As for UNDO logging, two cases ...
Last checkpoint record before the crash is <ENDCHKPT>: every tx committed before the matching <CHKPT (T1,..,Tk)> already has its changes on disk; only tx's among T1,..,Tk or started after that <CHKPT...> need redoing.
Last record is <CHKPT (T1,..,Tk)> with no <ENDCHKPT>: the checkpoint is incomplete; fall back to the previous completed checkpoint.
| Undo/Redo Logging | 147/153 |
UNDO logging and REDO logging are incompatible in when they allow <COMMIT T> to be written relative to OUTPUT operations.
Undo/redo logging combines both via update records <T,X,v,v'> holding both the old value v and new value v' of X.
Undo/redo logging is common in practice; the ARIES algorithm is based on it.
| ... Undo/Redo Logging | 148/153 |
Requirement for undo/redo logging: write-ahead.
Data must be written to disk as follows:
the update record <T,X,v,v'> must reach the log on disk before the changed X is output; <COMMIT T> may be written before or after outputting the changed data.
| ... Undo/Redo Logging | 149/153 |
For the example transaction, we might get:
t    Action        v   B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .   .     .     8     5     <START T>
(1)  READ(A,v)     8   8     .     8     5
(2)  v = v*2       16  8     .     8     5
(3)  WRITE(A,v)    16  16    .     8     5     <T,A,8,16>
(4)  READ(B,v)     5   16    5     8     5
(5)  v = v+1       6   16    5     8     5
(6)  WRITE(B,v)    6   16    6     8     5     <T,B,5,6>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16    6     16    5
(10)                                           <COMMIT T>
(11) OUTPUT(B)     6   16    6     16    6
Note that T is regarded as committed as soon as (10) completes.
| ... Undo/Redo Logging | 150/153 |
Recovery using undo/redo logging:
Before (10) ... treat as incomplete; undo all changes
After (10) ... treat as complete; redo all changes
| ... Undo/Redo Logging | 151/153 |
Steps in UNDO/REDO log checkpointing
write <CHKPT (T1,..,Tk)> naming all active tx's; output all dirty buffers to disk; then write <ENDCHKPT>.
A consequence of the above:
COMMIT
| ... Undo/Redo Logging | 152/153 |
The above description simplifies details of undo/redo logging.
Aries is a complete algorithm for undo/redo logging.
Differences to what we have described:
e.g. <END> records mark completed commits/aborts, and the <CHKPT> ... <ENDCHKPT..> records carry extra state (active tx table, dirty page table).
| Recovery in PostgreSQL | 153/153 |
PostgreSQL uses write-ahead undo/redo style logging.
However, it also uses multi-version concurrency control
Core transaction code is in src/backend/access/transam
Produced: 7 Aug 2019