H: Transactions, Concurrency, Recovery

COMP9315 24T1 ♢ Lectures Part H ♢ [0/92]
❖ Transaction Processing

A transaction (tx) is ...


A transaction effects a state change on the DB

[Diagram:Pics/txproc/tx-exec1.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [1/92]
❖ Transaction Processing (cont)

Transaction states:

[Diagram:Pics/txproc/tx-states1.png]

COMMIT ⇒ all changes preserved,   ABORT ⇒ database unchanged

COMP9315 24T1 ♢ Lectures Part H ♢ [2/92]
❖ Transaction Processing (cont)

Concurrent transactions are

To ensure problem-free concurrent transactions:
COMP9315 24T1 ♢ Lectures Part H ♢ [3/92]
❖ Transaction Processing (cont)

Transaction processing:


Consistency is the property that each tx maps a consistent state of the DB to another consistent state.

Ensuring this must be left to application programmers.


Our discussion focusses on: Atomicity, Durability, Isolation

COMP9315 24T1 ♢ Lectures Part H ♢ [4/92]
❖ Transaction Processing (cont)

Atomicity is handled by the commit and abort mechanisms

Durability is handled by implementing stable storage, via

Isolation is handled by concurrency control mechanisms
COMP9315 24T1 ♢ Lectures Part H ♢ [5/92]
❖ Transaction Processing (cont)

Where transaction processing fits in the DBMS:

[Diagram:Pics/txproc/dbmsarch.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [6/92]
❖ Transaction Terminology

To describe transaction effects, we consider:

Relationship between the above operations and SQL:
COMP9315 24T1 ♢ Lectures Part H ♢ [7/92]
❖ Transaction Terminology (cont)

More on transactions and SQL


In PostgreSQL, tx's cannot be defined inside functions (e.g. PLpgSQL)

COMP9315 24T1 ♢ Lectures Part H ♢ [8/92]
❖ Transaction Terminology (cont)

The READ, WRITE, ABORT, COMMIT operations:

The operations are typically denoted as:

R_T(X)   read item X in transaction T
W_T(X)   write item X in transaction T
A_T      abort transaction T
C_T      commit transaction T

COMP9315 24T1 ♢ Lectures Part H ♢ [9/92]
❖ Schedules

A schedule gives the sequence of operations from ≥ 1 tx

Serial schedule for a set of tx's T1 .. Tn

E.g.   R_T1(A)   W_T1(A)   R_T2(B)   R_T2(A)   W_T3(C)   W_T3(B)

Concurrent schedule for a set of tx's T1 .. Tn

E.g.   R_T1(A)   R_T2(B)   W_T1(A)   W_T3(C)   W_T3(B)   R_T2(A)
COMP9315 24T1 ♢ Lectures Part H ♢ [10/92]
❖ Schedules (cont)

Serial schedules guarantee database consistency

Concurrent schedules interleave tx operations arbitrarily
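
To make the distinction concrete, here is a minimal C sketch (all names are illustrative, not from the lecture code) that represents a schedule as an array of (tx, op, item) operations and checks whether it is serial, i.e. whether each tx's operations form one contiguous block:

#include <stdbool.h>
#include <stdio.h>

// One operation in a schedule: which tx issued it, what it does, on which item.
typedef struct { int tx; char op; char item; } Op;

// A schedule is serial if each tx's operations form one contiguous block,
// i.e. once we move past a tx we never see it again.
bool isSerial(Op *s, int n) {
    int seen[100] = {0};           // seen[t] != 0 once tx t's block has ended
    int cur = -1;                  // tx whose block we are currently inside
    for (int i = 0; i < n; i++) {
        int t = s[i].tx;
        if (t == cur) continue;    // still inside the same tx's block
        if (seen[t]) return false; // tx re-appears later => interleaved
        if (cur >= 0) seen[cur] = 1;
        cur = t;
    }
    return true;
}

int main(void) {
    // Serial example from the notes: R_T1(A) W_T1(A) R_T2(B) R_T2(A) W_T3(C) W_T3(B)
    Op serial[] = {{1,'R','A'},{1,'W','A'},{2,'R','B'},{2,'R','A'},{3,'W','C'},{3,'W','B'}};
    // Concurrent example: R_T1(A) R_T2(B) W_T1(A) W_T3(C) W_T3(B) R_T2(A)
    Op conc[]   = {{1,'R','A'},{2,'R','B'},{1,'W','A'},{3,'W','C'},{3,'W','B'},{2,'R','A'}};
    printf("serial? %d %d\n", isSerial(serial,6), isSerial(conc,6));   // prints 1 0
    return 0;
}
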
COMP9315 24T1 ♢ Lectures Part H ♢ [11/92]
❖ Transaction Anomalies

What problems can occur with (uncontrolled) concurrent tx's?

The set of phenomena can be characterised broadly under:

COMP9315 24T1 ♢ Lectures Part H ♢ [12/92]
❖ Schedule Properties

If a concurrent schedule on a set of tx's TT ...

Primary goal of isolation mechanisms (see later) is serializability.

Serializability is one property of a schedule, focusing on isolation

Other properties of schedules focus on recovering from failures

COMP9315 24T1 ♢ Lectures Part H ♢ [13/92]
❖ Transaction Failure

So far, have implicitly assumed that all transactions commit.

Additional problems can arise when transactions abort.

Consider the following schedule where transaction T1 fails:

T1: R(X) W(X) A
T2:             R(X) W(X) C

Abort will rollback the changes to X, but ...

Consider three places where the rollback might occur:

T1: R(X) W(X) A [1]     [2]        [3]
T2:                 R(X)    W(X) C

COMP9315 24T1 ♢ Lectures Part H ♢ [14/92]
❖ Transaction Failure (cont)

Abort / rollback scenarios:

T1: R(X) W(X) A [1]     [2]        [3]
T2:                 R(X)    W(X) C

Case [1] is ok

Case [2] is problematic: T2 has read a value of X written by T1 (a dirty read), and that value disappears when the rollback occurs.

Case [3] is also problematic: the rollback would overwrite the update that T2 has already made and committed.
COMP9315 24T1 ♢ Lectures Part H ♢ [15/92]
❖ Recoverability

Consider the serializable schedule:

T1:        R(X)  W(Y)  C
T2:  W(X)                 A

(where the final value of Y is dependent on the X value)

Notes:

Produces an invalid database state, even though serializable.
COMP9315 24T1 ♢ Lectures Part H ♢ [16/92]
❖ Recoverability (cont)

Recoverable schedules avoid these kinds of problems.

For a schedule to be recoverable, we require an additional constraint:

all tx's Ti that wrote values read by Tj must commit before Tj commits

and this property must hold for all transactions Tj

Note that recoverability does not prevent "dirty reads".

In order to make schedules recoverable in the presence of dirty reads and aborts, may need to abort multiple transactions.

COMP9315 24T1 ♢ Lectures Part H ♢ [17/92]
❖ Cascading Aborts

Recall the earlier non-recoverable schedule:

T1:        R(X)  W(Y)  C
T2:  W(X)                 A

To make it recoverable requires:

T1:        R(X)  W(Y) ...   C? A!
T2:  W(X)                 A

Known as cascading aborts (or cascading rollback).

COMP9315 24T1 ♢ Lectures Part H ♢ [18/92]
❖ Cascading Aborts (cont)

Example: T3 aborts, causing T2 to abort, causing T1 to abort

T1:                    R(Y)  W(Z)        A
T2:        R(X)  W(Y)                 A
T3:  W(X)                          A

Even though T1 has no direct connection with T3
(i.e. no shared data).

This kind of problem ...

COMP9315 24T1 ♢ Lectures Part H ♢ [19/92]
❖ Cascading Aborts (cont)

Cascading aborts can be avoided if

Effectively: eliminate the possibility of reading dirty data.

Downside: reduces opportunity for concurrency.

These are called ACR (avoid cascading rollback) schedules.

All ACR schedules are also recoverable.

COMP9315 24T1 ♢ Lectures Part H ♢ [20/92]
❖ Strictness

Strict schedules also eliminate the chance of writing  dirty data.

A schedule is strict if

Strict schedules simplify the task of rolling back after aborts.
COMP9315 24T1 ♢ Lectures Part H ♢ [21/92]
❖ Strictness (cont)

Example: non-strict schedule

T1:  W(X)        A
T2:        W(X)     A

Problems with handling rollback after aborts:

COMP9315 24T1 ♢ Lectures Part H ♢ [22/92]
❖ Classes of Schedules

Relationship between various classes of schedules:

[Diagram:Pics/txproc/schedules.png]

Schedules ought to be serializable and strict.

But more serializable/strict ⇒ less concurrency.

DBMSs allow users to trade off "safety" against performance.

COMP9315 24T1 ♢ Lectures Part H ♢ [23/92]
❖ Transaction Isolation

Simplest form of isolation: serial execution   (T1 ; T2 ; T3 ; ...)

Problem: serial execution yields poor throughput.

Concurrency control schemes (CCSs) aim for "safe" concurrency

Abstract view of DBMS concurrency mechanisms:

[Diagram:Pics/txproc/txproc1.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [24/92]
❖ Serializability

Consider two schedules S1 and S2 produced by

S1  and S2  are equivalent if StateAfter(S1)  =  StateAfter(S2)
S  is a serializable schedule (for a set of concurrent tx's T1..Tn) if S is equivalent to some serial schedule on T1..Tn.

Under these circumstances, consistency is guaranteed
(assuming no aborted transactions and no system failures)
COMP9315 24T1 ♢ Lectures Part H ♢ [25/92]
❖ Serializability (cont)

Two formulations of serializability:

View serializability is strictly weaker than conflict serializability.
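
As a worked illustration of the conflict formulation, the following C sketch (illustrative names only; a sketch, not library code) builds a precedence graph with an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same item, at least one write), and reports the schedule as conflict serializable iff the graph is acyclic:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAXTX 16

typedef struct { int tx; char op; char item; } Op;   // op is 'R' or 'W'

// edge[i][j] == true means Ti must precede Tj in any equivalent serial order
static bool edge[MAXTX][MAXTX];

static bool conflicts(Op a, Op b) {
    return a.tx != b.tx && a.item == b.item && (a.op == 'W' || b.op == 'W');
}

// DFS cycle check: colour 0 = unvisited, 1 = on current path, 2 = done
static bool hasCycleFrom(int u, int *colour) {
    colour[u] = 1;
    for (int v = 0; v < MAXTX; v++)
        if (edge[u][v]) {
            if (colour[v] == 1) return true;
            if (colour[v] == 0 && hasCycleFrom(v, colour)) return true;
        }
    colour[u] = 2;
    return false;
}

bool conflictSerializable(Op *s, int n) {
    memset(edge, 0, sizeof edge);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (conflicts(s[i], s[j])) edge[s[i].tx][s[j].tx] = true;
    int colour[MAXTX] = {0};
    for (int t = 0; t < MAXTX; t++)
        if (colour[t] == 0 && hasCycleFrom(t, colour)) return false;
    return true;
}

int main(void) {
    // Illustrative schedule R_T1(X) W_T2(X) W_T1(X):
    // edges T1->T2 (R/W on X) and T2->T1 (W/W on X) form a cycle
    Op s[] = {{1,'R','X'},{2,'W','X'},{1,'W','X'}};
    printf("conflict serializable? %d\n", conflictSerializable(s, 3));  // prints 0
    return 0;
}
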
COMP9315 24T1 ♢ Lectures Part H ♢ [26/92]
❖ Transaction Isolation Levels

SQL programmers' concurrency control mechanism ...

set transaction
    read only  -- so weaker isolation may be ok
    read write -- suggests stronger isolation needed
isolation level
    -- weakest isolation, maximum concurrency
    read uncommitted
    read committed
    repeatable read
    serializable
    -- strongest isolation, minimum concurrency

Applies to current tx only; affects how scheduler treats this tx.
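
For example, a client program can request an isolation level when it starts a tx. A hedged libpq sketch (the connection string and error handling are assumptions, not from the notes):

#include <stdio.h>
#include <libpq-fe.h>

int main(void) {
    PGconn *conn = PQconnectdb("dbname=mydb");   // hypothetical connection string
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    // isolation mode can be given on BEGIN; it applies to this tx only
    PGresult *res = PQexec(conn, "BEGIN ISOLATION LEVEL SERIALIZABLE");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "BEGIN failed: %s", PQerrorMessage(conn));
    PQclear(res);
    // ... queries issued here run under serializable isolation;
    //     a violation surfaces as a serialization failure error ...
    res = PQexec(conn, "COMMIT");
    PQclear(res);
    PQfinish(conn);
    return 0;
}
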

COMP9315 24T1 ♢ Lectures Part H ♢ [27/92]
❖ Transaction Isolation Levels (cont)

Implication of transaction isolation levels:

Isolation Level     Dirty Read     Nonrepeatable Read   Phantom Read
Read Uncommitted    Possible       Possible             Possible
Read Committed      Not Possible   Possible             Possible
Repeatable Read     Not Possible   Not Possible         Possible
Serializable        Not Possible   Not Possible         Not Possible
COMP9315 24T1 ♢ Lectures Part H ♢ [28/92]
❖ Transaction Isolation Levels (cont)

For transaction isolation, PostgreSQL


Note: cannot implement read uncommitted because of MVCC


For more details, see PostgreSQL Documentation section 13.2

COMP9315 24T1 ♢ Lectures Part H ♢ [29/92]
❖ Transaction Isolation Levels (cont)

A PostgreSQL tx consists of a sequence of SQL statements:

BEGIN S1; S2; ... Sn; COMMIT;

Isolation levels affect view of DB provided to each Si:

Transactions fail if the system detects violation of isolation level.
COMP9315 24T1 ♢ Lectures Part H ♢ [30/92]
❖ Concurrency Control

Approaches to concurrency control:

COMP9315 24T1 ♢ Lectures Part H ♢ [31/92]
❖ Lock-based Concurrency Control

Locks introduce additional mechanisms in DBMS:

[Diagram:Pics/txproc/txproc2.png]

The Lock Manager

COMP9315 24T1 ♢ Lectures Part H ♢ [32/92]
❖ Lock-based Concurrency Control (cont)

Lock table entries contain:

Lock and unlock operations must be atomic.

Lock upgrade:

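A hedged C sketch of one plausible lock table entry layout (field names and the fixed-size arrays are illustrative, not PostgreSQL's), with a compatibility test and an assumed upgrade rule (sole holder of a read lock may upgrade):

#include <stdbool.h>

typedef enum { UNLOCKED, READ_LOCK, WRITE_LOCK } LockMode;

typedef struct LockEntry {
    int      objectId;      // which data item / page / tuple is locked
    LockMode mode;          // mode currently granted
    int      nHolders;      // > 1 only possible for READ_LOCK (shared)
    int      holders[8];    // tx ids currently holding the lock
    int      waiters[8];    // tx ids queued waiting for the lock
    int      nWaiters;
} LockEntry;

// Can a new request in mode `req` be granted immediately?
// Shared (read) locks are compatible with each other;
// anything involving a write lock conflicts.
bool compatible(LockMode held, LockMode req) {
    if (held == UNLOCKED) return true;
    return held == READ_LOCK && req == READ_LOCK;
}

// Lock upgrade: a tx already holding a read lock asks for a write lock.
// It can be upgraded only if it is the sole holder; otherwise it must wait.
bool canUpgrade(LockEntry *e, int tx) {
    return e->mode == READ_LOCK && e->nHolders == 1 && e->holders[0] == tx;
}
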
COMP9315 24T1 ♢ Lectures Part H ♢ [33/92]
❖ Lock-based Concurrency Control (cont)

Synchronise access to shared data items via following rules:

These rules alone do not guarantee serializability.
COMP9315 24T1 ♢ Lectures Part H ♢ [34/92]
❖ Lock-based Concurrency Control (cont)

Consider the following schedule, using locks:

T1(a): Lr(Y)     R(Y)           continued
T2(a):      Lr(X)    R(X) U(X)  continued

T1(b):      U(Y)         Lw(X) W(X) U(X)
T2(b): Lw(Y)....W(Y) U(Y)

(where Lr = read-lock, Lw = write-lock, U = unlock)

Locks correctly ensure controlled access to X and Y.

Despite this, the schedule is not serializable.  (Ex: prove this)

COMP9315 24T1 ♢ Lectures Part H ♢ [35/92]
❖ Two-Phase Locking

To guarantee serializability, we require an additional constraint: in every tx, all lock requests must precede all of its unlock requests.

Each transaction is then structured as: a growing phase in which it acquires the locks it needs, its actions on the data, then a shrinking phase in which it releases its locks.
Clearly, this reduces potential concurrency ...
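
A minimal, self-contained C sketch of a transaction body structured according to 2PL; the "DBMS" here is just an array and printf stand-ins for the lock manager, so the point is only the shape: growing phase, actions, shrinking phase.

#include <stdio.h>

static int db[2] = {500, 200};                  // two data items: A = 0, B = 1

static void writeLock(int x) { printf("Lw(%d)\n", x); }   // stand-in for a real lock mgr
static void unlock(int x)    { printf("U(%d)\n", x);  }

void transferTx(int A, int B, int amount) {
    // growing phase: acquire every lock this tx will need, acquire-only
    writeLock(A);
    writeLock(B);
    // actions on the locked items
    db[A] -= amount;
    db[B] += amount;
    // shrinking phase: release locks; after the first unlock, no new locks
    unlock(A);
    unlock(B);
}

int main(void) {
    transferTx(0, 1, 100);
    printf("A=%d B=%d\n", db[0], db[1]);        // A=400 B=300
    return 0;
}
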
COMP9315 24T1 ♢ Lectures Part H ♢ [36/92]
❖ Problems with Locking

Appropriate locking can guarantee correctness.

However, it also introduces potential undesirable effects:

COMP9315 24T1 ♢ Lectures Part H ♢ [37/92]
❖ Deadlock

Deadlock occurs when two transactions are waiting for a lock on an item held by the other.

Example:

T1: Lw(A) R(A)            Lw(B) ......
T2:            Lw(B) R(B)       Lw(A) .....

How to deal with deadlock?

COMP9315 24T1 ♢ Lectures Part H ♢ [38/92]
❖ Deadlock (cont)

Handling deadlock involves forcing a transaction to "back off"

COMP9315 24T1 ♢ Lectures Part H ♢ [39/92]
❖ Deadlock (cont)

Methods for managing deadlock

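One standard method (assumed here as a representative example) is deadlock detection with a waits-for graph: Ti has an edge to Tj if Ti is blocked waiting for a lock held by Tj, and a deadlock exists iff the graph contains a cycle. A minimal C sketch that detects a cycle by repeatedly discarding tx's that are not waiting on any remaining tx:

#include <stdbool.h>
#include <stdio.h>

#define MAXTX 16

// waitsFor[i][j] means tx i is blocked waiting for a lock held by tx j.
static bool waitsFor[MAXTX][MAXTX];
static bool active[MAXTX];

// If anything is left after discarding non-waiting tx's, the remaining
// tx's are involved in (or depend on) a cycle => deadlock.
bool deadlocked(void) {
    bool removed = true;
    while (removed) {
        removed = false;
        for (int i = 0; i < MAXTX; i++) {
            if (!active[i]) continue;
            bool waiting = false;
            for (int j = 0; j < MAXTX; j++)
                if (active[j] && waitsFor[i][j]) waiting = true;
            if (!waiting) { active[i] = false; removed = true; }
        }
    }
    for (int i = 0; i < MAXTX; i++)
        if (active[i]) return true;
    return false;
}

int main(void) {
    active[1] = active[2] = true;
    waitsFor[1][2] = true;    // T1 waits for T2 (T2 holds Lw(B))
    waitsFor[2][1] = true;    // T2 waits for T1 (T1 holds Lw(A))
    printf("deadlock? %d\n", deadlocked());   // prints 1
    return 0;
}
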
COMP9315 24T1 ♢ Lectures Part H ♢ [40/92]
❖ Deadlock (cont)

Properties of deadlock handling methods:

COMP9315 24T1 ♢ Lectures Part H ♢ [41/92]
❖ Optimistic Concurrency Control

Locking is a pessimistic approach to concurrency control:

Costs: lock management, deadlock handling, contention.

In scenarios where there are far more reads than writes ...

Optimistic concurrency control (OCC) is a strategy to realise this.
COMP9315 24T1 ♢ Lectures Part H ♢ [42/92]
❖ Optimistic Concurrency Control (cont)

Under OCC, transactions have three distinct phases: reading, validation, writing.

Timestamps are recorded at points S, V, F :

[Diagram:Pics/txproc/occ-phases.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [43/92]
❖ Optimistic Concurrency Control (cont)

Data structures needed for validation:

Use the V  timestamps as ordering for transactions
COMP9315 24T1 ♢ Lectures Part H ♢ [44/92]
❖ Optimistic Concurrency Control (cont)

Two-transaction example:

Case 0: serial execution ... no problem

[Diagram:Pics/txproc/occ0.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [45/92]
❖ Optimistic Concurrency Control (cont)

Case 1: reading overlaps validation/writing

[Diagram:Pics/txproc/occ1.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [46/92]
❖ Optimistic Concurrency Control (cont)

Case 2: reading/validation overlaps validation/writing

[Diagram:Pics/txproc/occ2.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [47/92]
❖ Optimistic Concurrency Control (cont)

Validation check for transaction T

If this check fails for any Ti, then T  is rolled back.
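
A hedged C sketch of one common (Kung-Robinson style) formulation of this check, consistent with the overlap cases pictured above; the read/write sets, the bitset representation and the field names are assumptions for illustration:

#include <stdbool.h>

// RS/WS are bitsets of item ids; s, v, f are the timestamps recorded
// at points S (start reading), V (start validation) and F (finish).
typedef struct {
    unsigned rs, ws;
    long s, v, f;
} TxInfo;

static bool disjoint(unsigned a, unsigned b) { return (a & b) == 0; }

// Validate T against one earlier-validated transaction Ti (V(Ti) < V(T)).
bool validAgainst(const TxInfo *T, const TxInfo *Ti) {
    if (Ti->f < T->s)                 // Ti finished before T started: no overlap
        return true;
    if (Ti->f < T->v)                 // Ti wrote while T was still reading
        return disjoint(Ti->ws, T->rs);
    // Ti's writing overlaps T's validation/writing phase
    return disjoint(Ti->ws, T->rs) && disjoint(Ti->ws, T->ws);
}

// T passes validation only if the check holds against every such Ti;
// otherwise T is rolled back (and simply restarted later).
bool validate(const TxInfo *T, const TxInfo earlier[], int n) {
    for (int i = 0; i < n; i++)
        if (!validAgainst(T, &earlier[i])) return false;
    return true;
}
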
COMP9315 24T1 ♢ Lectures Part H ♢ [48/92]
❖ Optimistic Concurrency Control (cont)

OCC prevents:   T  reading dirty data,   T  overwriting Ti's  changes

Problems with OCC:

** "Roll back" is relatively cheap
COMP9315 24T1 ♢ Lectures Part H ♢ [49/92]
❖ Multi-version Concurrency Control

Multi-version concurrency control (MVCC) aims to

Achieves this by maintaining multiple versions of each data item.

Main difference between MVCC and standard locking:
COMP9315 24T1 ♢ Lectures Part H ♢ [50/92]
❖ Multi-version Concurrency Control (cont)

WTS = timestamp of tx that wrote this data item

Chained tuple versions:   tup_oldest → tup_older → tup_newest

When a reader Ti  is accessing the database

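The usual rule (assumed here) is that a reader simply ignores versions written after it started, and uses the newest version whose WTS is ≤ its own timestamp. A minimal C sketch with an explicit version chain:

#include <stdio.h>

// Tuple versions are chained oldest -> newest, each stamped with the
// WTS of the tx that wrote it.
typedef struct Version {
    long wts;                // timestamp of the writing tx
    int  value;              // the data item's value in this version
    struct Version *newer;   // next (more recent) version in the chain
} Version;

// A reader with timestamp ts sees the newest version with WTS <= ts.
const Version *visibleVersion(const Version *oldest, long ts) {
    const Version *best = NULL;
    for (const Version *v = oldest; v != NULL; v = v->newer)
        if (v->wts <= ts && (best == NULL || v->wts > best->wts))
            best = v;
    return best;             // NULL: no version existed when the reader started
}

int main(void) {
    Version v1 = {10, 8, NULL}, v2 = {20, 16, NULL};
    v1.newer = &v2;                          // chain: (WTS=10) -> (WTS=20)
    const Version *v = visibleVersion(&v1, 15);
    printf("reader ts=15 sees value %d\n", v->value);   // sees 8, the WTS=10 version
    return 0;
}
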
COMP9315 24T1 ♢ Lectures Part H ♢ [51/92]
❖ Multi-version Concurrency Control (cont)

When a writer Ti  attempts to change a data item


Some MVCC versions also maintain RTS (TS of last reader)
COMP9315 24T1 ♢ Lectures Part H ♢ [52/92]
❖ Multi-version Concurrency Control (cont)

Advantage of MVCC

Disadvantages of MVCC ...

Despite apparent disadvantages, MVCC is very effective.
COMP9315 24T1 ♢ Lectures Part H ♢ [53/92]
❖ Multi-version Concurrency Control (cont)

Removing old versions:

When to make this check?  PostgreSQL does it periodically, in the background (vacuum).
COMP9315 24T1 ♢ Lectures Part H ♢ [54/92]
❖ Concurrency Control in PostgreSQL

PostgreSQL uses two styles of concurrency control:

From the SQL (PLpgSQL) level:
COMP9315 24T1 ♢ Lectures Part H ♢ [55/92]
❖ Concurrency Control in PostgreSQL (cont)

PostgreSQL provides read committed and serializable isolation levels.

Using the serializable isolation level, a select:

Using the serializable isolation level, an update fails if it clashes with a concurrent update to the same data.

The transaction containing the update must then rollback and re-start.
COMP9315 24T1 ♢ Lectures Part H ♢ [56/92]
❖ Concurrency Control in PostgreSQL (cont)

Implementing MVCC in PostgreSQL requires:

COMP9315 24T1 ♢ Lectures Part H ♢ [57/92]
❖ Concurrency Control in PostgreSQL (cont)

Rules for a tuple to be visible to Ti :

For details, see: utils/time/tqual.c
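
A much-simplified, hedged C sketch of the style of check involved (the real rules in tqual.c also handle in-progress tx's, hint bits, subtransactions, etc.); xmin/xmax follow PostgreSQL's tuple header naming, everything else is illustrative:

#include <stdbool.h>

// Each tuple carries xmin (tx that created it) and xmax (tx that deleted
// or replaced it, or 0 if it has never been deleted).
typedef struct { unsigned xmin, xmax; } TupleHeader;

// Whether a tx id committed before our snapshot is abstracted into a callback.
typedef bool (*CommittedBeforeFn)(unsigned xid);

// Tuple is visible to the current tx (myXid) if it was created by us or by a
// tx already committed in our snapshot, and it has not been deleted by us or
// by any such committed tx.
bool tupleVisible(const TupleHeader *t, unsigned myXid, CommittedBeforeFn committed) {
    bool created = (t->xmin == myXid) || committed(t->xmin);
    if (!created) return false;
    if (t->xmax == 0) return true;                 // never deleted
    bool deleted = (t->xmax == myXid) || committed(t->xmax);
    return !deleted;
}
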
COMP9315 24T1 ♢ Lectures Part H ♢ [58/92]
❖ Concurrency Control in PostgreSQL (cont)

Tx's always see a consistent version of the database.

But may not see the "current" version of the database.

E.g. T1 does select, then concurrent T2 deletes some of T1's selected tuples

This is OK unless tx's communicate outside the database system.

E.g. T1 counts tuples; T2 deletes then counts; then counts are compared

Use locks if application needs every tx to see same current version

COMP9315 24T1 ♢ Lectures Part H ♢ [59/92]
❖ Atomicity/Durability

Reminder:

Transactions are atomic

Transaction effects are durable
Implementation of atomicity/durability is intertwined.
COMP9315 24T1 ♢ Lectures Part H ♢ [60/92]
❖ Durability

What kinds of "system failures" do we need to deal with?

The last requires off-site backup; all others should be locally recoverable.

COMP9315 24T1 ♢ Lectures Part H ♢ [61/92]
❖ Durability (cont)

Consider following scenario:

[Diagram:Pics/txproc/crash.png]

Desired behaviour after system restart:

COMP9315 24T1 ♢ Lectures Part H ♢ [62/92]
❖ Durability (cont)

Durability begins with a stable disk storage subsystem


We can prevent/minimise loss/corruption of data due to:
COMP9315 24T1 ♢ Lectures Part H ♢ [63/92]
❖ Dealing with Transactions

The remaining "failure modes" that we need to consider:


Standard technique for managing these:
COMP9315 24T1 ♢ Lectures Part H ♢ [64/92]
❖ Architecture for Atomicity/Durability

How does a DBMS provide for atomicity/durability?


[Diagram:Pics/txproc/arch.png]

COMP9315 24T1 ♢ Lectures Part H ♢ [65/92]
❖ Execution of Transactions

Transactions deal with three address spaces:

Each of these may hold a different "version" of a DB object.


PostgreSQL processes make heavy use of shared buffer pool

⇒ transactions do not deal with much local data.

COMP9315 24T1 ♢ Lectures Part H ♢ [66/92]
❖ Execution of Transactions (cont)

Operations available for data transfer:

READ/WRITE are issued by transaction.

INPUT/OUTPUT are issued by buffer manager (and log manager).

INPUT/OUTPUT correspond to getPage()/putPage() mentioned above

COMP9315 24T1 ♢ Lectures Part H ♢ [67/92]
❖ Execution of Transactions (cont)

Example of transaction execution:

-- implements A = A*2; B = B+1;
BEGIN
READ(A,v); v = v*2; WRITE(A,v);
READ(B,v); v = v+1; WRITE(B,v);
COMMIT

READ accesses the buffer manager and may cause INPUT.

COMMIT needs to ensure that buffer contents go to disk.

COMP9315 24T1 ♢ Lectures Part H ♢ [68/92]
❖ Execution of Transactions (cont)

States as the transaction executes:


t   Action        v  Buf(A)  Buf(B)  Disk(A)  Disk(B)
-----------------------------------------------------
(0) BEGIN         .      .       .        8        5
(1) READ(A,v)     8      8       .        8        5
(2) v = v*2      16      8       .        8        5
(3) WRITE(A,v)   16     16       .        8        5
(4) READ(B,v)     5     16       5        8        5
(5) v = v+1       6     16       5        8        5
(6) WRITE(B,v)    6     16       6        8        5
(7) OUTPUT(A)     6     16       6       16        5
(8) OUTPUT(B)     6     16       6       16        6

After tx completes, we must have either
Disk(A)=8, Disk(B)=5   or   Disk(A)=16, Disk(B)=6

If system crashes before (8), may need to undo disk changes.
If system crashes after (8), may need to redo disk changes.

COMP9315 24T1 ♢ Lectures Part H ♢ [69/92]
❖ Transactions and Buffer Pool

Two issues arise w.r.t. buffers:

Ideally, we want stealing and not forcing.
COMP9315 24T1 ♢ Lectures Part H ♢ [70/92]
❖ Transactions and Buffer Pool (cont)

Handling stealing:

COMP9315 24T1 ♢ Lectures Part H ♢ [71/92]
❖ Transactions and Buffer Pool (cont)

Handling no forcing:


Above scenario may be a problem, even if we are forcing
COMP9315 24T1 ♢ Lectures Part H ♢ [72/92]
❖ Transactions (reminder)

A transaction is

Transactions deal with three levels of storage: the stored DB, the buffer pool, and tx-local memory.

Transactions ...
COMP9315 24T1 ♢ Lectures Part H ♢ [73/92]
❖ Logging

Three "styles" of logging

All approaches require: Known as write-ahead logging    (PostgreSQL uses WAL)
COMP9315 24T1 ♢ Lectures Part H ♢ [74/92]
❖ Undo Logging

Simple form of logging which ensures atomicity.

Log file consists of a sequence of small records:

Notes:
COMP9315 24T1 ♢ Lectures Part H ♢ [75/92]
❖ Undo Logging (cont)

Data must be written to disk in the following order:

  1. <START> transaction log record
  2. <UPDATE> log records indicating changes
  3. the changed data elements themselves
  4. <COMMIT> log record
Note: sufficient to have <T,X,v> output before X, for each X
COMP9315 24T1 ♢ Lectures Part H ♢ [76/92]
❖ Undo Logging (cont)

For the example transaction, we would get:

t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6
(11) EndCommit                                <COMMIT T>
(12) FlushLog

Note that T is not regarded as committed until (12) completes.

COMP9315 24T1 ♢ Lectures Part H ♢ [77/92]
❖ UNDO Logging (cont)

Simplified view of recovery using UNDO logging:

Assumes we scan entire log; use checkpoints to limit scan.
COMP9315 24T1 ♢ Lectures Part H ♢ [78/92]
❖ UNDO Logging (cont)

Algorithmic view of recovery using UNDO logging:

committedTrans = abortedTrans = startedTrans = {}
for each log record from most recent to oldest {
    switch (log record) {
    <COMMIT T> : add T to committedTrans
    <ABORT T>  : add T to abortedTrans
    <START T>  : add T to startedTrans
    <T,X,v>    : if (T in committedTrans)
                     // don't undo committed changes
                 else  // roll-back changes
                     { WRITE(X,v); OUTPUT(X) }
}   }
for each T in startedTrans {
    if (T in committedTrans) ignore
    else if (T in abortedTrans) ignore
    else write <ABORT T> to log
}
flush log

COMP9315 24T1 ♢ Lectures Part H ♢ [79/92]
❖ Checkpointing

Simple view of recovery implies reading entire log file.

Since log file grows without bound, this is infeasible.

Eventually we can delete "old" section of log.

This point is called a checkpoint.

As described, we need to wait for all active tx's to complete
COMP9315 24T1 ♢ Lectures Part H ♢ [80/92]
❖ Checkpointing (cont)

Problem: many concurrent/overlapping transactions.

How to know that all have finished?

  1. periodically, write log record <CHKPT (T1,..,Tk)>
    (contains references to all transactions in the active tx table)
  2. continue normal processing (e.g. new tx's can start)
  3. when all of T1,..,Tk have completed,
    write log record <END CHKPT> and flush log
Note: tx manager maintains chkpt and active tx information

These kinds of checkpoints are often written as <START CHKPT ...>

COMP9315 24T1 ♢ Lectures Part H ♢ [81/92]
❖ Checkpointing (cont)

Recovery: scan backwards through log file processing as before.

Determining where to stop depends on ...

If we encounter <END CHKPT> first:

If we encounter <CHKPT (T1,..,Tk)> first:
COMP9315 24T1 ♢ Lectures Part H ♢ [82/92]
❖ Redo Logging

Problem with UNDO logging:

Alternative approach is redo logging:
COMP9315 24T1 ♢ Lectures Part H ♢ [83/92]
❖ Redo Logging (cont)

Requirement for redo logging: write-ahead rule.

Data must be written to disk as follows:

  1. write start transaction <START T> log record
  2. write <UPDATE...> log records indicating changes
  3. then write <COMMIT> log record (and flush log)
  4. then OUTPUT changed data elements themselves
Note that update log records now contain <T,X,v'>,
where v' is the new value for X.
COMP9315 24T1 ♢ Lectures Part H ♢ [84/92]
❖ Redo Logging (cont)

For the example transaction, we would get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,6>
(7)  COMMIT                                   <COMMIT T>
(8)  FlushLog
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (8) completes.

COMP9315 24T1 ♢ Lectures Part H ♢ [85/92]
❖ Redo Logging (cont)

Simplified view of recovery using REDO logging:

Use checkpoints (like UNDO log checkpoints) to limit scan.
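
Mirroring the UNDO recovery pseudocode given earlier, here is a hedged, self-contained C sketch of REDO recovery: determine which tx's committed, then scan the log forwards re-applying the new values written by committed tx's only (record and field names are illustrative):

#include <stdbool.h>
#include <stdio.h>

typedef enum { LSTART, LUPDATE, LCOMMIT, LABORT } RecKind;
typedef struct { RecKind kind; int tx; char item; int newVal; } LogRec;

#define MAXTX 16
static int disk[128];                     // fake disk, indexed by item id

void redoRecover(LogRec *log, int n) {
    bool committed[MAXTX] = {false};
    for (int i = 0; i < n; i++)           // pass 1: which tx's committed?
        if (log[i].kind == LCOMMIT) committed[log[i].tx] = true;
    for (int i = 0; i < n; i++)           // pass 2: redo their writes, oldest first
        if (log[i].kind == LUPDATE && committed[log[i].tx])
            disk[(int)log[i].item] = log[i].newVal;     // WRITE + OUTPUT
    // incomplete tx's are simply ignored (their changes never reached disk);
    // an <ABORT T> record would be appended for each of them here
}

int main(void) {
    disk['A'] = 8; disk['B'] = 5;
    LogRec log[] = {                      // the example tx, crash just after COMMIT
        {LSTART,  1,  0,  0}, {LUPDATE, 1, 'A', 16},
        {LUPDATE, 1, 'B', 6}, {LCOMMIT, 1,  0,  0},
    };
    redoRecover(log, 4);
    printf("A=%d B=%d\n", disk['A'], disk['B']);   // A=16 B=6
    return 0;
}
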
COMP9315 24T1 ♢ Lectures Part H ♢ [86/92]
❖ Undo/Redo Logging

UNDO logging and REDO logging are incompatible in

Undo/Redo logging combines aspects of both ...

As for the previous cases, it requires write-ahead of log records.

Undo/redo logging is common in practice, e.g. the ARIES algorithm.

COMP9315 24T1 ♢ Lectures Part H ♢ [87/92]
❖ Undo/Redo Logging (cont)

For the example transaction, we might get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5,6>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10)                                          <COMMIT T>
(11) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (10) completes.

COMP9315 24T1 ♢ Lectures Part H ♢ [88/92]
❖ Undo/Redo Logging (cont)

Simplified view of recovery using UNDO/REDO logging:


Note: undo/redo logging requires dirty buffers to be flushed at <CHKPT...>
COMP9315 24T1 ♢ Lectures Part H ♢ [89/92]
❖ Undo/Redo Logging (cont)

The above description simplifies details of undo/redo logging.

ARIES is a complete algorithm for undo/redo logging.

Differences to what we have described:

COMP9315 24T1 ♢ Lectures Part H ♢ [90/92]
❖ Recovery in PostgreSQL

PostgreSQL uses write-ahead undo/redo style logging.

It also uses multi-version concurrency control, which

MVCC simplifies some aspects of undo/redo, e.g.
Recall: WAL entries in postgresql.conf
COMP9315 24T1 ♢ Lectures Part H ♢ [91/92]
❖ Recovery in PostgreSQL (cont)

Transaction/logging code is distributed throughout backend.

Core transaction code is in src/backend/access/transam.

Transaction/logging data is written to files in PGDATA/pg_xlog

COMP9315 24T1 ♢ Lectures Part H ♢ [92/92]


Produced: 30 Apr 2024