Transactions and Concurrency


Transactions, Concurrency, Recovery

DBMSs provide access to valuable information resources in an environment that is:

Each user should see the system as: Goal: data integrity is maintained at all times.

Transactions, Concurrency, Recovery (cont)

Transaction processing

Concurrency control Recovery mechanisms

Transaction Processing

A transaction is a "logical unit of work" in a DB application.

Examples:

A transaction typically comprises multiple DBMS operations.

E.g.   select ... update ... insert ... select ... insert ...


Transaction Processing (cont)

Transaction processing (TP) systems can be viewed as highly dynamic database applications.

Common characteristics of transaction-processing systems:

 
TP benchmarks: important measure of DBMS performance.

Example Transaction

Problem: transfer funds between two accounts in same bank.

Possible implementation in PLpgSQL:


create or replace function
    transfer(src int, dest int, amount float) returns void
as $$
declare
    oldBalance float;
    newBalance float;
begin
    -- error checking
    select * from Accounts where id=src;
    if (not found) then
        raise exception 'Invalid Withdrawal Account';
    end if;
    select * from Accounts where id=dest;
    if (not found) then
        raise exception 'Invalid Deposit Account';
    end if;
...


Example Transaction (cont)


...
    -- action
(A) select balance into oldBalance
    from Accounts where id=src;
    if (oldBalance < amount) then
        raise exception 'Insufficient funds';
    end if;
    newBalance := oldBalance - amount;
(B) update Accounts
    set    balance := newBalance
    where  id = src;
    -- partial completion of transaction
(C) update Accounts
    set    balance := balance + amount
    where  id = dest;
    commit; -- redundant; function = transaction
end;
$$ language plpgsql;


Example Transaction (cont)

Consider two simultaneous transfers between accounts, e.g.

If the sequence of events is like:

T1:  ...  A  B  C  ...
T2:                ...  A  B  C  ...

everything works correctly, i.e.


Example Transaction (cont)

What if the sequence of events is like?

T1:  ...  A   B        C  ...
T2:    ...  A   B   C  ...

In terms of database operations, this is what happens:

Final balance of Y is ok; final balance of X is wrong.

Transaction Concepts

A transaction must always terminate, either:

[Diagram:Pic/xact/trans-states.png]


Transaction Concepts (cont)

To describe transaction effects, we consider:

 
SELECT produces READ operations on the database.

INSERT, UPDATE, DELETE produce WRITE/READ operations.


Transaction Concepts (cont)

The READ, WRITE, ABORT, COMMIT operations:

The operations are typically denoted as:

RT(X) read item X in transaction T
WT(X) write item X in transaction T
AT abort transaction T
CT commit transaction T


Transaction Concepts (cont)

Execution of the above funds transfer example can be described as

T: READ(S);  READ(D);  -- S = source tuple, D = dest tuple
   READ(S);  S.bal := S.bal-amount;  WRITE(S)
   READ(D);  D.bal := D.bal+amount;  WRITE(D)
   COMMIT;

or simply as

RT(S)   RT(D)   RT(S)   WT(S)   RT(D)   WT(D)   CT

 
This is known as a schedule for the transaction.


Transaction Consistency

Transactions typically have intermediate states that are inconsistent.

However, states before and after transaction must be consistent.  
 

[Diagram:Pic/xact/trans-consis.png]


ACID Properties

Data integrity is assured if transactions satisfy the following:

Atomicity

Consistency
Isolation
Durability

ACID Properties (cont)

Atomicity is handled by the commit and rollback mechanisms.

Durability is handled by implementing stable storage, via

Here, we consider primarily consistency and isolation.

Transaction Anomalies

If concurrent transactions access shared data objects, various anomalies can arise.

We give examples using the following two transactions:

T1: read(X)           T2: read(X)
    X := X + N            X := X + M
    write(X)              write(X)
    read(Y)
    Y := Y - N
    write(Y)

and initial DB state X=100, Y=50, N=5, M=8.


Serial Schedules

Serial execution means no overlap of transaction operations.

If T1 and T2 transactions are executed serially:

T1: R(X) W(X) R(Y) W(Y)
T2:                     R(X) W(X)

or

T1:           R(X) W(X) R(Y) W(Y)
T2: R(X) W(X)

the database is left in a consistent state.


Serial Schedules (cont)

The basic idea behind serial schedules:

As would occur e.g. in a single-user database system.

Serial Schedules (cont)

For the first schedule in our example:

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
           read(X)   100
           X:=X+N    105
105        write(X)
           read(Y)        50
           Y:=Y-N         45
     45    write(Y)
                                read(X)   105
                                X:=X+M    113
113                             write(X)
---------
113  45


Serial Schedules (cont)

For the second schedule in our example:

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
                                read(X)   100
                                X:=X+M    108
108                             write(X)
           read(X)   108
           X:=X+N    113
113        write(X)
           read(Y)        50
           Y:=Y-N         45
     45    write(Y)
---------
113  45


Serial Schedules (cont)

Note that serial execution doesn't mean that each transaction will get the same results, regardless of the order.

Consider the following two transactions:

T1: select sum(salary)
    from Employee where dept='Sales'

T2: insert into Employee
    values (....,'Sales',...)

If we execute T1 then T2, we get a smaller salary total than if we execute T2 then T1.

In both cases, however, the salary total is consistent with the state of the database at the time the query is executed.


Concurrent Schedules

A serial execution of consistent transactions is always consistent.

If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.

In the worst case, the effect of executing the transactions ...

So why don't we observe such problems in real DBMSs? ...

Valid Concurrent Transaction

Not all concurrent executions cause problems.

For example, the schedules

T1: R(X) W(X)           R(Y) W(Y)
T2:           R(X) W(X)

or

T1: R(X) W(X)      R(Y)      W(Y)
T2:           R(X)      W(X)

or ...

leave the database in a consistent state.


Lost Update Problem

Consider the following schedule where the transactions execute in parallel:

T1: R(X)      W(X) R(Y)      W(Y)
T2:      R(X)           W(X)

In this scenario:

This is called a Write-Read (WR) Conflict or dirty read.

The result: T1's update to X is lost.


Lost Update Problem (cont)

Consider the states in the WR Conflict schedule:

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
           read(X)   100
           X:=X+N    105
                                read(X)   100
                                X:=X+M    108
105        write(X)
           read(Y)        50
108                             write(X)
           Y:=Y-N         45
     45    write(Y)
---------
108  45


Temporary Update Problem

Consider the following schedule where one transaction fails:

T1: R(X) W(X) A
T2:             R(X) W(X)

Transaction T1 aborts after writing X.

The abort will undo the changes to X, but where the undo occurs can affect the results.

Consider three places where undo might occur:

T1: R(X) W(X) A [1]     [2]     [3]
T2:                 R(X)    W(X)


Temporary Update - Case 1

This scenario is ok.   T1's effects have been eliminated.

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
           read(X)   100
           X:=X+N    105
105        write(X)
           abort
100        undo
                                read(X)   100
                                X:=X+M    108
108                             write(X)
---------
108  50


Temporary Update - Case 2

In this scenario, some of T1's effects have been retained.

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
           read(X)   100
           X:=X+N    105
105        write(X)
           abort
                                read(X)   105
                                X:=X+M    113
100        undo
113                             write(X)
---------
113  50


Temporary Update - Case 3

In this scenario, T2's effects have been lost, even after commit.

Database   T1                   T2
---------  -------------------  --------------
X    Y               X    Y               X
100  50              ?    ?               ?
           read(X)   100
           X:=X+N    105
105        write(X)
           abort
                                read(X)   105
                                X:=X+M    113
113                             write(X)
100        undo
---------
100  50


Valid Schedules

For ACID, we must ensure that schedules are:


Serializability

If a concurrent schedule for transactions T1 ..Tn acts like a serial schedule for T1 ..Tn, then consistency is guaranteed.

To determine this requires a notion of schedule equivalence.

Note: we are not attempting to determine equivalence of entire computations, simply of the interleaved sequences of read/write operations.

A serializable schedule is a concurrent schedule that produces a final state that is the same as that produced by some serial schedule.

There are two primary formulations of serializability:


Conflict Serializability

Consider two transactions T1 and T2 acting on data item X.

Considering only read/write operations, the possibilities are:

T1 first   T2 first   Equiv?
R1(X) R2(X)   R2(X) R1(X)   yes
R1(X) W2(X)   W2(X) R1(X)   no
W1(X) R2(X)   R2(X) W1(X)   no
W1(X) W2(X)   W2(X) W1(X)   no

If T1 and T2 act on different data items, result is equivalent regardless of order.


Conflict Serializability (cont)

Two transactions have a potential conflict if

In such cases, the order of operations affects the result.

Conversely, if two operations in a schedule don't conflict,
we can swap their order without affecting the overall result.

This gives a basis for determining equivalence of schedules.


Conflict Serializability (cont)

If we can transform a schedule

then we say that the schedule is conflict serializible.

If a concurrent schedule is equivalent to some (any) serial schedule, then we have a consistency guarantee.


Conflict Serializability (cont)

Example: transform a concurrent schedule to serial schedule

T1: R(A) W(A)      R(B)      W(B)
T2:           R(A)      W(A)      R(B) W(B)
swap
T1: R(A) W(A) R(B)           W(B)
T2:                R(A) W(A)      R(B) W(B)
swap
T1: R(A) W(A) R(B)      W(B)
T2:                R(A)      W(A) R(B) W(B)
swap
T1: R(A) W(A) R(B) W(B)
T2:                     R(A) W(A) R(B) W(B)


View Serializability

View Serializability is

As with CS, it is based on a notion of schedule equivalence The idea: if all the read operations in two schedules ...

View Serializability (cont)

Two schedules S and S' on T1 .. Tn are view equivalent iff

To check serializibilty of S, find a serial schedule that is view equivalent to S

Testing Serializability

In practice, we don't test specific schedules for serializability.

However, in designing concurrency control schemes, we need a way of checking whether they produce "safe" schedules.

This is typically achieved by a demonstration that the scheme generates only serializable schedules, and we need a serializability test for this.

There is a simple and efficient test for conflict serializability;
there is a more complex test for view serializablity.

Both tests are based on notions of


Testing Serializability (cont)

A precedence graph G = (V,E) for a schedule S consists of

Note: the edge is directed from Tj   →   Tk

Testing Serializability (cont)

If an edge Tj → Tk exists in the precedence graph

Implication: if the precedence graph has cycles, then S can't be serialized.

Thus, the serializability test is reduced to cycle-detection

(and there are cycle-detection algorithms available in many algorithms textbooks)


Serializability Test Examples

Serializable schedule   (with conflicting operations shown in red):

T1: R(A) W(A)      R(B)      W(B)
T2:           R(A)      W(A)      R(B) W(B)

Precedence graph for this schedule:

[Diagram:Pic/xact/pg1.png]

No cycles serializable  (as we already knew)


Serializability Test Examples (cont)

Consider this schedule:

T1: R(A)                W(A) R(B) W(B)
T2:      R(A) W(A) R(B)                W(B)

Precedence graph for this schedule:

[Diagram:Pic/xact/pg2.png]

Has a cycle not serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

T1: R(A)R(C)W(A)    W(C)
T2:             R(B)    R(A)    W(B)        W(A)
T3:                         R(C)    R(B)W(C)    W(B)

Precedence graph for this schedule:

[Diagram:Pic/xact/pg3.png]

No cycles serializable


Serializability Test Examples (cont)

Consider this 3-transaction schedule:

T1: R(A)                W(A)        R(C)    W(C)
T2:     R(B)    W(B)            R(A)    W(A)
T3:         R(C)    W(C)    R(B)                W(B)

Precedence graph for this schedule:

[Diagram:Pic/xact/pg4.png]

Has a cycle not serializable


Concurrency Control

Having serializability tests is useful theoretically, but they do not provide a practical tool for organising schedules.

Why not practical?

What is required are methods

Concurrency Control (cont)

Approaches to ensuring ACID transactions:


Lock-based Concurrency Control

Synchronise access to shared data items via following rules:

These rules alone do not guarantee serializability.

Two-Phase Locking

To guarantee serializability, we require an additional constraint on how locks are applied:

Each transaction is then structured as:

Problems with Locking

Appropriate locking can guarantee correctness.

However, it also introduces potential undesirable effects:


Deadlock

Deadlock occurs when two transactions are waiting for a lock on an item held by the other.

Example:

T1              T2
--------------  ---------------
write_lock(X)
read(X)
                write_lock(Y)
                read(Y)
write_lock(Y)
waiting for Y   write_lock(X)
waiting for Y   waiting for X


Deadlock (cont)

Handling deadlock involves forcing a transaction to "back off".


Locking and Starvation

Starvation occurs when one transaction

Whether it occurs depends on the lock wait/release strategy.

Multiple locks need to decide which to release first.

Solutions:


Locking and Performance

Locking typically reduces concurrency reduces throughput.

Granularity of locking can impact performance:

+ lock a small item more of database accessible

+ lock a small item quick update quick lock release

- lock small items more locks more lock management

Granularity levels: field, row (tuple), table, whole database

Multiple lock-granularities give best scope for optimising performance.


Multi-version Concurrency Control

One approach to reducing the requirement for locks is to

This approach is called multi-version concurrency control (MVCC).

The primary difference between MVCC and standard locking models:


MVCC and Transactions

Database systems using MVCC ensure

With this behaviour:

PostgreSQL and MVCC

PostgreSQL uses MVCC to reduce locking requirements.

Consequences:

An "old" tuple is one that is no longer visible to any transaction.

Concurrency control is still needed (via implicit locking):


PostgreSQL and MVCC (cont)

A transaction sees a consistent view of the database, but it may not see the "current" view of the database.

E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples

This is not a problem unless the transactions communicate outside the database system.

For applications that require that every transaction accesses the current consistent version of the data, explicit locking must be used.


Concurrency Control in SQL

Transactions in SQL are specified by

In PostgreSQL, other actions that cause rollback:

Concurrency Control in SQL (cont)

More fine-grained control of "undo" via savepoints:

Example:

begin;
  insert into numbersTable values (1);
  savepoint my_savepoint;
  insert into numbersTable values (2);
  rollback to savepoint my_savepoint;
  insert into numbersTable values (3);
commit;

will insert 1 and 3 into the table, but not 2.


Concurrency Control in SQL (cont)

SQL standard defines four levels of transaction isolation.

The weakest level allows dirty reads, phantom reads, etc.

PostgreSQL implements: repeatable-read = serializable, read-uncommitted = read-committed


Concurrency Control in SQL (cont)

Using the serializable isolation level, a select:

Using the serializable isolation level, an update fails: The transaction containing the failed update will rollback and re-start.

Concurrency Control in SQL (cont)

Explicit control of concurrent access is available, e.g.

Table-level locking: LOCK TABLE

Row-level locking: SELECT FOR UPDATE, UPDATE, DELETE

All locks are released at end of transaction (no explicit unlock)

PostgreSQL, Transactions, Concurrency

For more details on PostgreSQL's handling of these:


Produced: 13 Sep 2020