DBMSs provide access to valuable information resources in an environment that is shared (accessed concurrently by many users) and unstable (subject to hardware and software failure).
Transaction processing
A transaction is a "logical unit of work" in a DB application.
Example: a sequence of SQL operations forming one logical task, e.g.
select ... update ... insert ... select ... insert ...
Transaction processing (TP) systems can be viewed as highly dynamic database applications.
Common characteristics of transaction-processing systems: many concurrent users performing frequent, small, simple updates on shared data, where correctness of the stored data is critical.
Problem: transfer funds between two accounts in same bank.
Possible implementation in PLpgSQL:
create or replace function
    transfer(src int, dest int, amount float) returns void
as $$
declare
    oldBalance float;
    newBalance float;
begin
    -- error checking
    perform 1 from Accounts where id = src;
    if (not found) then
        raise exception 'Invalid Withdrawal Account';
    end if;
    perform 1 from Accounts where id = dest;
    if (not found) then
        raise exception 'Invalid Deposit Account';
    end if;
    ...
    -- action
    -- (A)
    select balance into oldBalance from Accounts where id = src;
    if (oldBalance < amount) then
        raise exception 'Insufficient funds';
    end if;
    newBalance := oldBalance - amount;
    -- (B)
    update Accounts set balance = newBalance where id = src;
    -- partial completion of transaction
    -- (C)
    update Accounts set balance = balance + amount where id = dest;
    -- commit;  -- redundant: the whole function body executes as one transaction
end;
$$ language plpgsql;
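As a quick check, the function can be exercised as below. This is a minimal sketch; the two account rows (ids 1 and 2) are hypothetical test data assumed to match the Accounts table used above.

insert into Accounts(id, balance) values (1, 500.00), (2, 200.00);
select transfer(1, 2, 100.00);   -- move 100.00 from account 1 to account 2
select * from Accounts;          -- expect balances 400.00 and 300.00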
Consider two simultaneous transfers between accounts, e.g. two concurrent calls to transfer() that withdraw from the same source account.
If the sequence of events is like:
T1: ... A B C ...
T2:                 ... A B C ...
everything works correctly, i.e. both transfers take effect and the account balances end up consistent.
What if the sequence of events is like this?
T1: ... A   B   C ...
T2:       A   B   C ...
In terms of database operations, this is what happens:
A transaction must always terminate, either:
COMMIT: successfully, with all of its changes preserved, or
ABORT: unsuccessfully, with none of its changes applied (i.e. rolled back).
To describe transaction effects, we consider the following abstract operations:
READ: transfer a data item from "disk" to memory
WRITE: transfer a data item from memory to "disk"
ABORT: terminate the transaction, unsuccessfully
COMMIT: terminate the transaction, successfully
SQL statements map onto these operations as follows:
SELECT produces READ operations
INSERT produces WRITE operations
UPDATE and DELETE produce both READ and WRITE operations
The READ, WRITE, ABORT, COMMIT operations:
occur in the context of some transaction T
involve manipulation of data items X, Y, ...
Notation:
RT(X)   read item X in transaction T
WT(X)   write item X in transaction T
AT      abort transaction T
CT      commit transaction T
Execution of the above funds transfer example can be described as
T: READ(S); READ(D); -- S = source tuple, D = dest tuple
READ(S); S.bal := S.bal-amount; WRITE(S)
READ(D); D.bal := D.bal+amount; WRITE(D)
COMMIT;
or simply as: R(S) R(D) R(S) W(S) R(D) W(D) C
This is known as a schedule for the transaction.
Transactions typically pass through intermediate states that are inconsistent.
However, the database states before and after a transaction must both be consistent.
Data integrity is assured if transactions satisfy the following (ACID) properties:
Atomicity: either all of a transaction's changes take effect, or none of them do
Consistency: a transaction acting on a consistent database leaves it in a consistent state
Isolation: concurrent transactions do not interfere with each other
Durability: once committed, a transaction's changes survive subsequent failures
Atomicity is handled by the commit and rollback mechanisms.
Durability is handled by implementing stable storage, via redundancy (e.g. logging plus backup/replication of data).
If concurrent transactions access shared data objects, various anomalies can arise.
We give examples using the following two transactions:
T1: read(X)            T2: read(X)
    X := X + N             X := X + M
    write(X)               write(X)
    read(Y)
    Y := Y - N
    write(Y)
and initial DB state X=100, Y=50, with N=5 and M=8.
Serial execution means no overlap of transaction operations.
If T1 executes completely before T2 (or vice versa), e.g.
T1: R(X) W(X) R(Y) W(Y)
T2:                         R(X) W(X)
or
T1:           R(X) W(X) R(Y) W(Y)
T2: R(X) W(X)
the database is left in a consistent state.
The basic idea behind serial schedules: each transaction has exclusive use of the database until it completes, so transactions cannot interfere with one another.
For the first schedule in our example:
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                    read(X)    100
                    X:=X+N     105
105                 write(X)
                    read(Y)           50
                    Y:=Y-N            45
      45            write(Y)
                                             read(X)   105
                                             X:=X+M    113
113                                          write(X)
---------
113   45
For the second schedule in our example:
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                                             read(X)   100
                                             X:=X+M    108
108                                          write(X)
                    read(X)    108
                    X:=X+N     113
113                 write(X)
                    read(Y)           50
                    Y:=Y-N            45
      45            write(Y)
---------
113   45
Note that serial execution does not mean that every serial order produces the same results; different orders may yield different (but equally consistent) outcomes.
Consider the following two transactions:
T1: select sum(salary) from Employee where dept='Sales'

T2: insert into Employee values (....,'Sales',...)
If we execute T1 before T2, the salary total does not include the new employee; if we execute T2 before T1, it does.
In both cases, however, the salary total is consistent with the state of the database at the time the query is executed.
A serial execution of consistent transactions is always consistent.
If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.
In the worst case, the effect of executing the transactions is to leave the database in an inconsistent state.
Not all concurrent executions cause problems.
For example, the schedules
T1: R(X) W(X)            R(Y) W(Y)
T2:            R(X) W(X)
or
T1: R(X) W(X) R(Y)            W(Y)
T2:                 R(X) W(X)
or ...
leave the database in a consistent state.
Consider the following schedule where the transactions execute in parallel:
T1: R(X)       W(X) R(Y)       W(Y)
T2:       R(X)            W(X)
In this scenario:
T2 reads X before T1 has written its updated value of X
T1 then writes its updated value of X
T2 then writes X, using a value computed from the old X, overwriting T1's update
The result: T1's update to X is lost (X ends up as 108 instead of 113).
Consider the states in the WR Conflict schedule:
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                    read(X)    100
                    X:=X+N     105
                                             read(X)   100
                                             X:=X+M    108
105                 write(X)
                    read(Y)           50
108                                          write(X)
                    Y:=Y-N            45
      45            write(Y)
---------
108   45
Consider the following schedule where one transaction fails:
T1: R(X) W(X) A
T2:              R(X) W(X)
Transaction T1 aborts after having written a new value for X.
The abort will undo the changes to X, but the undo might not occur immediately.
Consider three places where undo might occur:
T1: R(X) W(X) A  [1]        [2]        [3]
T2:                   R(X)       W(X)
This scenario is ok: T1's changes to X are undone before T2 reads X, so T2 works with a valid value.
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                    read(X)    100
                    X:=X+N     105
105                 write(X)
                    abort
100                 undo
                                             read(X)   100
                                             X:=X+M    108
108                                          write(X)
---------
108   50
In this scenario, some of T1's effects persist even though it aborted: T2 read the value of X written by T1 before the undo, so the final value of X (113) is based on T1's discarded update.
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                    read(X)    100
                    X:=X+N     105
105                 write(X)
                    abort
                                             read(X)   105
                                             X:=X+M    113
100                 undo
113                                          write(X)
---------
113   50
In this scenario, T2's update to X is lost: the undo of T1's change runs after T2's write and resets X to 100.
Database            T1                       T2
---------           --------------------     --------------
X     Y             X                 Y      X
100   50            ?                 ?      ?
                    read(X)    100
                    X:=X+N     105
105                 write(X)
                    abort
                                             read(X)   105
                                             X:=X+M    113
113                                          write(X)
100                 undo
---------
100   50
For ACID, we must ensure that schedules are both serializable and recoverable.
Serializable: the effect of executing n concurrent transactions is the same as the effect of executing them serially in some order.
For assessing the correctness of concurrency control methods, we need a test to check whether the schedules they produce are serializable.
Recoverable: a failed transaction should not affect the ability to recover the system to a consistent state.
This can be ensured if transactions commit only after all transactions whose changes they read have committed.
If a concurrent schedule for transactions T1 ..Tn acts like a serial schedule for T1 ..Tn, then consistency is guaranteed.
To determine this requires a notion of schedule equivalence.
Note: we are not attempting to determine equivalence of entire computations, simply of the interleaved sequences of read/write operations.
A serializable schedule is a concurrent schedule that produces a final state that is the same as that produced by some serial schedule.
There are two primary formulations of serializability: conflict serializability and view serializability.
Consider two transactions T1 and T2 acting on data item X.
Considering only read/write operations, the possibilities are:
T1 first        T2 first        Equiv?
R1(X) R2(X)     R2(X) R1(X)     yes
R1(X) W2(X)     W2(X) R1(X)     no
W1(X) R2(X)     R2(X) W1(X)     no
W1(X) W2(X)     W2(X) W1(X)     no
If T1 and T2 act on different data items, result is equivalent regardless of order.
Two transactions have a potential conflict if they access the same data item and at least one of the accesses is a write.
Conversely, if two operations in a schedule don't conflict,
we can swap their order without affecting the overall result.
This gives a basis for determining equivalence of schedules.
If we can transform a schedule into a serial schedule by swapping the order of adjacent non-conflicting operations, the schedule is conflict serializable.
If a concurrent schedule is equivalent to some (any) serial schedule, then we have a consistency guarantee.
Example: transform a concurrent schedule to serial schedule
T1: R(A) W(A)      R(B)      W(B)
T2:           R(A)      W(A)      R(B) W(B)
            swap
T1: R(A) W(A) R(B)           W(B)
T2:                R(A) W(A)      R(B) W(B)
            swap
T1: R(A) W(A) R(B)      W(B)
T2:                R(A)      W(A) R(B) W(B)
            swap
T1: R(A) W(A) R(B) W(B)
T2:                     R(A) W(A) R(B) W(B)
View serializability is an alternative, less conservative formulation: every conflict serializable schedule is view serializable, but not vice versa.
Two schedules S and S' on T1 .. Tn are view equivalent iff, for each shared data item X:
if Tj reads the initial value of X in S, it also reads the initial value of X in S'
if Tj reads the value of X written by Tk in S, it also reads the value of X written by Tk in S'
if Tj performs the final write of X in S, it also performs the final write of X in S'
In practice, we don't test specific schedules for serializability.
However, in designing concurrency control schemes, we need a way of checking whether they produce "safe" schedules.
This is typically achieved by a demonstration that the scheme generates only serializable schedules, and we need a serializability test for this.
There is a simple and efficient test for conflict serializability;
there is a more complex test for view serializability.
Both tests are based on notions of conflicting operations and the order in which transactions perform them.
A precedence graph G = (V,E) for a schedule S consists of a vertex for each transaction in S, and an edge Tj → Tk whenever an operation of Tj precedes and conflicts with an operation of Tk in S.
If an edge Tj → Tk exists in the precedence graph, then Tj must come before Tk in any serial schedule equivalent to S.
Thus, the serializability test is reduced to cycle-detection
(and there are cycle-detection algorithms available in many algorithms textbooks)
Serializable schedule (with conflicting operations shown in red):
T1: R(A) W(A)            R(B) W(B)
T2:            R(A) W(A)            R(B) W(B)
Precedence graph for this schedule:
No cycles ⇒ serializable (as we already knew)
Consider this schedule:
T1: R(A) W(A)                      R(B) W(B)
T2:            R(A) W(A) R(B) W(B)
Precedence graph for this schedule:
Has a cycle ⇒ not serializable
Consider this 3-transaction schedule:
T1: R(A)      R(C) W(A)           W(C)
T2:      R(B)           R(A) W(B)           W(A)
T3:                                    R(C)      R(B) W(C) W(B)
Precedence graph for this schedule:
No cycles ⇒ serializable
Consider this 3-transaction schedule:
T1: R(A) W(A)                                         R(C) W(C)
T2:           R(B) W(B)           R(A) W(A)
T3:                     R(C) W(C)           R(B) W(B)
Precedence graph for this schedule:
Has a cycle ⇒ not serializable
Having serializability tests is useful theoretically, but they do not provide a practical tool for organising schedules.
Why not practical? The test can only be applied once the complete schedule is known, whereas the DBMS must decide whether to allow each operation as it arrives.
Approaches to ensuring ACID transactions:
Lock-based: synchronise transaction execution via locks on some portion of the database.
Version-based (multi-version concurrency control): allow multiple consistent versions of the data to exist, and allow each transaction access to one version.
Timestamp-based: organise transaction execution in advance by assigning timestamps to operations.
Optimistic: exploit typical execution-sequence properties of transactions to determine safety dynamically.
Synchronise access to shared data items via the following rules:
before reading X, a transaction must acquire a shared (read) lock on X
before writing X, a transaction must acquire an exclusive (write) lock on X
a transaction cannot acquire a shared lock on X while another transaction holds an exclusive lock on X
a transaction cannot acquire an exclusive lock on X while another transaction holds any lock on X
To guarantee serializability, we require an additional constraint on how locks are applied: in every transaction, all lock acquisitions must precede all lock releases (two-phase locking).
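In PostgreSQL this two-phase pattern arises naturally, since locks acquired inside a transaction are only released when the transaction commits or rolls back. A minimal sketch, reusing the hypothetical Accounts table from the transfer example:

begin;
-- growing phase: acquire all locks needed for the transfer
lock table Accounts in row exclusive mode;                   -- explicit table-level lock
update Accounts set balance = balance - 100 where id = 1;    -- implicit row-level lock
update Accounts set balance = balance + 100 where id = 2;    -- implicit row-level lock
-- shrinking phase: all locks are released together at commit
commit;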
Appropriate locking can guarantee correctness.
However, it also introduces potential undesirable effects:
Deadlock: no transaction can proceed; each is waiting on a lock held by another.
Starvation: one transaction is permanently "frozen out" of access to data.
Reduced performance: locking introduces delays while waiting for locks to be released.
Deadlock occurs when two transactions are waiting for a lock on an item held by the other.
Example:
T1                   T2
--------------       ---------------
write_lock(X)
read(X)
                     write_lock(Y)
                     read(Y)
write_lock(Y)
waiting for Y        write_lock(X)
waiting for Y        waiting for X
Handling deadlock involves forcing a transaction to "back off".
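As a concrete illustration (a sketch only, again using the hypothetical Accounts table), the abstract scenario above corresponds to two PostgreSQL sessions updating the same two rows in opposite orders; PostgreSQL detects the resulting deadlock and aborts one of the transactions, which is its way of forcing that transaction to back off:

-- session 1
begin;
update Accounts set balance = balance - 100 where id = 1;   -- row lock on id = 1

-- session 2
begin;
update Accounts set balance = balance - 100 where id = 2;   -- row lock on id = 2

-- session 1: blocks, waiting for session 2's lock on id = 2
update Accounts set balance = balance + 100 where id = 2;

-- session 2: would wait for session 1's lock on id = 1, giving a deadlock;
-- PostgreSQL raises "deadlock detected" and aborts one of the transactions
update Accounts set balance = balance + 100 where id = 1;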
Starvation occurs when one transaction waits indefinitely for a lock, while other transactions continually acquire and release it.
Multiple locks ⇒ need to decide which to release first.
Solutions: grant lock requests in a fair (e.g. first-come-first-served) order, so that no waiting transaction is bypassed indefinitely.
Locking typically reduces concurrency ⇒ reduces throughput.
Granularity of locking can impact performance:
+ lock a small item ⇒ more of database accessible
+ lock a small item ⇒ quick update ⇒ quick lock release
- lock small items ⇒ more locks ⇒ more lock management
Granularity levels: field, row (tuple), table, whole database
Multiple lock-granularities give best scope for optimising performance.
One approach to reducing the requirement for locks is to maintain multiple (consistent) versions of data items: writers create new versions, while readers access an appropriate existing version.
The primary difference between MVCC and standard locking models: read locks do not conflict with write locks, so reading never blocks writing and writing never blocks reading.
Database systems using MVCC ensure that each transaction sees a consistent view of the database, though not necessarily the most current one.
E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples
PostgreSQL uses MVCC to reduce locking requirements.
Consequences:
the database contains multiple versions of some rows, only one of which is current
out-of-date ("expired") row versions must be cleaned up periodically, via the vacuum operation
Concurrency control is still needed (via implicit locking), e.g. when two concurrent transactions attempt to update the same row.
A transaction sees a consistent view of the database, but it may not see the "current" view of the database.
E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples
This is not a problem unless the transactions communicate outside the database system.
For applications that require that every transaction accesses the current consistent version of the data, explicit locking must be used.
Transactions in SQL are specified by:
BEGIN: start a transaction
COMMIT: successfully complete the transaction, making its changes permanent
ROLLBACK: abort the transaction, undoing all of its changes
More fine-grained control of "undo" is available via savepoints:
SAVEPOINT: marks a point within the transaction to which we can later roll back
ROLLBACK TO SAVEPOINT: undoes all changes made after the savepoint, without aborting the whole transaction
begin;
insert into numbersTable values (1);
savepoint my_savepoint;
insert into numbersTable values (2);
rollback to savepoint my_savepoint;
insert into numbersTable values (3);
commit;
will insert 1 and 3 into the table, but not 2.
The SQL standard defines four levels of transaction isolation: read uncommitted, read committed, repeatable read, and serializable.
PostgreSQL provides three distinct behaviours: read uncommitted is treated as read committed, while repeatable read and serializable are implemented as separate levels.
Using the serializable (or repeatable read) isolation level, a select sees only data committed before its transaction began; it never sees changes committed by concurrent transactions during its execution.
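For example (a sketch, reusing the Employee table from the earlier sum-of-salaries example), the two selects below return the same total even if a concurrent transaction inserts a new 'Sales' employee and commits in between:

begin isolation level serializable;
select sum(salary) from Employee where dept = 'Sales';
-- a concurrent insert into Employee committed at this point is not visible below
select sum(salary) from Employee where dept = 'Sales';
commit;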
Explicit control of concurrent access is available, e.g.
Table-level locking: the LOCK TABLE statement explicitly acquires a lock on an entire table (some commands, such as ALTER TABLE, acquire table-level locks implicitly).
Row-level locking: SELECT ... FOR UPDATE locks the returned rows against concurrent changes; UPDATE and DELETE acquire row-level locks on the affected rows implicitly.
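For instance, a sketch of row-level locking using the hypothetical Accounts table: SELECT ... FOR UPDATE fetches the current balance and holds a lock on that row, so no concurrent transaction can change it before our update commits.

begin;
select balance from Accounts where id = 1 for update;    -- locks the row with id = 1
update Accounts set balance = balance - 100 where id = 1;
commit;                                                   -- row lock released here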
For more details on PostgreSQL's handling of these, see the "Concurrency Control" chapter of the PostgreSQL documentation.
Produced: 13 Sep 2020