COMP3311 Week 9 Thursday Lecture

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [0/39]
❖ Week 09 Thursday

In today's lecture ...

Things to do ...

Things to note ...

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [1/39]
❖ Transactions (review)

Transaction (tx) = application unit of work involving multiple DB updates

Want tx's to be:  Atomic,  Consistent,  Isolated,  Durable

Description of tx's normally abstracted to R(X), W(X), C, A operations

Schedule = sequence of operations involving one or more tx's

Example serial schedule involving two transactions:

T1: R(X) W(X) R(Y) W(Y) R(Z) W(Z)
T2:                               R(Y) W(Y) R(X)

Example concurrent schedule involving same transactions:

T1: R(X)      W(X) R(Y)      W(Y)      R(Z) W(Z)
T2:      R(Y)           W(Y)      R(X)

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [2/39]
❖ Transactions (review) (cont)

All serial schedules (e.g. T1 then T2) are valid/safe

Not all concurrent schedules are valid (e.g. above example is not safe)

Serializable schedule = concurrent schedule equivalent to a serial schedule

Example serializable concurrent schedule:

T1: R(X) W(X)      R(Y)     W(Y)
T2:                              R(Y)       W(Y) R(X)
T3:           R(X)      W(X)           W(Z)

Two notions of serializability: conflict serializability, view serializability

DBMSs aim to control concurrency via locking and rollback

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [3/39]
❖ Conflict Serializability

A characterization of serializability based on conflicting operations

Consider two transactions T1 and T2 acting on data item X.

Possible orders for read/write operations by T1 and T2:

T1 first   T2 first   Equiv?
R1(X) R2(X)   R2(X) R1(X)   yes
R1(X) W2(X)   W2(X) R1(X)   no
W1(X) R2(X)   R2(X) W1(X)   no
W1(X) W2(X)   W2(X) W1(X)   no

Note: if T1 and T2 act on different data items (e.g. R(X), W(Y)), order does not matter.

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [4/39]
❖ Conflict Serializability (cont)

Two transactions have a potential conflict if

In such cases, the order of operations affects the result.

If no conflict, can swap order without affecting the result.

If we can transform a schedule

then we say that the schedule is conflict serializible.
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [5/39]
❖ Conflict Serializability (cont)


Example: transform a concurrent schedule to serial schedule

T1: R(A) W(A)      R(B)      W(B)
T2:           R(A)      W(A)      R(B) W(B)
swap
T1: R(A) W(A) R(B)           W(B)
T2:                R(A) W(A)      R(B) W(B)
swap
T1: R(A) W(A) R(B)      W(B)
T2:                R(A)      W(A) R(B) W(B)
swap
T1: R(A) W(A) R(B) W(B)
T2:                     R(A) W(A) R(B) W(B)

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [6/39]
❖ Conflict Serializability (cont)

Checking for conflict-serializability:

Method for doing this:
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [7/39]
❖ Example Precedence Graph


Example schedule which is not conflict serializable:

T1: R(X)           R(Y) W(X)      W(Y)
T2:           R(X)           W(X)
T3:      R(X)                          W(X)

Precendence graph for the above schedule:

[Diagram:Pics/xact/prec-graph.png]

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [8/39]
❖ Exercise: Precedence Graphs


Build precedence graphs for the following schedules:

  1. T1: R(X) W(X)      R(Y)      W(Y)
    T2:           R(X)      W(X)
    


  2. T1: R(X)      W(X)      R(Y) W(Y)
    T2:      R(X)      W(X)
    

For each schedule:
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [9/39]
❖ View Serializability

View Serializability is

As with CS, it is based on a notion of schedule equivalence The idea: if across the two schedules ... then they are view equivalent
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [10/39]
❖ View Serializability (cont)

Two schedules S and S' on T1 .. Tn are view equivalent iff

To check serializibilty of S ...
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [11/39]
❖ Exercise: Checking Serializability

Check whether each schedule is conflict/view serializable:


  1. T1: R(X)      W(X)
    T2:      R(X)      W(X)
    


  2. T1: W(X)      R(Y)
    T2:      R(Y)      R(X)
    


  3. T1: R(X)      W(X)
    T2:      W(X)
    T3:                W(X)
    

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [12/39]
❖ Concurrency Control

Serializability tests are useful theoretically ...

But don't provide a mechanism for organising schedules

What is required are methods that ...
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [13/39]
❖ Concurrency Control (cont)

Approaches to ensuring ACID transactions:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [14/39]
❖ Lock-based Concurrency Control

Synchronise access to shared data items via following rules:

These rules alone do not guarantee serializability ...
Locking also introduces potential for deadlock and starvation.
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [15/39]
❖ Lock-based Concurrency Control (cont)

Examples of locking:

Schedule 1
T1: Lx(X)       R(X)           W(X) U(X)
T2:       Lx(Y)      R(Y) W(Y)           U(Y)

Schedule 2
T1: Lx(X)       R(X) W(X) U(X)
T2:       Lx(X) .............. R(X) W(X) U(X)

New operations: Lx() = exclusive lock, Ls() = shared lock, U() unlock

Note: in Schedule 2, locking effectively forces serial execution

This is not generally the case; there may be some concurrency

In general, locking reduces concurrency, but gains safety

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [16/39]
❖ Concurrency Control in SQL


Transactions in SQL are specified by

In PostgreSQL, some actions cause implicit rollback:
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [17/39]
❖ Concurrency Control in SQL (cont)

Concurrent access can be controlled via SQL:

LOCK TABLE explicitly acquires lock on an entire table.

Some SQL commands implicitly acquire appropriate locks, e.g.

All locks are released at end of transaction (no explicit unlock)
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [18/39]
❖ Locking in PostgreSQL

Some locking in PG is implicit (e.g. changing schema)

Explicit locks are available:

lock table TableName [ in LockMode mode ]

Some possible LockModes:

No UNLOCK ... all locks are released at end of transaction

Row-level locking: lock all selected rows for writing

select * from Table where Condition for update

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [19/39]
❖ Locking in PostgreSQL (cont)

Two examples of lock usage in PostgreSQL:

begin;
lock table films in share mode;
select id into _id_ from films
    where name = 'Star Wars: Episode I - The Phantom Menace';
-- Do rollback if no record was returned
insert into films_user_comments values
    (_id_, 'GREAT! I was waiting for it for so long!');
commit;

begin;
lock table films in share row exclusive mode;
delete from films_user_comments where id in
    (select id from films where rating < 5);
delete from films where rating < 5;
commit;

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [20/39]
❖ Exercise: Locking


Apply appropriate locking in the bank transfer transaction.

create or replace function
   transfer(N integer, Src text, Dest text)
   returns integer
declare
   sID integer; dID integer; avail integer;
begin
   select id,balance into sID,avail
   from   Accounts where name=Src;
...
   return nextval('tx_id_seq');
end;

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [21/39]
❖ Locking and Performance

Locking reduces concurrency lower throughput.

Granularity of locking can impact performance:

+ lock a small item more of database accessible

+ lock a small item quick update quick lock release

- lock small items more locks more lock management

Granularity levels: field, row (tuple), table, whole database

Many DBMSs support multiple lock-granularities.

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [22/39]
❖ Multi-version Concurrency Control

One approach to reducing the requirement for locks is to

This approach is called Multi-Version Concurrency Control.

Differences between MVCC and standard locking models:


PostgreSQL pioneered MVCC as a concurrency control mechanism
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [23/39]
❖ Multi-version Concurrency Control (cont)

PostgreSQL MVCC ...

Access to a tuple requires Periodic vacuum process deletes tuples that Time/space overheads in implementing MVCC
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [24/39]
❖ Future of Database

Core "database" goals:

At the moment (and for the last 40 years) RDBMSs dominate ... RDBMSs work well in domains with uniform, structured data.
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [25/39]
❖ Future of Database (cont)

Limitations/pitfalls of RDBMSs:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [26/39]
❖ Future of Database (cont)

How to overcome (some of) these limitations?

Extend the relational model ...

Replace the relational model ... Performance: DBMSs that "tune" themselves ...
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [27/39]
❖ Big Data

Some modern applications have massive data sets (e.g. Google)

Approach to dealing with such data Often this data does not need full relational selection
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [28/39]
❖ Big Data (cont)

Popular computational approach to Big Data: map/reduce

Some Big Data proponents see no future need for SQL/relational ...
Humour: Parody of noSQL fans (strong language warning)
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [29/39]
❖ Object-relational Mapping

Pure OO databases came and went in the 90's  (similarly XML databases)

A compromise:

Programmer works purely with objects; wrapper converts to SQL

Problems:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [30/39]
❖ Information Retrieval

DBMSs generally do precise matching (although like/regexps)

Information retrieval systems do approximate matching.

E.g. documents containing these words (Google, etc.)

Also introduces notion of "quality" of matching
(e.g. tuple T1 is a better match than tuple T2)

Quality also implies ranking of results.

Much activity in incorporating IR ideas into DBMS context.

Goal: support database exploration better.

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [31/39]
❖ Multimedia Data

Data which does not fit the "tabular model":

Research problems: Solutions to the first problem typically: Sample query: find other songs like this one?
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [32/39]
❖ Uncertainty

Multimedia/IR introduces approximate matching.

In some contexts, we have approximate/uncertain data.

E.g. witness statements in a crime-fighting database

"I think the getaway car was red ... or maybe orange ..."

"I am 75% sure that John carried out the crime"

Work by Jennifer Widom at Stanford on the Trio system

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [33/39]
❖ Stream Management Systems

Makes one addition to the relational model

Applications: news feeds, telecomms, monitoring web usage, ...

RDBMSs: run a variety of queries on (relatively) fixed data

StreamDBs: run fixed queries on changing data (stream)

Approaches:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [34/39]
❖ Graph-based Databases

Uses graphs rather than tables as basic data structure tool.

Applications: complex data representation, via "flexible" objects

Implementing graphs in RDBMSs is possible, but often inefficient.

Graph nature of data changes query model considerably.

Research problem: query processing for large graphs

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [35/39]
❖ Dispersed Databases

Characteristics of dispersed databases:

Applications: environmental monitoring devices, "intelligent dust", ...

Research issues:

Less extreme versions of this already exist:
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [36/39]
❖ Main-memory Databases

Disk-based data is slow to access, and page-at-a-time

Non-volatile non-disk storage is becoming larger and cheaper

Research problems:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [37/39]
❖ Other DB Research

Open problems:

COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [38/39]
❖ Beyond COMP3311

COMP9312 Data Analyutics for Graphs

COMP9313 Big Data Management COMP9315 Database Systems Implementation COMP9319 Web Data Compression and Search COMP6714 Information Retrieval and Web Search
COMP3311 22T3 ♢ Week 9 Thursday Lecture ♢ [39/39]


Produced: 13 Apr 2023