COMP9315 Week 10 Monday Lecture

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [0/63]
❖ Things To Note


COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [1/63]
❖ GDB in one slide

To debug the program:

$ ./select R '?,?,shoes'

Run GDB as follows

$ gdb ./select
...
(gdb) run R '?,?,shoes'
... program crashes ...
(gdb) where
... look at the line numbers and function arguments ...
(gdb) print pid
... examine values of variables in current scope ...
(gdb) quit

Has many other functions, e.g. break   (program stops at a given point)

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [2/63]
❖ Transactions (reminder)

A transaction is

Transactions deal with three levels of storage Transactions ...
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [3/63]
❖ Exercise: Transaction Internals

Consider the following transaction:

-- implements A = A*2; B = B+1;
BEGIN
READ(A,v); v = v*2; WRITE(A,v);
READ(B,v); v = v+1; WRITE(B,v);
COMMIT

Show how the following items change values after each statement

What is the final state after the COMMIT completes?
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [4/63]
❖ Exercise: Transaction Failure

Consider the previous transaction.

(0) BEGIN
(1) READ(A,v);
(2) v = v*2;
(3) WRITE(A,v);
(4) READ(B,v);
(5) v = v+1;
(6) WRITE(B,v);
(7) COMMIT
(8) OUTPUT(A)
(9) OUTPUT(B)

What happens if ...

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [5/63]
❖ Logging

Three "styles" of logging

All approaches require: Known as write-ahead logging    (PostgreSQL uses WAL)
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [6/63]
❖ Undo Logging

Simple form of logging which ensures atomicity.

Log file consists of a sequence of small records:

Notes:
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [7/63]
❖ Undo Logging (cont)

Data must be written to disk in the following order:

  1. <START> transaction log record
  2. <UPDATE> log records indicating changes
  3. the changed data elements themselves
  4. <COMMIT> log record
Note: sufficient to have <T,X,v> output before X, for each X
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [8/63]
❖ Undo Logging (cont)

For the example transaction, we would get:

t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6
(11) EndCommit                                <COMMIT T>
(12) FlushLog

Note that T is not regarded as committed until (12) completes.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [9/63]
❖ UNDO Logging

❖ UNDO Logging (cont)

Simplified view of recovery using UNDO logging:

Assumes we scan entire log; use checkpoints to limit scan.
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [10/63]
❖ UNDO Logging (cont)

Algorithmic view of recovery using UNDO logging:

committedTrans = abortedTrans = startedTrans = {}
for each log record from most recent to oldest {
    switch (log record) {
    <COMMIT T> : add T to committedTrans
    <ABORT T>  : add T to abortedTrans
    <START T>  : add T to startedTrans
    <T,X,v>    : if (T in committedTrans)
                     // don't undo committed changes
                 else  // roll-back changes
                     { WRITE(X,v); OUTPUT(X) }
}   }
for each T in startedTrans {
    if (T in committedTrans) ignore
    else if (T in abortedTrans) ignore
    else write <ABORT T> to log
}
flush log

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [11/63]
❖ Exercise: Recovery with UNDO Log


Show how the UNDO log would be used if the previous tx

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [12/63]
❖ Checkpointing

Simple view of recovery implies reading entire log file.

Since log file grows without bound, this is infeasible.

Eventually we can delete "old" section of log.

This point is called a checkpoint. As described we need to wait for all active tx to complete
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [13/63]
❖ Checkpointing (cont)

Problem: many concurrent/overlapping transactions.

How to know that all have finished?

  1. periodically, write log record <CHKPT (T1,..,Tk)>
    (contains references to all active transactions active tx table)
  2. continue normal processing (e.g. new tx's can start)
  3. when all of T1,..,Tk have completed,
    write log record <END CHKPT> and flush log
Note: tx manager maintains chkpt and active tx information

These kinds of checkpoints are often written as <START CHKPT ...>

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [14/63]
❖ Checkpointing (cont)

Recovery: scan backwards through log file processing as before.

Determining where to stop depends on ...

If we encounter <END CHKPT> first:

If we encounter <CHKPT (T1,..,Tk)> first:
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [15/63]
❖ Redo Logging

Problem with UNDO logging:

Alternative approach is redo logging:
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [16/63]
❖ Redo Logging (cont)

Requirement for redo logging: write-ahead rule.

Data must be written to disk as follows:

  1. write start transaction <START T> log record
  2. write <UPDATE...> log records indicating changes
  3. then write <COMMIT> log record (and flush log)
  4. then OUTPUT changed data elements themselves
Note that update log records now contain <T,X,v'>,
where v' is the new value for X.
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [17/63]
❖ Redo Logging (cont)

For the example transaction, we would get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,6>
(7)  COMMIT                                   <COMMIT T>
(8)  FlushLog
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (8) completes.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [18/63]
❖ Redo Logging (cont)

Simplified view of recovery using REDO logging:

Use checkpoints (like UNDO log checkpoints) to limit scan.
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [19/63]
❖ Undo/Redo Logging

UNDO logging and REDO logging are incompatible in

Undo/Redo logging combines aspects of both As for previous cases, requires write-ahead of log records.

Undo/redo loging is common in practice; Aries algorithm.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [20/63]
❖ Undo/Redo Logging (cont)

For the example transaction, we might get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5,6>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10)                                          <COMMIT T>
(11) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (10) completes.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [21/63]
❖ Undo/Redo Logging (cont)

Simplified view of recovery using UNDO/REDO logging:


Note: undo/redo logging requires dirty buffers to be flushed at <CHKPT...>
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [22/63]
❖ Undo/Redo Logging (cont)

The above description simplifies details of undo/redo logging.

Aries is a complete algorithm for undo/redo logging.

Differences to what we have described:

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [23/63]
❖ Recovery in PostgreSQL

PostgreSQL uses write-ahead undo/redo style logging.

It also uses multi-version concurrency control, which

MVCC simplifies some aspects of undo/redo, e.g.
Recall: WAL entries in postgresql.conf
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [24/63]
❖ Recovery in PostgreSQL (cont)

Transaction/logging code is distributed throughout backend.

Core transaction code is in src/backend/access/transam.

Transaction/logging data is written to files in PGDATA/pg_xlog

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [25/63]
❖ Database Trends (overview)

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [26/63]
❖ Future of Database

Core "database" goals:

At the moment (and for the last 30 years) RDBMSs dominate ... RDBMSs work well in domains with uniform, structured data.
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [27/63]
❖ Future of Database (cont)

Limitations/pitfalls of classical RDBMSs:

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [28/63]
❖ Future of Database (cont)

How to overcome (some) RDBMS limitations?

Extend the relational model ...

Replace the relational model ...
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [29/63]
❖ Future of Database (cont)

How to overcome (some) RDBMS limitations?

Performance ...

Scalability ...
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [30/63]
❖ Future of Database (cont)

An overview of the possibilities:

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [31/63]
❖ Future of Database (cont)

Historical perspective

[Diagram:Pics/future/dbms-history.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [32/63]
❖ Large Data

Some modern applications have massive data sets (e.g. Google)

Approach to dealing with such data Often this data does not need full relational selection
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [33/63]
❖ Large Data (cont)

Popular computational approach to such data: map/reduce

Some large data proponents see no future need for SQL/relational ... Humour: Parody of noSQL fans   (strong language warning)
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [34/63]
❖ Information Retrieval

DBMSs generally do precise matching (although like/regexps)

Information retrieval systems do approximate matching.

E.g. documents containing a set of keywords (Google, etc.)

Also introduces notion of "quality" of matching
(e.g. tuple T1 is a better match than tuple T2)

Quality also implies ranking of results.


Ongoing research in incorporating IR ideas into DBMS context.

Goal: support database exploration better.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [35/63]
❖ Multimedia Data

Data which does not fit the "tabular model":

Research problems: Solutions to the first problem typically: Sample query: find other songs like this one?
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [36/63]
❖ Uncertainty

Multimedia/IR introduces approximate matching.

In some contexts, we have approximate/uncertain data.

E.g. witness statements in a crime-fighting database

"I think the getaway car was red ... or maybe orange ..."

"I am 75% sure that John carried out the crime"

Work by Jennifer Widom at Stanford on the Trio system

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [37/63]
❖ Stream Data Management Systems

Makes one addition to the relational model

Applications: news feeds, telecomms, monitoring web usage, ...

RDBMSs: run a variety of queries on (relatively) fixed data

StreamDBs: run fixed queries on changing data (stream)

One approach: window = "relation" formed from a stream via a rule

E.g. StreamSQL

select avg(price)
from examplestream [size 10 advance 1 tuples]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [38/63]
❖ Graph Data

Uses graphs rather than tables as basic data structure tool.

Applications: social networks, ecommerce purchases, interests, ...

Many real-world problems are modelled naturally by graphs

Graph data models:  flexible,  "schema-free",  inter-linked

Typical modeling formalisms:  XML,  JSON,  RDF

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [39/63]
❖ Dispersed Databases

Characteristics of dispersed databases:

Applications: environmental monitoring devices, "intelligent dust", ...

Research issues:

Less extreme versions of this already exist:
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [40/63]
❖ Parallelism in Databases

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [41/63]
❖ Parallel DBMSs

RDBMS discussion so far has revolved around systems

[Diagram:Pics/parallel/one-processor.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [42/63]
❖ Parallel DBMSs (cont)

Why parallelism? ... Throughput!

[Diagram:Pics/parallel/para-thru.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [43/63]
❖ Parallel DBMSs (cont)

DBMSs are a success story in application of parallelism

Compare this with effort to do parallel programming.
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [44/63]
❖ Parallel Architectures

Types:   shared memory,   shared disk,   shared nothing

Example shared-nothing architecture:

[Diagram:Pics/parallel/shared-nothing.png]

Typically same room/LAN   (data transfer cost ~ 100's of μsecs .. msecs)

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [45/63]
❖ Distributed Architectures

Distributed architectures are ...

[Diagram:Pics/parallel/distrib-arch.png]

Typically on the Internet   (data transfer cost ~ secs)

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [46/63]
❖ Parallel Databases (PDBs)

Parallel databases provide various forms of parallelism ...

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [47/63]
❖ Parallel Databases (PDBs) (cont)

Types of parallelism

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [48/63]
❖ Data Storage in PDBs

Assume that each table/relation consists of pages in a file

Can distribute data across multiple storage devices

Replication example:

[Diagram:Pics/parallel/replication.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [49/63]
❖ Data Storage in PDBs (cont)

Data-partitioning example:

[Diagram:Pics/parallel/data-partition.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [50/63]
❖ Data Storage in PDBs (cont)

A table is a collection of pages (aka blocks).

Page addressing on single processor/disk: (Table, File, Page)

If multiple nodes, then addressing depends how data distributed
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [51/63]
❖ Data Storage in PDBs (cont)

Assume that partitioning is based on one attribute

Data-partitioning strategies for one table on n nodes:


Round-robin partitioning
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [52/63]
❖ Data Storage in PDBs (cont)

Hash partitioning

Range partitioning In both cases, data skew may lead to unbalanced load
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [53/63]
❖ Parallelism in DB Operations

Different types of parallelism in DBMS operations

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [54/63]
❖ Parallelism in DB Operations (cont)

Parallel scanning

Effectiveness depends on query type vs partitioning type

[Diagram:Pics/parallel/select-partition.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [55/63]
❖ Parallelism in DB Operations (cont)

Parallel sorting


Potential problem:
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [56/63]
❖ Parallelism in DB Operations (cont)

Parallel sort:


[Diagram:Pics/parallel/parallel-sort.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [57/63]
❖ Parallelism in DB Operations (cont)

Parallel nested loop join


Parallel sort-merge join
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [58/63]
❖ Parallelism in DB Operations (cont)

Parallel hash join

Fragment-and-replicate join
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [59/63]
❖ Parallelism in DB Operations (cont)

Parallel hash join:


[Diagram:Pics/parallel/hash-join.png]

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [60/63]
❖ PostgreSQL and Parallelism

PostgreSQL assumes

PostgreSQL allows So could run on ...
COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [61/63]
❖ PostgreSQL and Parallelism (cont)

PostgreSQL can provide

Both need data synchronisation between servers

PostgreSQL uses notion of master and slave servers.

COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [62/63]
❖ PostgreSQL and Parallelism (cont)

High availability ...

  • updates occur on master, recorded in tx log
  • tx logs shipped/streamed from master to slave(s)
  • slave uses tx logs to maintain current state
  • configuration controls frequency of log shipping
  • bringing slave up-to-date is "fast" (~1-2secs)

    Note: small window exists for data loss (committed tx log records not sent)

    Load balancing ...

  • COMP9315 24T1 ♢ Week 10 Monday Lecture ♢ [63/63]


    Produced: 15 Apr 2024