DBMS Architecture and Implementation


DBMS Architecture and Implementation

Aims:


COMP3311: overview of the above; how to use them in app development
COMP9315: explore the above in detail; implement (bits of) a DBMS.


Database Application Performance

In order to make DB applications efficient, we need to know:

and then, as much as the DBMS allows, "encourage" it to use the most efficient methods.

Database Application Performance (cont)

Application programmer choices that affect query cost:


Database Application Performance (cont)

Whatever you do as a DB application programmer

Transformation is carried out by query optimiser You have no control over the optimisation process

Database Application Performance (cont)

Example: query to find sales people earning more than $50K

select name from Employee
where  salary > 50000 and
       empid in (select empid from Worksin
                 where  dept = 'Sales')

A query optimiser might use the strategy

SalesEmps = (select empid from WorksIn where dept='Sales')
foreach e in Employee {
    if (e.empid in SalesEmps && e.salary > 50000)
        add e to result set
}

Needs to examine all employees, even if not in Sales


Database Application Performance (cont)

A different expression of the same query:

select name
from   Employee join WorksIn using (empid)
where  Employee.salary > 5000 and
       WorksIn.dept = 'Sales'

Query optimiser might use the strategy

SalesEmps = (select * from WorksIn where dept='Sales')
foreach e in (Employee join SalesEmps) {
    if (e.salary > 50000)
        add e to result set
}

Only examines Sales employees, and uses a simpler test


Database Application Performance (cont)

A very poor expression of the query (correlated subquery):

select name from Employee e
where  salary > 50000 and
       'Sales' in (select dept from Worksin where empid=e.id)

A query optimiser would be forced to use the strategy:

foreach e in Employee {
    Depts = (select dept from WorksIn where empid=e.empid)
    if ('Sales' in Depts && e.salary > 50000)
        add e to result set
}

Needs to run a query for every employee ...


Database Transaction Processing

In most applications ...

Example: bank funds transfer requires at least two operations

Transfer(Amount, Source, Dest)
-- deduct funds from source account
update Account set balance = balance - Amount
where  account = Source
-- add funds to destination account
update Account set balance = balance + Amount
where  account = Dest

May also require checks on validity of Source and Dest ...


Database Transaction Processing (cont)

Where things can go wrong in the funds transfer example:


(1) update Account set balance = balance - Amount where account = Source
(2) update Account set balance = balance + Amount where account = Dest

  1. system could crash after first operation but before second
    • money lost from source account
    • solution requires us to ensure 0 or 2 ops are completed
  2. two users could do first operation simultaneously
    • only one deduction from source account
    • solution requires us to control "simultaneous" access

Database Transaction Processing (cont)

DBMSs provide two inter-related mechanisms for transactions ...


Database Transaction Processing (cont)

Locking is critical in many contexts

If need for locking can be reduced better throughput

Many DBMSs provide MVCC (multi-version concurrency control)

Disadvantages: storage overhead, each tuple access requires relevance check

Data Representation

A DBMS provides representations for:

Value representations use underlying data types and byte arrays Tuple representations are like structs with header info

Data Representation (cont)

Many ways to represent database relations:

Examples of the above:

Data Representation (cont)

PostgreSQL's data layout:


Data Representation (cont)

DB data is (obviously) persistent resides on disk

To maximise efficiency of disk transfers

To avoid disk reads/writes where possible, DBMSs typically have a very large in-memory cache of database pages.

DBMS Architecture

Layers in a DB Engine (Ramakrishnan's View)

[Diagram:Pic/dbms/dbmsarch.png]


DBMS Components

File manager manages allocation of disk space and data structures
Buffer manager manages data transfer between disk and main memory
Query optimiser translates queries into efficient sequence of relational ops
Recovery manager ensures consistent database state after system failures
Concurrency manager controls concurrent access to database
Integrity manager verifies integrity constraints and user privileges


Produced: 13 Sep 2020