A: DBMS Architecture, PostgreSQL

COMP9315 24T1 ♢ Lectures Part A ♢ [0/55]
❖ Relational DBMS Functionality

Relational DBMSs provide a variety of functionalities:

Common feature of all relational DBMSs: relational model, SQL.
COMP9315 24T1 ♢ Lectures Part A ♢ [1/55]
❖ Data Definition

Relational data: relations/tables, tuples, values, types, e.g.


create domain WAMvalue float
   check (value between 0.0 and 100.0);

create table Students (
   id          integer,  -- e.g. 3123456
   familyName  text,     -- e.g. 'Smith'
   givenName   text,     -- e.g. 'John'
   birthDate   date,     -- e.g. '1-Mar-1984'
   wam         WAMvalue, -- e.g. 85.4
   primary key (id)
);

The above adds meta-data to the database.

DBMSs typically store meta-data as special tables (catalog).

COMP9315 24T1 ♢ Lectures Part A ♢ [2/55]
❖ Data Definition (cont)

Input: DDL statement   (e.g. create table)

[Diagram:Pics/intro/ddl-stat.png]

Result: meta-data in catalog is modified

COMP9315 24T1 ♢ Lectures Part A ♢ [3/55]
❖ Data Modification

Critical function of DBMS: changing data

E.g.

insert into Enrolments(student,course,mark)
values (3312345, 5542, 75);

update Enrolments set mark = 77
where  student = 3354321 and course = 5542;

delete Enrolments where student = 3112233;

COMP9315 24T1 ♢ Lectures Part A ♢ [4/55]
❖ Data Modification (cont)

Input: DML statements

[Diagram:Pics/intro/dml-stat.png]

Result: tuples are added, removed or modified

COMP9315 24T1 ♢ Lectures Part A ♢ [5/55]
❖ Query Evaluator

Most common function of relational DBMSs

E.g.

select s.id, c.code, e.mark
from   Students s
       join Enrolments e on s.id = e.student
       join Courses c on e.course = c.id;

COMP9315 24T1 ♢ Lectures Part A ♢ [6/55]
❖ Query Evaluator (cont)

Input: SQL query

[Diagram:Pics/intro/query.png]

Output: table (displayed as text)

COMP9315 24T1 ♢ Lectures Part A ♢ [7/55]
❖ DBMS Architecture

The aim of this course is to

Why should we care?   (apart from passing the exam)

Practical reason:

Educational reason:
COMP9315 24T1 ♢ Lectures Part A ♢ [8/55]
❖ DBMS Architecture (cont)

Path of a query through a typical DBMS:

[Diagram:Pics/intro/qryeval1.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [9/55]
❖ DBMS Architecture (cont)

[Diagram:Pics/intro/dbmsarch.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [10/55]
❖ DBMS Architecture (cont)

Important factors related to DBMS architecture

Implications: Modern DBMSs interact with storage via the O/S file-system.


** SSDs change things a little, but most high volume bulk storage still on disks

COMP9315 24T1 ♢ Lectures Part A ♢ [11/55]
❖ DBMS Architecture (cont)

Implementation of DBMS operations is complicated by

Require "concurrency-tolerant" data structures.

Transactions/reliability require some form of logging.

COMP9315 24T1 ♢ Lectures Part A ♢ [12/55]
❖ Database Engine Operations

DB engine = "relational algebra virtual machine".

Common operators of relational algebra (RA):

selection (σ ) projection (π ) join ( )
union () intersection () difference (-)
sort group aggregate

For each of these operations:

COMP9315 24T1 ♢ Lectures Part A ♢ [13/55]
❖ Relational Algebra

All RA operators return a result of type relation.

For convenience, we can name a result and use it later.

E.g. database   R1(x,y),   R2(y,z),  


Tmp1(x,y)   = Sel[x>5]R1
Tmp2(y,z)   = Sel[z=3]R2
Tmp3(x,y,z) = Tmp1 Join Tmp2
Res(x,z)    = Proj[x,z] Tmp3
-- which is equivalent to
Tmp1(x,y,z) = R1 Join R2
Tmp2(x,y,z) = Sel[x>5 & z=3] Tmp1
Res(x,z)    = Proj[x,z]Tmp2

Each "intermediate result" has a well-defined schema.

COMP9315 24T1 ♢ Lectures Part A ♢ [14/55]
❖ DBMS Query Evaluation

Path of a query through a typical DBMS:

[Diagram:Pics/intro/qryeval1.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [15/55]
❖ A Relational Algebra Engine

[Diagram:Pics/intro/dbmsarch1.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [16/55]
❖ PostgreSQL

PostgreSQL is a full-featured open-source (O)RDBMS.

COMP9315 24T1 ♢ Lectures Part A ♢ [17/55]
❖ User View of PostgreSQL

Users interact via SQL in a client process, e.g.


$ psql webcms
psql (15.6)
Type "help" for help.
webcms2=# select * from calendar;
 id | course |   evdate   |      event
----+--------+------------+---------------------------
  1 |      4 | 2001-08-09 | Project Proposals due
 10 |      3 | 2001-08-01 | Tute/Lab Enrolments Close
 12 |      3 | 2001-09-07 | Assignment #1 Due (10pm)
 ...

or


$dbconn = pg_connect("dbname=webcms");
$result = pg_query($dbconn,"select * from calendar");
while ($tuple = pg_fetch_array($result))
   { ... $tuple["event"] ... }

COMP9315 24T1 ♢ Lectures Part A ♢ [18/55]
❖ PostgreSQL Functionality

PostgreSQL systems deal with various kinds of objects:

COMP9315 24T1 ♢ Lectures Part A ♢ [19/55]
❖ PostgreSQL Functionality (cont)

PostgreSQL's dialect of SQL is mostly standard (but with extensions).

COMP9315 24T1 ♢ Lectures Part A ♢ [20/55]
❖ PostgreSQL Functionality (cont)

PostgreSQL stored procedures differ from SQL standard:

create function ( ArgTypes ) returns ResultType
as $$
... body of function definition ...
$$ language FunctionBodyLanguage;

COMP9315 24T1 ♢ Lectures Part A ♢ [21/55]
❖ PostgreSQL Functionality (cont)

Example:

create or replace function
    barsIn(suburb text) returns setof Bars
as $$
declare
    r record;
begin
    for r in
        select * from Bars where location = suburb
    loop
       return next r;
    end loop;
end;
$$ language plpgsql;
used as e.g.
select * from barsIn('Randwick');

COMP9315 24T1 ♢ Lectures Part A ♢ [22/55]
❖ PostgreSQL Functionality (cont)

Uses multi-version concurrency control (MVCC)

Disadvantages of this approach:
PostgreSQL also provides locking to enforce critical concurrency.
COMP9315 24T1 ♢ Lectures Part A ♢ [23/55]
❖ PostgreSQL Functionality (cont)

PostgreSQL has a well-defined and open extensibility model:

COMP9315 24T1 ♢ Lectures Part A ♢ [24/55]
❖ Database Objects

RDBMSs manage different kinds of objects

Many objects have names (and, in PostgreSQL, all can have OIDs).

How are the different types of objects represented?

How do we go from a name (or OID) to bytes stored on disk?

COMP9315 24T1 ♢ Lectures Part A ♢ [25/55]
❖ Catalogs

Consider what information the RDBMS needs about relations:

Similarly for other DBMS objects (e.g. views, functions, triggers, ...)

This information is stored in the system catalog tables

Standard for catalogs in SQL:2003: INFORMATION_SCHEMA

COMP9315 24T1 ♢ Lectures Part A ♢ [26/55]
❖ Catalogs (cont)

The catalog is affected by several types of SQL operations:

where Object is one of table, view, function, trigger, schema, ...

E.g. drop table Groups; produces something like

delete from Tables
where  schema = 'public' and name = 'groups';

COMP9315 24T1 ♢ Lectures Part A ♢ [27/55]
❖ Catalogs (cont)

In PostgreSQL, the system catalog is available to users via:


The low-level representation is available to sysadmins via:
COMP9315 24T1 ♢ Lectures Part A ♢ [28/55]
❖ Catalogs (cont)

You can explore the PostgreSQl catalog via psql commands

You can also explore via SQL on the catalog tables
COMP9315 24T1 ♢ Lectures Part A ♢ [29/55]
❖ Catalogs (cont)

A PostgreSQL installation (cluster) typically has many DBs

Some catalog information is global, e.g.

Other catalog information is local to each database, e.g
COMP9315 24T1 ♢ Lectures Part A ♢ [30/55]
❖ Catalogs (cont)

Side-note:   PostgreSQL tuples contain

OIDs are used as primary keys in many of the catalog tables.
COMP9315 24T1 ♢ Lectures Part A ♢ [31/55]
❖ Representing Databases

Above the level of individual DB schemata, we have:

These tables are global to each PostgreSQL cluster.

Keys are names (strings) and must be unique within cluster.

COMP9315 24T1 ♢ Lectures Part A ♢ [32/55]
❖ Representing Databases (cont)

pg_database contains information about databases:

pg_namespace contains information about schemata: pg_tablespace contains information about tablespaces: PostgreSQL represents access via array of access items:

Role=Privileges/Grantor

where Privileges is a string enumerating privileges, e.g.

jas=arwdRxt/jas,fred=r/jas,joe=rwad/jas

COMP9315 24T1 ♢ Lectures Part A ♢ [33/55]
❖ Representing Tables

Representing one table needs tuples in several catalog tables.

Due to O-O heritage, base table for tables is called pg_class.

The pg_class table also handles other "table-like" objects:

All tuples in pg_class have an OID, used as primary key.

Some fields from the pg_class table:

COMP9315 24T1 ♢ Lectures Part A ♢ [34/55]
❖ Representing Tables (cont)

Details of catalog tables representing database tables

pg_class holds core information about tables

pg_attribute contains information about attributes pg_type contains information about types
COMP9315 24T1 ♢ Lectures Part A ♢ [35/55]
❖ PostgreSQL Architecture

Client/server architecture:

[Diagram:Pics/intro/proc-arch.png]


The listener process is sometimes called postmaster

COMP9315 24T1 ♢ Lectures Part A ♢ [36/55]
❖ PostgreSQL Architecture (cont)

Memory/storage architecture:

[Diagram:Pics/intro/mem-arch.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [37/55]
❖ PostgreSQL Architecture (cont)

File-system architecture:

[Diagram:Pics/intro/file-arch.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [38/55]
❖ PostgreSQL Source Code

Top-level of PostgreSQL distribution contains:

COMP9315 24T1 ♢ Lectures Part A ♢ [39/55]
❖ PostgreSQL Source Code (cont)

The source code directory (src) contains:

along with Makefiles to build system and other directories ...

Code for backend (DBMS engine)

COMP9315 24T1 ♢ Lectures Part A ♢ [40/55]
❖ PostgreSQL Source Code (cont)

How to get started understanding the workings of PostgreSQL:

Some helpful information is available via:
COMP9315 24T1 ♢ Lectures Part A ♢ [41/55]
❖ PostgreSQL Source Code (cont)

PostgreSQL documentation has detailed description of internals:

See also "How PostgreSQL Processes a Query"
COMP9315 24T1 ♢ Lectures Part A ♢ [42/55]
❖ Life-cycle of a PostgreSQL query

How a PostgreSQL query is executed:

COMP9315 24T1 ♢ Lectures Part A ♢ [43/55]
❖ Life-cycle of a PostgreSQL query (cont)

Data flow to get to execute a query:

[Diagram:Pics/intro/pg-processes.png]

COMP9315 24T1 ♢ Lectures Part A ♢ [44/55]
❖ PostgreSQL server

PostgresMain(int argc, char *argv[], ...)

COMP9315 24T1 ♢ Lectures Part A ♢ [45/55]
❖ PostgreSQL server (cont)

As well as handling SQL queries, PostgresqlMain also

COMP9315 24T1 ♢ Lectures Part A ♢ [46/55]
❖ PostgreSQL Data Types

Data types defined in *.h files under src/include/

Two important data types: Node and List

COMP9315 24T1 ♢ Lectures Part A ♢ [47/55]
❖ PostgreSQL Query Evaluation

exec_simple_query(const char *query_string)

COMP9315 24T1 ♢ Lectures Part A ♢ [48/55]
❖ PostgreSQL Query Evaluation (cont)

pg_parse_query(char *sqlStatements)

pg_analyze_and_rewrite(Node *parsetree, ...)
COMP9315 24T1 ♢ Lectures Part A ♢ [49/55]
❖ PostgreSQL Query Evaluation (cont)

Each query is represented by a Query structure

COMP9315 24T1 ♢ Lectures Part A ♢ [50/55]
❖ PostgreSQL Query Evaluation (cont)

pg_plan_queries(querytree_list, ...)

COMP9315 24T1 ♢ Lectures Part A ♢ [51/55]
❖ PostgreSQL Query Evaluation (cont)

Each executable query is represented by a PlannedStmt node

Each Plan node represents one relational operation
COMP9315 24T1 ♢ Lectures Part A ♢ [52/55]
❖ PostgreSQL Query Evaluation (cont)

PlannedStmt *planner(Query *parse, ...)

COMP9315 24T1 ♢ Lectures Part A ♢ [53/55]
❖ PostgreSQL Query Evaluation (cont)

Queries run in a Portal environment containing

Portal defined in src/include/utils/portal.h

PortalRun() function also requires

COMP9315 24T1 ♢ Lectures Part A ♢ [54/55]
❖ PostgreSQL Query Evaluation (cont)

How query evaluation happens in exec_simple_query():

COMP9315 24T1 ♢ Lectures Part A ♢ [55/55]


Produced: 29 Apr 2024