DBMS Overview

COMP9315 23T1 ♢ DBMS Overview ♢ [0/11]
❖ DBMSs

DBMS = DataBase Management System

Our view of the DBMS so far ...


[Diagram:Pics/dbms/qeval1.png]


A machine to process SQL queries.

COMP9315 23T1 ♢ DBMS Overview ♢ [1/11]
❖ DBMSs (cont)

One view of DB engine: "relational algebra virtual machine"

Machine code for such a machine:

selection (σ) projection (π) join (⋈, ×)
union (∪) intersection (∩) difference (-)
sort insert delete

For each of these operations:

COMP9315 23T1 ♢ DBMS Overview ♢ [2/11]
❖ Query Evaluation

The path of a query through its evaluation:


[Diagram:Pics/dbms/qeval2.png]

COMP9315 23T1 ♢ DBMS Overview ♢ [3/11]
❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d = 100
-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

In general:

COMP9315 23T1 ♢ DBMS Overview ♢ [4/11]
❖ Mapping Example

Consider the database schema:

Person(pid, name, address, ...)
Subject(sid, code, title, uoc, ...)
Terms(tid, code, start, end, ...)
Courses(cid, sid, tid, ...)
Enrolments(cid, pid, mark, ..)

and the query: Courses with more than 100 students in them?

which can be expressed in SQL as

select s.sid, s.code
from   Course c join Subject s on (c.sid=s.sid)
       join Enrolment e on (c.cid=e.cid)
group  by s.sid, s.code
having count(*) > 100;

COMP9315 23T1 ♢ DBMS Overview ♢ [5/11]
❖ Mapping Example (cont)

The SQL might be compiled to

Tmp1(cid,sid,pid)   = Course Join[c.cid = e.cid] Enrolment
Tmp2(cid,code,pid)  = Tmp1 Join[t1.sid = s.sid] Subject
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code)       = Sel[nstu > 100](Tmp3)

or, equivalently

Tmp1(cid,code)      = Course Join[c.sid = s.sid] Subject
Tmp2(cid,code,pid)  = Tmp1 Join[t1.cid = e.cid] Enrolment
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code)       = Sel[nstu > 100](Tmp3)

Which is better?

COMP9315 23T1 ♢ DBMS Overview ♢ [6/11]
❖ Query Cost Estimation

The cost of evaluating a query is determined by

Analysis of costs involves estimating:
Accessing data from disk is the dominant cost in query evaluation
COMP9315 23T1 ♢ DBMS Overview ♢ [7/11]
❖ Query Cost Estimation (cont)

Consider execution plans for:    σc (R  ⋈d  S  ⋈e  T)     where R(c,d), S(d,e), T(e)

Tmp1(c,d,e)  :=  hash_join[d](R,S)
Tmp2(c,d,e)  :=  sort_merge_join[e](tmp1,T)
Res(c,d,e)   :=  binary_search[c](Tmp2)

or

Tmp1(d,e)    :=  sort_merge_join[e](S,T)
Tmp2(c,d,e)  :=  hash_join[d](R,Tmp1)
Res(c,d,e)   :=  linear_search[c](Tmp2)

or

Tmp1(c,d)    :=  btree_search[c](R)
Tmp2(c,d,e)  :=  hash_join[d](Tmp1,S)
Res(c,d,e)   :=  sort_merge_join[e](Tmp2,T)

All produce same result, but have different costs.

COMP9315 23T1 ♢ DBMS Overview ♢ [8/11]
❖ Implementations of RA Ops

Sorting   (quicksort, etc. are not applicable)

Selection   (different techniques developed for different query types) Join   (fast joins are critical to success of relational DBMSs)
COMP9315 23T1 ♢ DBMS Overview ♢ [9/11]
❖ DBMS Architecture

Most RDBMSs are client-server systems:

[Diagram:Pics/dbms/client-server.png]

COMP9315 23T1 ♢ DBMS Overview ♢ [10/11]
❖ DBMS Architecture (cont)

Layers within the DBMS server:

[Diagram:Pics/intro/dbmsarch1.png]

COMP9315 23T1 ♢ DBMS Overview ♢ [11/11]


Produced: 29 Jan 2023