COMP9315 21T1 ♢ Relational Operations ♢ [0/11]
❖ DBMS Architecture (revisited) | |
Implementation of relational operations in DBMS:
COMP9315 21T1 ♢ Relational Operations ♢ [1/11]
DBMS core = relational engine, with implementations of
- selection, projection, join, set operations
- scanning, sorting, grouping, aggregation, ...
In this part of the course:
- examine methods for implementing each operation
- develop cost models for each implementation
- characterise when each method is most effective
Terminology reminder:
- tuple = collection of data values under some schema ≅ record
- page = block = collection of tuples + management data = i/o unit
- relation = table ≅ file = collection of tuples
COMP9315 21T1 ♢ Relational Operations ♢ [2/11]
❖ Relational Operations (cont) | |
In order to implement relational operations the low-levels of the system provides:
-
Relation openRel(db,name)
- get handle on relation
name
in database db
-
Page request_page(rel,pid)
- get page
pid
from relation rel
, return buffer containing page
-
Record get_record(buf,tid)
- return record
tid
from page buf
-
Tuple mkTuple(rel,rec)
- convert record
rec
to a tuple, based on rel
schema
COMP9315 21T1 ♢ Relational Operations ♢ [3/11]
❖ Relational Operations (cont) | |
Example of using low-level functions
Page p;
Tuple t;
Relation r = relOpen(db,"Emps");
for (int i = 0; i < nPages(r); i++) {
p = request_page(rel,i);
for (int j = 0; j < nRecs(p); j++)
t = mkTuple(r, get_record(p,j));
}
}
COMP9315 21T1 ♢ Relational Operations ♢ [4/11]
❖ Relational Operations (cont) | |
Two "dimensions of variation":
- which relational operation (e.g. Sel, Proj, Join, Sort, ...)
- which access-method (e.g. file struct: heap, indexed, hashed, ...)
Each
query method involves an operator and a file structure:
- e.g. primary-key selection on hashed file
- e.g. primary-key selection on indexed file
- e.g. join on ordered heap files (sort-merge join)
- e.g. join on hashed files (hash join)
- e.g. two-dimensional range query on R-tree indexed file
We are interested in
cost of query methods
(and insert/delete operations)
COMP9315 21T1 ♢ Relational Operations ♢ [5/11]
❖ Relational Operations (cont) | |
SQL vs DBMS engine
-
select ... from R where C
- find relevant tuples (satisfying C) in file(s) of R
-
insert into R values(...)
- place new tuple in some page of a file of R
-
delete from R where C
- find relevant tuples and "remove" from file(s) of R
-
update R set ... where C
- find relevant tuples in file(s) of R and "change" them
COMP9315 21T1 ♢ Relational Operations ♢ [6/11]
An important aspect of this course is
- analysis of cost of various query methods
Cost can be measured in terms of
- Time Cost: total time taken to execute method, or
- Page Cost: number of pages read and/or written
Primary assumptions in our cost models:
- memory (RAM) is "small", fast, byte-at-a-time
- disk storage is very large, slow, page-at-a-time
COMP9315 21T1 ♢ Relational Operations ♢ [7/11]
Since time cost is affected by many factors
- speed of i/o devices (fast/slow disk, SSD)
- load on machine
we do not consider time cost in our analyses.
For comparing methods, page cost is better
- identifies workload imposed by method
- BUT is clearly affected by buffering
Estimating costs with multiple concurrent ops and buffering is difficult!!
Addtional assumption: every page request leads to some i/o
COMP9315 21T1 ♢ Relational Operations ♢ [8/11]
In developing cost models, we also assume:
- a relation is a set of r tuples, with average tuple size R bytes
- the tuples are stored in b data pages on disk
- each page has size B bytes and contains up to c tuples
- the tuples which answer query q are contained in bq pages
- data is transferred disk↔memory in whole pages
- cost of disk↔memory transfer Tr/w is very high
COMP9315 21T1 ♢ Relational Operations ♢ [9/11]
Our cost models are "rough" (based on assumptions)
But do give an O(x) feel for how expensive operations are.
Example "rough" estimation: how many piano tuners in Sydney?
- Sydney has ≅ 4 000 000 people
- Average household size ≅ 3 ∴ 1 300 000 households
- Let's say that 1 in 10 households owns a piano
- Therefore there are ≅ 130 000 pianos
- Say people get their piano tuned every 2 years (on average)
- Say a tuner can do 2/day, 250 working-days/year
- Therefore 1 tuner can do 500 pianos per year
- Therefore Sydney would need ≅ 130000/2/500 = 130 tuners
Actual number of tuners in Yellow Pages = 120
Example borrowed from Alan Fekete at Sydney University.
COMP9315 21T1 ♢ Relational Operations ♢ [10/11]
Type |
| SQL |
| RelAlg |
| a.k.a. |
Scan |
| select * from R |
| R |
| - |
Proj |
| select x,y from R |
| Proj[x,y]R |
| - |
Sort |
| select * from R order by x |
| Sort[x]R |
| ord |
Sel1 |
| select * from R where id = k |
| Sel[id=k]R |
| one |
Seln |
| select * from R where a = k |
| Sel[a=k]R |
| - |
Join1 |
| select * from R,S where R.id = S.r |
| R Join[id=r] S |
| - |
Different query classes exhibit different query processing behaviours.
COMP9315 21T1 ♢ Relational Operations ♢ [11/11]
Produced: 27 Feb 2021