❖ Relational Operations |
DBMS core = relational engine, with implementations of relational operations (selection, projection, sort, join, ...)
❖ Cost Analyses |
When showing the cost of operations, don't include Tr and Tw:
request_page(), release_page()
❖ Cost Analyses (cont) |
Two "dimensions of variation":
❖ Cost Analyses (cont) |
SQL vs DBMS engine
select ... from R where C
insert into R values(...)
delete from R where C
update R set ... where C
❖ Query Types |
| SQL                                | RelAlg         | a.k.a. |
| select * from R                    | R              | -      |
| select x,y from R                  | Proj[x,y]R     | -      |
| select * from R order by x         | Sort[x]R       | ord    |
| select * from R where id = k       | Sel[id=k]R     | one    |
| select * from R where a = k        | Sel[a=k]R      | -      |
| select * from R,S where R.id = S.r | R Join[id=r] S | -      |
❖ File Structures |
When describing file structures
❖ File Structures (cont) |
Consider three simple file structures:
Some records in each page may be marked as "deleted".
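One common way to mark records as "deleted" (a sketch only; real page layouts vary by DBMS) is a presence bitmap in the page header, with one "live" bit per record slot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical page layout: up to 32 fixed-size record slots,
// with one "live" bit per slot kept in the page header.
typedef struct {
    uint32_t live;          // bit i set => slot i holds a live record
    char     data[32][64];  // record slots (fixed-size records)
} Page;

static void markDeleted(Page *p, int slot) { p->live &= ~(1u << slot); }
static bool isLive(const Page *p, int slot) { return (p->live >> slot) & 1; }

// Count live records in a page; a scan must make this per-slot
// check so that deleted records are skipped, not returned.
static int nLive(const Page *p) {
    int n = 0;
    for (int i = 0; i < 32; i++) n += isLive(p, i);
    return n;
}
```

Deletion then costs only a bit-flip; the space can be reused by a later insertion.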
❖ Scanning |
Consider the query:
select * from Rel;
Operational view:
for each page P in file of relation Rel {
for each tuple t in page P {
add tuple t to result set
}
}
Cost: read every data page once
Cost = b (b page reads + ≤b page writes)
❖ Scanning (cont) |
If the file contains overflow pages, the implementation changes to:
for each page P in file of relation T {
for each tuple t in page P {
add tuple t to result set
}
for each overflow page V of page P {
for each tuple t in page V {
add tuple t to result set
} } }
Cost: read each data and overflow page once
Cost = b + bOv
where bOv = total number of overflow pages
❖ Selection via Scanning |
Consider a one query (at most one matching tuple) like:
select * from Employee where id = 762288;
In an unordered file, searching for the matching tuple requires a linear scan:
Guaranteed at most one answer; but could be in any page.
❖ Selection via Scanning (cont) |
Overview of scan process:
for each page P in relation Employee {
for each tuple t in page P {
if (t.id == 762288) return t
} }
Cost analysis for a one query on an unordered file:
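Since the matching tuple could be in any page, the cost ranges from 1 page read (found in the first page) to b (found in the last page, or no match). Assuming the tuple is equally likely to be in any page, the average is (b+1)/2. A small sketch of these figures:

```c
#include <assert.h>

// Expected page reads for a "one" query on an unordered file of b
// pages, assuming the match is equally likely to be in any page:
// (1 + 2 + ... + b)/b = (b+1)/2  (integer-truncated here).
static int avgPagesRead(int b)   { return (b + 1) / 2; }

// Worst case: match is in the last page, or there is no match at
// all; either way every page must be read.
static int worstPagesRead(int b) { return b; }
```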
❖ Relation Copying |
Consider an SQL statement like:
create table T as (select * from S);
Effectively, copies data from one table to another.
Process:
s = start scan of S
make empty relation T
while (t = next_tuple(s)) {
insert tuple t into relation T
}
❖ Relation Copying (cont) |
Possible that T is smaller than S: deleted records in S are not copied, and T's pages may be packed more fully than S's.
❖ Relation Copying (cont) |
In terms of existing relation/page/tuple operations:
Relation in;        // relation handle (incl. files)
Relation out;       // relation handle (incl. files)
int ipid,opid,tid;  // page and record indexes
Record rec;         // current record (tuple)
Page ibuf,obuf;     // input/output file buffers

in = openRelation("S", READ);
out = openRelation("T", NEW|WRITE);
clear(obuf);  opid = 0;
for (ipid = 0; ipid < nPages(in); ipid++) {
    get_page(in, ipid, ibuf);
    for (tid = 0; tid < nTuples(ibuf); tid++) {
        rec = get_record(ibuf, tid);
        if (!hasSpace(obuf,rec)) {
            put_page(out, opid++, obuf);
            clear(obuf);
        }
        insert_record(obuf,rec);
    }
}
if (nTuples(obuf) > 0) put_page(out, opid, obuf);
❖ Scanning Relations |
Example: simple scan of a table ...
select name from Employee
implemented as:
DB db = openDatabase("myDB");
Relation r = openRel(db,"Employee");
Scan s = start_scan(r);
Tuple t; // current tuple
while ((t = next_tuple(s)) != NULL)
{
char *name = getStrField(t,2);
printf("%s\n", name);
}
❖ Scanning Relations (cont) |
Consider the following simple Scan data structure and its initialisation:
typedef struct {
Relation rel;
Page *curPage; // Page buffer
int curPID; // current pid
int curTID; // current tid
} ScanData;
typedef ScanData *Scan;
Scan start_scan(Relation rel)
{
Scan new = malloc(sizeof(ScanData));
new->rel = rel;
new->curPage = get_page(rel,0);
new->curPID = 0;
new->curTID = 0;
return new;
}
❖ Scanning Relations (cont) |
Implementation of next_tuple()
Tuple next_tuple(Scan s)
{
if (s->curTID == nTuples(s->curPage)) {
// finished cur page, get next
if (s->curPID == nPages(s->rel) - 1)
return NULL;  // already on last page
s->curPID++;
s->curPage = get_page(s->rel, s->curPID);
s->curTID = 0;
}
Record r = get_record(s->curPage, s->curTID);
s->curTID++;
return makeTuple(s->rel, r);
}
❖ Scanning in PostgreSQL |
Scanning defined in: backend/access/heap/heapam.c
Implements iterator data/operations:
HeapScanDesc scan = heap_beginscan(rel,...,nkeys,keys)
tup = heap_getnext(scan, direction)
heap_endscan(scan)
scanres = HeapKeyTest(tuple,...,nkeys,keys)   // match tuple against ScanKeys
❖ Scanning in PostgreSQL (cont) |
typedef struct HeapScanDescData {
    // scan parameters
    Relation      rs_rd;        // heap relation descriptor
    Snapshot      rs_snapshot;  // snapshot ... tuple visibility
    int           rs_nkeys;     // number of scan keys
    ScanKey       rs_key;       // array of scan key descriptors
    ...
    // state set up at initscan time
    PageNumber    rs_npages;    // number of pages to scan
    PageNumber    rs_startpage; // page # to start at
    ...
    // scan current state, initially set to invalid
    HeapTupleData rs_ctup;      // current tuple in scan
    PageNumber    rs_cpage;     // current page # in scan
    Buffer        rs_cbuf;      // current buffer in scan
    ...
} HeapScanDescData;

typedef HeapScanDescData *HeapScanDesc;
❖ Scanning in other File Structures |
Above examples are for heap files
Similar scan interfaces exist for other file structures: btree, hash, gist, gin
❖ The Sort Operation |
Sorting is explicit in queries only via the order by clause, e.g.
select * from Students order by name;
Sorting is used internally in other operations:
group by, duplicate elimination (distinct), ...
For large data on disk, need external sorts such as merge sort.
❖ Two-way Merge Sort (cont) |
Requires three in-memory buffers: two input buffers and one output buffer.
Assumption: cost of Merge operation on two in-memory buffers ≅ 0.
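The Merge step itself is a standard two-pointer merge over the two input buffers; a sketch (mergeRuns is an illustrative name, and ints stand in for tuples):

```c
#include <assert.h>

// Merge two sorted int "buffers" a and b into out; returns total
// length. This is the (approximately cost-free) in-memory Merge
// that two-way merge sort applies to each pair of runs.
static int mergeRuns(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)                   // both runs non-empty:
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];  // take smaller head
    while (i < na) out[k++] = a[i++];          // drain leftovers of a
    while (j < nb) out[k++] = b[j++];          // drain leftovers of b
    return k;
}
```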
❖ Comparison for Sorting |
Above assumes that we have a function to compare tuples.
Needs to understand ordering on different data types.
Need a function tupCompare(r1,r2,f) (analogous to strcmp) to compare two tuples on field f:
int tupCompare(r1,r2,f)
{
if (r1.f < r2.f) return -1;
if (r1.f > r2.f) return 1;
return 0;
}
Assume =, <, > are available for all attribute types.
❖ Comparison for Sorting (cont) |
In reality, need to sort on multiple attributes and ASC/DESC, e.g.
-- example multi-attribute sort
select * from Students
order by age desc, year_enrolled
Sketch of multi-attribute sorting function
int tupCompare(r1,r2,criteria)
{
foreach (f,ord) in criteria {
if (ord == ASC) {
if (r1.f < r2.f) return -1;
if (r1.f > r2.f) return 1;
}
else {
if (r1.f > r2.f) return -1;
if (r1.f < r2.f) return 1;
}
}
return 0;
}
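The sketch above can be made concrete in C; in this toy version (all names illustrative), tuples are fixed structs, criteria are an array of (field, direction) pairs, and field access by index stands in for real schema lookup:

```c
#include <assert.h>

typedef struct { int age; int year_enrolled; } Tup;
typedef enum { ASC, DESC } Dir;
typedef struct { int field; Dir dir; } Criterion;  // field 0=age, 1=year_enrolled

static int getField(const Tup *t, int f) {
    return f == 0 ? t->age : t->year_enrolled;
}

// Compare r1,r2 on each criterion in order; the first field where
// they differ decides, with the sign flipped for DESC.
static int tupCompare(const Tup *r1, const Tup *r2,
                      const Criterion *crit, int ncrit) {
    for (int i = 0; i < ncrit; i++) {
        int v1 = getField(r1, crit[i].field);
        int v2 = getField(r2, crit[i].field);
        if (v1 == v2) continue;            // tied, try next criterion
        int cmp = (v1 < v2) ? -1 : 1;
        return crit[i].dir == ASC ? cmp : -cmp;
    }
    return 0;                              // equal on all criteria
}
```

A comparator of this shape can be handed straight to a sort routine such as qsort (via a wrapper carrying the criteria).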
❖ Cost of Two-way Merge Sort |
For a file containing b data pages:
Example: Relation with r = 10^5 tuples and c = 50 tuples/page ⇒ b = 2000 pages.
Number of passes for sort: ceil(log2 2000) = 11
Reads/writes entire file 11 times! Can we do better?
❖ n-Way Merge Sort |
Initial pass uses: B total buffers
Reads B pages at a time, sorts in memory, writes out in order
❖ n-Way Merge Sort (cont) |
Method:
// Produce B-page-long runs
for each group of B pages in Rel {
    read B pages into memory buffers
    sort group in memory
    write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ⌈b/B⌉
while (numberOfRuns > 1) {
    // n-way merge, where n = B-1
    for each group of n runs in Temp {
        merge into a single run via input buffers
        write run to newTemp via output buffer
    }
    numberOfRuns = ⌈numberOfRuns/n⌉
    Temp = newTemp   // swap input/output files
}
❖ Cost of n-Way Merge Sort |
Consider a file where b = 4096 pages, with B = 16 total buffers:
- initial sorting pass produces ⌈4096/16⌉ = 256 sorted runs of 16 pages each
- each later pass does an n = B-1 = 15-way merge: 256 → 18 → 2 → 1 runs (3 merge passes)
- total cost = 2b × (1+3) = 2 × 4096 × 4 = 32768 page reads/writes
❖ Cost of n-Way Merge Sort (cont) |
Generalising from previous example ...
For b data pages and B buffers:
- initial pass produces b0 = ⌈b/B⌉ sorted runs of (up to) B pages
- then ⌈log_n b0⌉ merge passes are needed, where n = B-1
- each pass reads and writes b pages ⇒ Cost = 2b × (1 + ⌈log_n b0⌉)
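Taking the cost as 2b × (1 + ⌈log_{B-1}⌈b/B⌉⌉), i.e. one initial sorting pass plus the merge passes, a small checker for the formula (sortCost is an illustrative name):

```c
#include <assert.h>

static long ceilDiv(long x, long y) { return (x + y - 1) / y; }

// Page reads+writes for n-way external merge sort of b pages with
// B buffers: the initial pass leaves ceil(b/B) runs, then each
// (B-1)-way merge pass shrinks the run count and costs 2b.
static long sortCost(long b, long B) {
    long runs = ceilDiv(b, B);   // runs after the initial pass
    long n = B - 1;              // merge order
    long passes = 0;
    while (runs > 1) { runs = ceilDiv(runs, n); passes++; }
    return 2 * b * (1 + passes);
}
```

With B = 3 this reduces to the two-way merge sort analysed earlier (11 total passes for b = 2000).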
❖ Sorting in PostgreSQL |
Sort uses a merge-sort (from Knuth) similar to above:
Tuples are collected into an array of SortTuple entries and sorted in memory via qsort().
If memory fills while reading, form "runs" and do disk-based sort.
❖ Sorting in PostgreSQL (cont) |
Disk-based sort has phases:
- form initial sorted runs, each as large as the workMem limit allows
- repeatedly merge runs ("tapes") until a single sorted run remains
Implementation of "tapes": backend/utils/sort/logtape.c
❖ Sorting in PostgreSQL (cont) |
Sorting comparison operators are obtained via the system catalogs (e.g. pg_operator):
// gets pointer to function via pg_operator
struct Tuplesortstate { ... SortTupleComparator ... };

// returns negative, zero, positive
ApplySortComparator(Datum datum1, bool isnull1,
                    Datum datum2, bool isnull2,
                    SortSupport sort_helper);
Flags indicate: ascending/descending, nulls-first/last.
ApplySortComparator() plays the role of the tupCompare() function described above.
❖ The Projection Operation |
Consider the query:
select distinct name,age from Employee;
If the Employee relation contains:
(94002, John, Sales, Manager, 32)
(95212, Jane, Admin, Manager, 39)
(96341, John, Admin, Secretary, 32)
(91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21)
(Jane, 39)
(John, 32)
Note that duplicate tuples (e.g. (John,32), which occurs twice in Employee) are eliminated.
❖ The Projection Operation (cont) |
The projection operation needs to:
- form result tuples containing only the requested attributes
- eliminate any duplicates (when distinct is specified)
❖ Sort-based Projection |
Requires a temporary file/relation (Temp):
for each tuple T in Rel {
T' = mkTuple([attrs],T)
write T' to Temp
}
sort Temp on [attrs]
for each tuple T in Temp {
if (T == Prev) continue
write T to Result
Prev = T
}
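The final duplicate-removing pass relies only on equal tuples being adjacent after sorting, exactly as in the Prev check above. An in-memory analogue using qsort (with ints standing in for projected tuples; sortDistinct is an illustrative name):

```c
#include <assert.h>
#include <stdlib.h>

static int cmpInt(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

// Sort, then keep each value once: after sorting, duplicates are
// adjacent, so comparing each element with the previous kept one
// (the "Prev" of the sort-based projection loop) suffices.
static int sortDistinct(int *vals, int n) {
    if (n == 0) return 0;
    qsort(vals, n, sizeof(int), cmpInt);
    int out = 1;                       // vals[0] is always kept
    for (int i = 1; i < n; i++)
        if (vals[i] != vals[out - 1])  // differs from Prev => keep
            vals[out++] = vals[i];
    return out;  // number of distinct values, now in vals[0..out-1]
}
```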
❖ Cost of Sort-based Projection |
The costs involved are (assuming B=n+1 buffers for sort):
- scanning the original relation Rel
- writing the projected tuples to Temp
- sorting Temp
- scanning Temp, removing adjacent duplicates
❖ Hash-based Projection (cont) |
Algorithm for both phases:
for each tuple T in relation Rel {
T' = mkTuple([attrs],T)
H = h1(T', n)
B = buffer for partition[H]
if (B full) write and clear B
insert T' into B
}
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B
// assumes B never gets full
}
write and clear all buffers
}
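The two phases can be illustrated in memory: h1 splits tuples into partitions, so all duplicates of a value land in the same partition, and h2 then deduplicates each partition independently. In this sketch both the partition hash and the membership test are simple placeholders (a real implementation would use h2 to index an in-memory hash table):

```c
#include <assert.h>
#include <stdbool.h>

#define NPART 4
#define CAP   64

// Phase 1: partition values by a placeholder h1 (mod NPART).
// Phase 2: within each partition, emit a value only if it has not
// been seen earlier in that partition ("if (T not in B)" above);
// a linear membership check stands in for the h2 hash lookup.
static int hashDistinct(const int *vals, int n, int *out) {
    int part[NPART][CAP], pn[NPART] = {0};
    for (int i = 0; i < n; i++) {
        int h1 = vals[i] % NPART;        // placeholder partition hash
        part[h1][pn[h1]++] = vals[i];
    }
    int nout = 0;
    for (int p = 0; p < NPART; p++) {
        for (int i = 0; i < pn[p]; i++) {
            bool seen = false;
            for (int j = 0; j < i; j++)
                if (part[p][j] == part[p][i]) { seen = true; break; }
            if (!seen) out[nout++] = part[p][i];
        }
    }
    return nout;  // number of distinct values written to out
}
```

Unlike the sort-based method, the output order is arbitrary, but each partition can be processed with only one partition's worth of data in memory.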
❖ Cost of Hash-based Projection |
The total cost is the sum of the following:
- partitioning phase: read R, write projected tuples into partitions
- duplicate-elimination phase: read each partition, hash, write result
To ensure that n is larger than the largest partition ...
❖ Projection on Primary Key |
No duplicates, so the above approaches are not required.
Method:
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P) {
T = getTuple(P,j)
T' = mkTuple(pk, T)
if (outBuf is full) write and clear
append T' to outBuf
}
}
if (nTuples(outBuf) > 0) write
❖ Index-only Projection |
Can do projection without accessing data file iff ...
Basic idea:
❖ Comparison of Projection Methods |
Difficult to compare, since they make different assumptions:
Best case scenario for each (assuming n+1 in-memory buffers):
❖ Projection in PostgreSQL |
Code for projection forms part of execution iterators:
ExecProject(projInfo,...)
check_sql_fn_retval(...)
ExecStoreTuple(newTuple,...)
Produced: 30 Apr 2024