❖ Relational Operations |
DBMS core = relational engine, with implementations of
❖ Cost Analyses |
When showing the cost of operations, don't include Tr and Tw:
request_page()
release_page()
❖ Cost Analyses (cont) |
Two "dimensions of variation":
❖ Cost Analyses (cont) |
SQL vs DBMS engine
select ... from R where C
insert into R values(...)
delete from R where C
update R set ... where C
❖ Query Types |
SQL | RelAlg | a.k.a. | ||||
select * from R |
R | - | ||||
select from R |
Proj[x,y]R | - | ||||
select * from R order by |
Sort[x]R | ord | ||||
select * from R where id = |
Sel[id=k]R | one | ||||
select * from R where a = |
Sel[a=k]R | - | ||||
select * from R,S where R.id = S.r |
R Join[id=r] S | - |
❖ File Structures |
When describing file structures
❖ File Structures (cont) |
Consider three simple file structures:
Some records in each page may be marked as "deleted".
❖ Scanning |
Consider the query:
select * from Rel;
Operational view:
for each page P in file of relation Rel { for each tuple t in page P { add tuple t to result set } }
Cost: read every data page once
Cost = b (b page reads + ≤b page writes)
❖ Scanning (cont) |
In this case, the implementation changes to:
for each page P in file of relation T { for each tuple t in page P { add tuple t to result set } for each overflow page V of page P { for each tuple t in page V { add tuple t to result set } } }
Cost: read each data and overflow page once
Cost = b + bOv
where bOv = total number of overflow pages
❖ Selection via Scanning |
Consider a one query like:
select * from Employee where id = 762288;
In an unordered file, search for matching tuple requires:
Guaranteed at most one answer; but could be in any page.
❖ Selection via Scanning (cont) |
Overview of scan process:
for each page P in relation Employee { for each tuple t in page P { if (t.id == 762288) return t } }
Cost analysis for one searching in unordered file
❖ Relation Copying |
Consider an SQL statement like:
create table T as (select * from S);
Effectively, copies data from one table to another.
Process:
s = start scan of S make empty relation T while (t = next_tuple(s)) { insert tuple t into relation T }
❖ Relation Copying (cont) |
Possible that T
S
S
T
❖ Relation Copying (cont) |
In terms of existing relation/page/tuple operations:
Relation in; // relation handle (incl. files) Relation out; // relation handle (incl. files) int ipid,opid,tid; // page and record indexes Record rec; // current record (tuple) Page ibuf,obuf; // input/output file buffers in = openRelation("S", READ); out = openRelation("T", NEW|WRITE); clear(obuf); opid = 0; for (ipid = 0; ipid < nPages(in); ipid++) { get_page(in, ipid, ibuf); for (tid = 0; tid < nTuples(ibuf); tid++) { rec = get_record(ibuf, tid); if (!hasSpace(obuf,rec)) { put_page(out, opid++, obuf); clear(obuf); } insert_record(obuf,rec); } } if (nTuples(obuf) > 0) put_page(out, opid, obuf);
❖ Scanning Relations |
Example: simple scan of a table ...
select name from Employee
implemented as:
DB db = openDatabase("myDB");
Relation r = openRel(db,"Employee");
Scan s = start_scan(r);
Tuple t; // current tuple
while ((t = next_tuple(s)) != NULL)
{
char *name = getStrField(t,2);
printf("%s\n", name);
}
❖ Scanning Relations (cont) |
Consider the following simple Scan
typedef struct { Relation rel; Page *curPage; // Page buffer int curPID; // current pid int curTID; // current tid } ScanData; typedef ScanData *Scan; Scan start_scan(Relation rel) { Scan new = malloc(ScanData); new->rel = rel; new->curPage = get_page(rel,0); new->curPID = 0; new->curTID = 0; return new; }
❖ Scanning Relations (cont) |
Implementation of next_tuple()
Tuple next_tuple(Scan s) { if (s->curTID == nTuples(s->curPage)) { // finished cur page, get next if (s->curPID == nPages(s->rel)) return NULL; s->curPID++; s->curPage = get_page(s->rel, s->curPID); s->curTID =0; } Record r = get_record(s->curPage, s->curTID); s->curTID++; return makeTuple(s->rel, r); }
❖ Scanning in PostgreSQL |
Scanning defined in: backend/access/heap/heapam.c
Implements iterator data/operations:
HeapScanDesc
scan = heap_beginscan(rel,...,nkeys,keys)
tup = heap_getnext(scan, direction)
heap_endscan(scan)
scan
res = HeapKeyTest(tuple,...,nkeys,keys)
ScanKeys
❖ Scanning in PostgreSQL (cont) |
typedef HeapScanDescData *; typedef struct HeapScanDescData { // scan parameters Relation rs_rd; // heap relation descriptor Snapshot rs_snapshot; // snapshot ... tuple visibility int rs_nkeys; // number of scan keys ScanKey rs_key; // array of scan key descriptors ... // state set up at initscan time PageNumber rs_npages; // number of pages to scan PageNumber rs_startpage; // page # to start at ... // scan current state, initally set to invalid HeapTupleData rs_ctup; // current tuple in scan PageNumber rs_cpage; // current page # in scan Buffer rs_cbuf; // current buffer in scan ... } HeapScanDescData; HeapScanDesc
❖ Scanning in other File Structures |
Above examples are for heap files
btree
hash
gist
gin
❖ The Sort Operation |
Sorting is explicit in queries only in the order by
select * from Students order by name;
Sorting is used internally in other operations:
group by
For large data on disks, need external sorts such as merge sort.
❖ Two-way Merge Sort (cont) |
Requires three in-memory buffers:
Assumption: cost of Merge operation on two in-memory buffers ≅ 0.
❖ Comparison for Sorting |
Above assumes that we have a function to compare tuples.
Needs to understand ordering on different data types.
Need a function tupCompare(r1,r2,f)
strcmp
int tupCompare(r1,r2,f) { if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1; return 0; }
Assume =, <, > are available for all attribute types.
❖ Comparison for Sorting (cont) |
In reality, need to sort on multiple attributes and ASC/DESC, e.g.
-- example multi-attribute sort
select * from Students
order by age desc, year_enrolled
Sketch of multi-attribute sorting function
int tupCompare(r1,r2,criteria) { foreach (f,ord) in criteria { if (ord == ASC) { if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1; } else { if (r1.f > r2.f) return -1; if (r1.f < r2.f) return 1; } } return 0; }
❖ Cost of Two-way Merge Sort |
For a file containing b data pages:
Example: Relation with r=105 and c=50 ⇒ b=2000 pages.
Number of passes for sort: ceil(log22000) = 11
Reads/writes entire file 11 times! Can we do better?
❖ n-Way Merge Sort |
Initial pass uses: B total buffers
Reads B pages at a time, sorts in memory, writes out in order
❖ n-Way Merge Sort (cont) |
Method:
// Produce B-page-long runs for each group of B pages in Rel { read B pages into memory buffers sort group in memory write B pages out to Temp } // Merge runs until everything sorted numberOfRuns = ⌈b/B⌉ while (numberOfRuns > 1) { // n-way merge, where n=B-1 for each group of n runs in Temp { merge into a single run via input buffers write run to newTemp via output buffer } numberOfRuns = ⌈numberOfRuns/n⌉ Temp = newTemp // swap input/output files }
❖ Cost of n-Way Merge Sort |
Consider file where b = 4096, B = 16 total buffers:
❖ Cost of n-Way Merge Sort (cont) |
Generalising from previous example ...
For b data pages and B buffers
❖ Sorting in PostgreSQL |
Sort uses a merge-sort (from Knuth) similar to above:
SortTuple
qsort()
If memory fills while reading, form "runs" and do disk-based sort.
❖ Sorting in PostgreSQL (cont) |
Disk-based sort has phases:
workMem
Implementation of "tapes": backend/utils/sort/logtape.c
❖ Sorting in PostgreSQL (cont) |
Sorting comparison operators are obtained via catalog (in Type.o
// gets pointer to function via pg_operator struct Tuplesortstate { ... SortTupleComparator ... }; // returns negative, zero, positive ApplySortComparator(Datum datum1, bool isnull1, Datum datum2, bool isnull2, SortSupport sort_helper);
Flags indicate: ascending/descending, nulls-first/last.
ApplySortComparator()
tupCompare()
❖ The Projection Operation |
Consider the query:
select distinct name,age from Employee;
If the Employee
(94002, John, Sales, Manager, 32) (95212, Jane, Admin, Manager, 39) (96341, John, Admin, Secretary, 32) (91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21) (Jane, 39) (John, 32)
Note that duplicate tuples (e.g. (John,32)
❖ The Projection Operation (cont) |
The projection operation needs to:
distinct
❖ Sort-based Projection |
Requires a temporary file/relation (Temp
for each tuple T in Rel { T' = mkTuple([attrs],T) write T' to Temp } sort Temp on [attrs] for each tuple T in Temp { if (T == Prev) continue write T to Result Prev = T }
❖ Cost of Sort-based Projection |
The costs involved are (assuming B=n+1 buffers for sort):
Rel
Temp
Temp
Temp
❖ Hash-based Projection (cont) |
Algorithm for both phases:
for each tuple T in relation Rel {
T' = mkTuple([attrs],T)
H = h1(T', n)
B = buffer for partition[H]
if (B full) write and clear B
insert T' into B
}
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B
// assumes B never gets full
}
write and clear all buffers
}
❖ Cost of Hash-based Projection |
The total cost is the sum of the following:
R
To ensure that n is larger than the largest partition ...
❖ Projection on Primary Key |
No duplicates, so the above approaches are not required.
Method:
bR = nPages(Rel) for i in 0 .. bR-1 { P = read page i for j in 0 .. nTuples(P) { T = getTuple(P,j) T' = mkTuple(pk, T) if (outBuf is full) write and clear append T' to outBuf } } if (nTuples(outBuf) > 0) write
❖ Index-only Projection |
Can do projection without accessing data file iff ...
Basic idea:
❖ Comparison of Projection Methods |
Difficult to compare, since they make different assumptions:
Best case scenario for each (assuming n+1 in-memory buffers):
❖ Projection in PostgreSQL |
Code for projection forms part of execution iterators:
ExecProject(projInfo,...)
check_sql_fn_retval(...)
ExecStoreTuple(newTuple,...)
Produced: 30 Apr 2024