❖ Things To Note |
PersonName ::= Family','Given | Family', 'Given
Family     ::= NameList
Given      ::= NameList
NameList   ::= Name | Name' 'NameList
Name       ::= Upper Letters
Letter     ::= Upper | Lower | Punc
Letters    ::= Letter | Letter Letters
Upper      ::= 'A' | 'B' | ... | 'Z'
Lower      ::= 'a' | 'b' | ... | 'z'
Punc       ::= '-' | "'"
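For example, this grammar accepts "Smith,John" and "O'Brien, Mary-Jane Anne" (hyphens and apostrophes may appear within a name), but rejects "smith,John", since every Name must begin with an uppercase letter.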
❖ Debugging Assignment 1 |
Symptoms: a message like FATAL: the database system ...
Solution: write debugging output to the log
See elog()
Still can't work it out?
give
❖ Scanning Relations |
Example: simple scan of a table ...
select name from Employee
implemented as:
DB db = openDatabase("myDB");
Relation r = openRel(db,"Employee");
Scan s = start_scan(r);
Tuple t;  // current tuple
while ((t = next_tuple(s)) != NULL)
{
   char *name = getStrField(t,2);  // assumes name is field 2
   printf("%s\n", name);
}
❖ Scanning Relations (cont) |
Consider the following simple Scan implementation:
typedef struct {
   Relation rel;
   Page *curPage;  // page buffer
   int   curPID;   // current pid
   int   curTID;   // current tid
} ScanData;

typedef ScanData *Scan;

Scan start_scan(Relation rel)
{
   Scan new = malloc(sizeof(ScanData));
   new->rel = rel;
   new->curPage = get_page(rel,0);
   new->curPID = 0;
   new->curTID = 0;
   return new;
}
❖ Scanning Relations (cont) |
Implementation of next_tuple()
Tuple next_tuple(Scan s)
{
   if (s->curTID == nTuples(s->curPage)) {
      // finished current page, get next
      if (s->curPID + 1 == nPages(s->rel))
         return NULL;   // no pages left
      s->curPID++;
      s->curPage = get_page(s->rel, s->curPID);
      s->curTID = 0;
   }
   Record r = get_record(s->curPage, s->curTID);
   s->curTID++;
   return makeTuple(s->rel, r);
}
❖ Scanning in PostgreSQL |
Scanning defined in: backend/access/heap/heapam.c
Implements iterator data/operations:
HeapScanDesc ... struct holding the state of the scan
scan = heap_beginscan(rel,...,nkeys,keys) ... start a new scan
tup = heap_getnext(scan, direction) ... get next tuple
heap_endscan(scan) ... finish the scan, freeing the scan descriptor
res = HeapKeyTest(tuple,...,nkeys,keys) ... test a tuple against the ScanKeys
❖ Scanning in PostgreSQL (cont) |
typedef struct HeapScanDescData
{
   // scan parameters
   Relation      rs_rd;        // heap relation descriptor
   Snapshot      rs_snapshot;  // snapshot ... tuple visibility
   int           rs_nkeys;     // number of scan keys
   ScanKey       rs_key;       // array of scan key descriptors
   ...
   // state set up at initscan time
   PageNumber    rs_npages;    // number of pages to scan
   PageNumber    rs_startpage; // page # to start at
   ...
   // scan current state, initially set to invalid
   HeapTupleData rs_ctup;      // current tuple in scan
   PageNumber    rs_cpage;     // current page # in scan
   Buffer        rs_cbuf;      // current buffer in scan
   ...
} HeapScanDescData;

typedef HeapScanDescData *HeapScanDesc;
❖ Scanning in other File Structures |
The above examples are for heap files; PostgreSQL implements analogous scanning for its other file structures:
btree
hash
gist
gin
❖ The Sort Operation |
Sorting appears explicitly in queries only via the order by clause, e.g.
select * from Students order by name;
Sorting is also used internally to implement other operations, e.g. group by.
For large data on disks, need external sorts such as merge sort.
❖ Two-way Merge Sort (cont) |
Requires three in-memory buffers:
Assumption: cost of Merge operation on two in-memory buffers ≅ 0.
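A minimal sketch of the merge step in C, assuming two sorted runs already sit in the in-memory input buffers and a two-argument tuple comparator cmp() (all names here are illustrative, not a real DBMS API):

// Merge two sorted runs into the output buffer; with all three
// buffers in memory, this step costs no disk I/O of its own
void merge(Tuple *in1, int n1, Tuple *in2, int n2, Tuple *out)
{
   int i = 0, j = 0, k = 0;
   while (i < n1 && j < n2) {
      if (cmp(in1[i], in2[j]) <= 0)
         out[k++] = in1[i++];
      else
         out[k++] = in2[j++];
   }
   while (i < n1) out[k++] = in1[i++];   // drain leftovers of run 1
   while (j < n2) out[k++] = in2[j++];   // drain leftovers of run 2
}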
❖ Comparison for Sorting |
The above assumes that we have a function to compare tuples,
which understands the ordering on the various attribute data types.
Need a function tupCompare(r1,r2,f) (cf. C's strcmp for strings):
int tupCompare(r1,r2,f)
{
   if (r1.f < r2.f) return -1;
   if (r1.f > r2.f) return 1;
   return 0;
}
Assume =, <, > are available for all attribute types.
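A concrete version in plain C, assuming a simplified in-memory tuple struct (hypothetical; real tuples hold raw record data):

#include <string.h>

typedef struct {
   int  age;        // example numeric attribute
   char name[20];   // example string attribute
} TupleT;

// compare on a string field: strcmp already gives -ve/0/+ve
int tupCompareName(TupleT *r1, TupleT *r2)
{
   return strcmp(r1->name, r2->name);
}

// compare on a numeric field
int tupCompareAge(TupleT *r1, TupleT *r2)
{
   if (r1->age < r2->age) return -1;
   if (r1->age > r2->age) return 1;
   return 0;
}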
❖ Comparison for Sorting (cont) |
In reality, need to sort on multiple attributes and ASC/DESC, e.g.
-- example multi-attribute sort
select * from Students
order by age desc, year_enrolled
Sketch of multi-attribute sorting function
int tupCompare(r1,r2,criteria)
{
   foreach (f,ord) in criteria {
      if (ord == ASC) {
         if (r1.f < r2.f) return -1;
         if (r1.f > r2.f) return 1;
      }
      else {
         if (r1.f > r2.f) return -1;
         if (r1.f < r2.f) return 1;
      }
   }
   return 0;
}
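The same idea as runnable C, assuming tuples are arrays of integer field values and the criteria come as an array (all names are illustrative):

typedef enum { ASC, DESC } Order;

typedef struct {
   int   field;   // index of the attribute to compare on
   Order ord;     // sort direction for this attribute
} Criterion;

int tupCompare(int *r1, int *r2, Criterion *crit, int ncrit)
{
   for (int i = 0; i < ncrit; i++) {
      int f = crit[i].field;
      int cmp = (r1[f] < r2[f]) ? -1 : (r1[f] > r2[f]) ? 1 : 0;
      if (crit[i].ord == DESC) cmp = -cmp;   // flip for descending
      if (cmp != 0) return cmp;              // decided by this attribute
   }
   return 0;   // equal on all sort attributes
}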
❖ Cost of Two-way Merge Sort |
For a file containing b data pages: each pass reads and writes all b pages, and ⌈log₂ b⌉ passes are needed, giving Cost = 2b·⌈log₂ b⌉.
Example: Relation with r = 10⁵ tuples and c = 50 tuples/page ⇒ b = 2000 pages.
Number of passes for sort: ⌈log₂ 2000⌉ = 11
Reads/writes entire file 11 times! Can we do better?
❖ n-Way Merge Sort |
Initial pass: uses all B available buffers
Reads B pages at a time, sorts in memory, writes out in order
❖ n-Way Merge Sort (cont) |
Method:
// Produce B-page-long runs
for each group of B pages in Rel {
   read B pages into memory buffers
   sort group in memory
   write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ⌈b/B⌉
while (numberOfRuns > 1) {
   // n-way merge, where n = B-1
   for each group of n runs in Temp {
      merge into a single run via input buffers
      write run to newTemp via output buffer
   }
   numberOfRuns = ⌈numberOfRuns/n⌉
   Temp = newTemp   // swap input/output files
}
❖ Cost of n-Way Merge Sort |
Consider file where b = 4096, B = 16 total buffers:
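Working through the arithmetic for this example:

Initial pass: sorts groups of B = 16 pages ⇒ ⌈4096/16⌉ = 256 runs (2b = 8192 I/Os)
Merge passes (n = B-1 = 15-way): 256 → 18 → 2 → 1 runs, i.e. 3 passes (2b I/Os each)
Total cost = 2b × (1 + 3) = 8 × 4096 = 32768 page reads+writes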
❖ Cost of n-Way Merge Sort (cont) |
Generalising from previous example ...
For b data pages and B buffers: the initial pass produces b₀ = ⌈b/B⌉ sorted runs; each merge pass reduces the number of runs by a factor of n = B-1, so ⌈log_n b₀⌉ merge passes are needed; every pass reads and writes all b pages, giving Cost = 2b·(1 + ⌈log_n b₀⌉).
❖ Exercise: Cost of n-Way Merge Sort |
How many reads+writes to sort the following:
❖ Sorting in PostgreSQL |
Sort uses a merge-sort (from Knuth) similar to the above:
tuples to be sorted are collected into an array of SortTuples
if all tuples fit in memory, sort them there using qsort()
If memory fills while reading, form "runs" and do disk-based sort.
❖ Sorting in PostgreSQL (cont) |
Disk-based sort has phases: form initial sorted runs, then merge them;
the memory available for sorting is limited by the workMem parameter.
Implementation of "tapes": backend/utils/sort/logtape.c
❖ Sorting in PostgreSQL (cont) |
Sorting comparison operators are obtained via the catalog (e.g. pg_operator):
// gets pointer to function via pg_operator
struct Tuplesortstate { ... SortTupleComparator ... };

// returns negative, zero, positive
ApplySortComparator(Datum datum1, bool isnull1,
                    Datum datum2, bool isnull2,
                    SortSupport sort_helper);
Flags indicate: ascending/descending, nulls-first/last.
ApplySortComparator() plays the same role as the tupCompare() function sketched earlier.
❖ The Projection Operation |
Consider the query:
select distinct name,age from Employee;
If the Employee relation contains the following tuples:
(94002, John, Sales, Manager, 32)
(95212, Jane, Admin, Manager, 39)
(96341, John, Admin, Secretary, 32)
(91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21)
(Jane, 39)
(John, 32)
Note that duplicate tuples (e.g. (John,32)) have been eliminated.
❖ The Projection Operation (cont) |
The projection operation needs to:
scan the entire relation as input
produce output tuples containing only the projected attributes
eliminate any duplicates (when distinct is specified)
❖ Sort-based Projection |
Requires a temporary file/relation (Temp):
for each tuple T in Rel {
   T' = mkTuple([attrs],T)
   write T' to Temp
}
sort Temp on [attrs]
for each tuple T in Temp {
   if (T == Prev) continue
   write T to Result
   Prev = T
}
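The duplicate-removal pass could be written in C using the scan API from earlier; tupEqual(), append_tuple(), temp and result are assumed helpers/handles, not defined here:

Scan s = start_scan(temp);        // Temp is already sorted on [attrs]
Tuple t, prev = NULL;
while ((t = next_tuple(s)) != NULL) {
   if (prev == NULL || !tupEqual(t, prev))
      append_tuple(result, t);    // first occurrence: keep it
   prev = t;                      // duplicates are adjacent after sorting
}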
❖ Exercise: Cost of Sort-based Projection |
Consider a table R(x,y,z) with tuples:
Page 0:  (1,1,'a')   (11,2,'a')  (3,3,'c')
Page 1:  (13,5,'c')  (2,6,'b')   (9,4,'a')
Page 2:  (6,2,'a')   (17,7,'a')  (7,3,'b')
Page 3:  (14,6,'a')  (8,4,'c')   (5,2,'b')
Page 4:  (10,1,'b')  (15,5,'b')  (12,6,'b')
Page 5:  (4,2,'a')   (16,9,'c')  (18,8,'c')
SQL: create table T as (select distinct y from R)
Assuming:
R
T
❖ Cost of Sort-based Projection |
The costs involved are (assuming B = n+1 buffers for sort):
scanning the original relation Rel: bR reads
writing the projected tuples to Temp: bT writes (bT depends on the size of the projected tuples)
sorting Temp: 2·bT·(1 + ⌈log_n b₀⌉), where b₀ = ⌈bT/B⌉
scanning sorted Temp to remove duplicates: bT reads
writing the result relation: bOut writes
❖ Hash-based Projection (cont) |
Algorithm for both phases:
// Phase 1: partition tuples on the projected attributes
for each tuple T in relation Rel {
T' = mkTuple([attrs],T)
H = h1(T', n)
B = buffer for partition[H]
if (B full) write and clear B
insert T' into B
}
// Phase 2: eliminate duplicates within each partition
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B
// assumes B never gets full
}
write and clear all buffers
}
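The two phases need two different hash functions; one common trick is to derive both from different bits of a single base hash (a sketch, where hash() on the projected attributes is assumed):

unsigned int hash(Tuple t);   // assumed base hash on projected attributes

unsigned int h1(Tuple t, int n) { return hash(t) % n; }         // partitioning phase
unsigned int h2(Tuple t, int n) { return (hash(t) / n) % n; }   // duplicate-elimination phase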
❖ Exercise: Cost of Hash-based Projection |
Consider a table R(x,y,z) with tuples:
Page 0: (1,1,'a') (11,2,'a') (3,3,'c')
Page 1: (13,5,'c') (2,6,'b') (9,4,'a')
Page 2: (6,2,'a') (17,7,'a') (7,3,'b')
Page 3: (14,6,'a') (8,4,'c') (5,2,'b')
Page 4: (10,1,'b') (15,5,'b') (12,6,'b')
Page 5: (4,2,'a') (16,9,'c') (18,8,'c')
-- and then the same tuples repeated for pages 6-11
SQL: create table T as (select distinct y from R)
Assuming:
R
T
❖ Cost of Hash-based Projection |
The total cost is the sum of the following:
scanning the original relation R: bR reads
writing and re-reading the partitions: 2·bP page I/Os
writing the result relation: bOut writes
giving Cost = bR + 2·bP + bOut
To ensure that n is larger than the largest partition ...
❖ Projection on Primary Key |
No duplicates, so the above approaches are not required.
Method:
bR = nPages(Rel)
for i in 0 .. bR-1 {
   P = read page i
   for j in 0 .. nTuples(P)-1 {
      T = getTuple(P,j)
      T' = mkTuple(pk, T)
      if (outBuf is full) write and clear
      append T' to outBuf
   }
}
if (nTuples(outBuf) > 0) write
❖ Index-only Projection |
Can do projection without accessing the data file iff the relation has an index whose entries contain all of the projected attributes.
Basic idea: scan the index file instead of the data file; the index file is typically much smaller than the data file.
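A sketch in the same pseudocode style as above, assuming each index entry holds the values of the projected attributes:

for each page P in the index file {
   for each entry e in page P {
      T' = mkTuple([attrs], e)
      add T' to the output, eliminating duplicates as before
   }
}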
❖ Comparison of Projection Methods |
Difficult to compare, since they make different assumptions:
Best case scenario for each (assuming n+1 in-memory buffers):
❖ Projection in PostgreSQL |
Code for projection forms part of execution iterators:
ExecProject(projInfo,...)
check_sql_fn_retval(...)
ExecStoreTuple(newTuple,...)
❖ Varieties of Selection |
Selection: select * from R where C
R ... the relation being selected from
C ... a boolean condition on the attributes of each tuple
The best implementation depends on the form of C and on the file structure used for R.
❖ Varieties of Selection (cont) |
Examples of different selection types:
one (at most one matching tuple):
      select * from R where id = 1234
pmr (partial-match retrieval, possibly many matches):
      select * from R where age = 65
      select * from R where age = 65 and gender = 'm'
rng (range query):
      select * from R where age ≥ 18 and age ≤ 21
      select * from R
      where  age between 18 and 21
      and    height between 160 and 190
❖ Exercise: Query Types |
Using the relation:
create table Courses (
   id       integer primary key,
   code     char(8),  -- e.g. 'COMP9315'
   title    text,     -- e.g. 'Computing 1'
   year     integer,  -- e.g. 2000..2016
   convenor integer references Staff(id),
   constraint once_per_year unique (code,year)
);
give examples of each of the query types described above (one, pmr, rng):
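One possible set of answers (any queries of the right shape will do):

one:  select * from Courses where id = 1234
one:  select * from Courses where code = 'COMP9315' and year = 2016  -- via unique (code,year)
pmr:  select * from Courses where code = 'COMP9315'
rng:  select * from Courses where year between 2000 and 2016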
❖ Implementing Select Efficiently |
Two basic approaches:
Our analyses assume: 1 input buffer available for each relation.
If more buffers are available, most methods benefit.
❖ Heap Files |
❖ Selection in Heaps |
For all selection queries, the only possible strategy is:
// select * from R where C
for each page P in file of relation R {
for each tuple t in page P {
if (t satisfies C)
add tuple t to result set
}
}
i.e. linear scan through file searching for matching tuples
❖ Selection in Heaps (cont) |
The heap is scanned from the first to the last page:
Cost_range  =  Cost_pmr  =  b
If we know that only one tuple matches the query (one query),
a simple optimisation is to stop the scan once that tuple is found.
Cost_one :  Best = 1,  Average = b/2,  Worst = b
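In the same pseudocode style as above, the optimised scan for a one query:

for each page P in file of relation R {
   for each tuple t in page P {
      if (t satisfies C) {
         add tuple t to result set
         stop scan   // at most one tuple can match
      }
   }
}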
❖ Insertion in Heaps |
Insertion: new tuple is appended to file (in last page).
rel = openRelation("R", READ|WRITE); pid = nPages(rel)-1; get_page(rel, pid, buf); if (size(newTup) > size(buf)) { deal with oversize tuple } else { if (!hasSpace(buf,newTup)) { pid++; nPages(rel)++; clear(buf); } insert_record(buf,newTup); put_page(rel, pid, buf); }
Cost_insert  =  1r + 1w
Plus possible extra writes for oversize tuples, e.g. PostgreSQL's TOAST
❖ Insertion in Heaps (cont) |
Alternative strategy:
find any page in R with enough free space for the new tuple
requires the DBMS to maintain free-space information for the pages of R
PostgreSQL takes this approach: backend/access/heap/{heapam.c,hio.c}
❖ Insertion in Heaps (cont) |
PostgreSQL's tuple insertion:
heap_insert(Relation relation,   // relation desc
            HeapTuple newtup,    // new tuple data
            CommandId cid, ...)  // SQL statement
It finds a page with enough free space for newtup, copies the tuple data into it, and sets tuple header fields (e.g. the xmin transaction id) used for visibility checking.
❖ Deletion in Heaps |
SQL: delete from R where C
Implementation of deletion:
rel = openRelation("R",READ|WRITE); for (p = 0; p < nPages(rel); p++) { get_page(rel, p, buf); ndels = 0; for (i = 0; i < nTuples(buf); i++) { tup = get_record(buf,i); if (tup satisfies Condition) { ndels++; delete_record(buf,i); } } if (ndels > 0) put_page(rel, p, buf); if (ndels > 0 && unique) break; }
❖ Exercise: Cost of Deletion in Heaps |
Consider the following queries ...
delete from Employees where id = 12345               -- one
delete from Employees where dept = 'Marketing'       -- pmr
delete from Employees where 40 <= age and age < 50   -- range
Show how each will be executed and estimate the cost, assuming:
Generalise the cost models for each query type.
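As a starting point (assuming a heap file with b pages and no indexes):

one:    scan until the matching tuple is found, rewrite that one page
        Cost = b/2 reads (average) + 1 write
pmr:    scan the whole file, rewriting each page that had deletions
        Cost = b reads + bq writes  (bq = number of pages containing matches)
range:  same access pattern as pmr
        Cost = b reads + bq writes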
Produced: 29 Feb 2024