COMP9315 Week 04 Thursday Lecture

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [0/51]
❖ Things To Note


COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [1/51]
❖ Debugging Assignment 1

Don't rely just on running ./run_test.py

Run some tests "manually"

... run your PostgreSQL server ...
$ cd /localstorage/$USER/testing
$ createdb xyz
$ psql xyz -f pname.sql   # load the data type
$ psql xyz -e -f tests/0_sanity-checks/queries1.sql
$ psql xyz -e -f tests/0_sanity-checks/queries2.sql
$ psql xyz -e -f tests/0_sanity-checks/queries3.sql
$ psql xyz -e -f tests/0_sanity-checks/queries4.sql
$ psql xyz -e -f tests/0_sanity-checks/queries5.sql

Note: -e displays the SQL statements being executed

If this is not fine-grained enough, run the individual queries.
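
For example, a single statement can be run directly from the command line (the literal below is just an illustration, not taken from the test suite):

$ psql xyz -c "select 'Smith,John'::PersonName;"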

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [2/51]
❖ Debugging Assignment 1 (cont)

Tests in 0_sanity-checks don't write any data to disk.

Later tests attempt to store and retrieve PersonName data, e.g.

... ensure your PostgreSQL server is running ...
$ cd /localstorage/$USER/testing
$ dropdb xyz
$ createdb xyz
$ psql xyz -f pname.sql   # load the data type
$ cd tests/1_users
$ psql xyz -f schema.sql
$ psql xyz -f data1.sql
$ psql xyz -e -f queries1.sql
$ psql xyz -e -f queries2.sql
$ psql xyz
...
xyz=# select count(*) from Users;
 count 
-------
    20

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [3/51]
❖ Debugging Assignment 1 (cont)

Any implementation of PersonName that looks like:

typedef struct PersonName {
   char *family;
   char *given;
} PersonName;

is incorrect ... PostgreSQL stores only the struct itself (two pointers, not the strings they point to), producing the following:

[Diagram:Pics/assignments/pname2.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [4/51]
❖ Debugging Assignment 1 (cont)

This implementation of PersonName is slightly better:

typedef struct PersonName {
   char family[128];
   char given[128];
} PersonName;

It keeps all of the data within the PersonName object, BUT it wastes space for short names and imposes an arbitrary limit on name length.

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [5/51]
❖ Debugging Assignment 1 (cont)

Correct implementation of PersonName requires:

From chapter 38 in the PostgreSQL documentation:

If the internal representation of the data type is variable-length, the internal representation must follow the standard layout for variable-length data: the first four bytes must be a char[4] field which is never accessed directly (customarily named vl_len_). You must use the SET_VARSIZE() macro to store the total size of the datum (including the length field itself) in this field and VARSIZE() to retrieve it. (These macros exist because the length field may be encoded differently depending on platform.)

There are examples in postgresql-15.6/contrib
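
A minimal sketch of one possible variable-length layout (the field and helper names here are illustrative, not prescribed by the assignment):

#include "postgres.h"
#include <string.h>

typedef struct PersonName {
   int32 vl_len_;                      /* varlena header; never access directly */
   char  name[FLEXIBLE_ARRAY_MEMBER];  /* e.g. "Family,Given" stored inline */
} PersonName;

/* build a PersonName datum from an already-validated C string */
static PersonName *makePersonName(const char *str)
{
   int len = VARHDRSZ + strlen(str) + 1;   /* header + string + '\0' */
   PersonName *p = (PersonName *) palloc(len);
   SET_VARSIZE(p, len);                    /* total size, including the header */
   strcpy(p->name, str);
   return p;
}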

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [6/51]
❖ Debugging Assignment 1 (cont)

Debugging outside PostgreSQL:

This will require re-writing the function interfaces to fit them back into PostgreSQL afterwards.
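
For instance, the parsing/validation logic can be factored into plain C functions and exercised with an ordinary main() outside the server (everything below is an illustrative sketch; the check shown is deliberately simplified and is not the assignment's validation rule):

#include <stdio.h>
#include <string.h>

/* simplified stand-in check: one comma with non-empty text on both sides */
static int looksLikePersonName(const char *str)
{
   const char *comma = strchr(str, ',');
   return comma != NULL && comma != str && *(comma + 1) != '\0';
}

int main(void)
{
   printf("%d\n", looksLikePersonName("Smith,John"));  /* expect 1 */
   printf("%d\n", looksLikePersonName("NoComma"));     /* expect 0 */
   return 0;
}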
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [7/51]
❖ Linear Hashing

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [8/51]
❖ Selection with Lin.Hashing

If b != 2^d, treat different parts of the file differently.

[Diagram:Pics/file-struct/linhash2.png]

Parts A and C are treated as if part of a file of size 2^(d+1).

Part B is treated as if part of a file of size 2^d.

Part D does not yet exist (tuples in B may move into it).

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [9/51]
❖ Selection with Lin.Hashing (cont)

Modified search algorithm:

// select * from R where k = val
h = hash(val);
pid = bits(d,h);
if (pid < sp) { pid = bits(d+1,h); }
P = getPage(f, pid)
for each tuple t in page P
         and its overflow blocks {
    if (t.k == val) add t to results;
}
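
As a rough C sketch (an assumption of this example: the d low-order bits of the hash select the bucket, as in the diagrams), bits(d,h) is just a mask and the page choice follows the algorithm above:

#include <stdint.h>
#include <stdio.h>

/* d low-order bits of hash value h */
static uint32_t bits(int d, uint32_t h) { return h & ((1u << d) - 1); }

/* choose the page for hash value h, given depth d and split pointer sp */
static uint32_t pageFor(uint32_t h, int d, uint32_t sp)
{
   uint32_t pid = bits(d, h);
   if (pid < sp)              /* this part of the file has already been split ... */
      pid = bits(d + 1, h);   /* ... so use one extra bit */
   return pid;
}

int main(void)
{
   printf("%u\n", pageFor(0x14, 2, 1));   /* hash 10100: 00 < sp, so use 100 -> page 4 */
   printf("%u\n", pageFor(0x0d, 2, 1));   /* hash 01101: 01 -> page 1 */
   return 0;
}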

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [10/51]
❖ File Expansion with Lin.Hashing

[Diagram:Pics/file-struct/linhash3.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [11/51]
❖ Insertion with Lin.Hashing

Abstract view:

pid = bits(d,hash(val));
if (pid < sp) pid = bits(d+1,hash(val));
// bucket[pid] = page pid + its overflow pages
for each page P in bucket[pid] {
    if (space in P) {
        insert tuple into P
        break
    }
}
if (no insertion) {
    add new ovflow page to bucket[pid]
    insert tuple into new page
}
if (need to split) {
    partition tuples from bucket[sp]
          into bucket[sp] and bucket[sp+2^d]
    sp++;
    if (sp == 2^d) { d++; sp = 0; }
}

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [12/51]
❖ Splitting

How to decide that we "need to split"?

Two approaches to triggering a split:

Note: always split block sp, even if it is neither full nor the "current" (just-inserted-into) block

Systematic splitting like this ...

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [13/51]
❖ Splitting (cont)

Splitting process for block sp=01:

[Diagram:Pics/file-struct/linhash4.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [14/51]
❖ Exercise: Insertion into Linear Hashed File

Consider a file with b=4, c=3, d=2, sp=0, hash(x) as above

Insert tuples in alpha order with the following keys and hashes:

 k  hash(k)     k  hash(k)     k  hash(k)     k  hash(k)
 a  10001       g  00000       m  11001       s  01110
 b  11010       h  00000       n  01000       t  10011
 c  01111       i  10010       o  00110       u  00010
 d  01111       j  10110       p  11101       v  11111
 e  01100       k  00101       q  00010       w  10000
 f  00010       l  00101       r  00000       x  00111

The hash values are the 5 lower-order bits from the full 32-bit hash.

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [15/51]
❖ Exercise: Insertion into Linear Hashed File (cont)

Splitting algorithm:


// partition tuples between two buckets
newp = sp + 2^d; oldp = sp;
for all tuples t in bucket[oldp] {
    pid = bits(d+1,hash(t.k));
    if (pid == newp) 
        add tuple t to bucket[newp]
    else
        add tuple t to bucket[oldp]
}
sp++;
if (sp == 2^d) { d++; sp = 0; }

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [16/51]
❖ Insertion Cost

If no split required, cost same as for standard hashing:

Cost_insert:    Best: 1r + 1w,    Avg: (1+Ov)r + 1w,    Worst: (1+max(Ov))r + 2w

If a split occurs, incur Cost_insert plus the cost of splitting:

On average, Cost_split  =  (1+Ov)r + (2+Ov)w
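
For example (illustrative numbers): with Ov = 0.2 overflow pages per bucket on average, Cost_split ≈ 1.2r + 2.2w, i.e. roughly one extra read and two extra writes per split.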
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [17/51]
❖ Deletion with Lin.Hashing

Deletion is similar to ordinary static hash file.

But might wish to contract file when enough tuples removed.

Rationale: r shrinks while b stays large ⇒ wasted space.

Method:

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [18/51]
❖ Hash Files in PostgreSQL

PostgreSQL uses linear hashing on tables which have been indexed by:

create index Ix on R using hash (k);
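
Hash indexes support only equality comparisons, so a query like the first one below can use Ix, while a range predicate cannot (queries are illustrative):

select * from R where k = 42;   -- equality: can use hash index Ix
select * from R where k > 42;   -- range: cannot use Ix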

Hash file implementation: backend/access/hash


Based on "A New Hashing Package for Unix", Margo Seltzer, Winter Usenix 1991
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [19/51]
❖ Hash Files in PostgreSQL (cont)

PostgreSQL uses slightly different file organisation ...

If overflow pages become empty, add to free list and re-use.
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [20/51]
❖ Hash Files in PostgreSQL (cont)

PostgreSQL hash file structure:

[Diagram:Pics/file-struct/pgsql-hashfile.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [21/51]
❖ Indexing

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [22/51]
❖ Indexing

An index is a file of (keyVal,tupleID) pairs, e.g.

[Diagram:Pics/file-struct/index.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [23/51]
❖ Indexes

A 1-d index is based on the value of a single attribute A.

Some possible properties of A:

Taxonomy of index types, based on properties of index attribute:

primary      index on unique field, may be sorted on A
clustering   index on non-unique field, file sorted on A
secondary    file not sorted on A

A given table may have indexes on several attributes.

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [24/51]
❖ Indexes (cont)

Indexes themselves may be structured in several ways:

dense          every tuple is referenced by an entry in the index file
sparse         only some tuples are referenced by index file entries
single-level   tuples are accessed directly from the index file
multi-level    may need to access several index pages to reach tuple

Index file has total i pages   (where typically i ≪ b)

Index file has page capacity c_i   (where typically c_i ≫ c)

Dense index:  i = ceil( r/c_i )     Sparse index:  i = ceil( b/c_i )
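
As a rough illustration (numbers invented for this example): with r = 100,000 tuples, b = 5,000 data pages and c_i = 1,000 index entries per page, a dense index needs ceil(100000/1000) = 100 pages, while a sparse index needs only ceil(5000/1000) = 5 pages.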

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [25/51]
❖ Dense Primary Index


[Diagram:Pics/file-struct/dense-primary-index.png]


Data file unsorted; one index entry for each tuple

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [26/51]
❖ Sparse Primary Index


[Diagram:Pics/file-struct/sparse-primary-index.png]


Data file sorted; one index entry for each page

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [27/51]
❖ Exercise: Index Storage Overheads

Consider a relation with the following storage parameters:

How many pages are needed to hold a dense index?

How many pages are needed to hold a sparse index?

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [28/51]
❖ Selection with Primary Index

For one queries:

ix = binary search index for entry with key K
if nothing found { return NotFound }
P = getPage(pageOf(ix.tid))
t = getTuple(P,offsetOf(ix.tid))
   -- may require reading overflow pages
return t

Worst case:   read log_2 i index pages  +  read 1+Ov data pages.

Thus, Cost_one,prim  =  log_2 i + 1 + Ov

Assume: index pages are same size as data pages ⇒ same reading cost
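
For example (illustrative numbers): with i = 1,000 index pages and Ov = 0.2, Cost_one,prim ≈ log_2 1000 + 1 + 0.2 ≈ 11 page reads.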

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [29/51]
❖ Selection with Primary Index (cont)

For range queries on primary key:

For pmr queries involving primary key:

For queries not involving primary key, index gives no help.
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [30/51]
❖ Selection with Primary Index (cont)

Method for range queries (when data file is not sorted)

// e.g. select * from R where a between lo and hi
pages = {}   results = {}
ixPage = findIndexPage(R.ixf,lo)
while (ixTup = getNextIndexTuple(R.ixf)) {
   if (ixTup.key > hi) break;
   pages = pages ∪ pageOf(ixTup.tid)
}
foreach pid in pages {
   // scan data page plus ovflow chain
   while (buf = getPage(R.datf,pid)) {
      foreach tuple T in buf {
         if (lo<=T.a && T.a<=hi)
            results = results ∪ T
}  }  }

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [31/51]
❖ Insertion with Primary Index

Overview:

tid = insert tuple into page P at position p
find location for new entry in index file
insert new index entry (k,tid) into index file 

Problem: order of index entries must be maintained

Reorganisation requires, on average, reading/writing half of the index file:

Cost_insert,prim  =  (log_2 i)r + (i/2).(1r+1w) + (1+Ov)r + (1+δ)w
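
For example (illustrative numbers): with i = 1,000 index pages, the i/2 reorganisation term alone is about 500 page reads plus 500 page writes, which dominates the rest of the cost.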

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [32/51]
❖ Deletion with Primary Index

Overview:

find tuple using index
mark tuple as deleted
delete index entry for tuple

If we delete index entries by marking ...

If we delete index entry by index file reorganisation ...
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [33/51]
❖ Clustering Index

Data file sorted; one index entry for each key value

[Diagram:Pics/file-struct/clustering-index.png]

Cost penalty: maintaining both index and data file as sorted

(Note: can't mark index entry for value X until all X tuples are deleted)

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [34/51]
❖ Secondary Index

Data file not sorted; want one index entry for each key value

[Diagram:Pics/file-struct/sec-index.png]

Cost_pmr  =  (log_2 i_ix1 + a_ix2 + b_q.(1 + Ov))

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [35/51]
❖ Multi-level Indexes

The Secondary Index above used two index files to speed up search

Could improve further by

Ultimately, reduce top-level of index hierarchy to one page.

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [36/51]
❖ Multi-level Indexes (cont)

Example data file with three-levels of index:

[Diagram:Pics/file-struct/multi-level-index.png]

Assume:  not primary key,  c = 100,  c_i = 3

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [37/51]
❖ Select with Multi-level Index

For one query on indexed key field:

xpid = top level index page
for level = 1 to d {
    read index page xpid
    search index page for J'th entry
        where index[J].key <= K < index[J+1].key
    if (J == -1) { return NotFound }
    xpid = index[J].page
}
pid = xpid  // pid is data page index
search page pid and its overflow pages

Cost_one,mli  =  (d + 1 + Ov)r

(Note that d = ceil( log_ci r ) and c_i is large because index entries are small)
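
For example (illustrative numbers): with r = 10^6 tuples and c_i = 1,000, d = ceil( log_1000 10^6 ) = 2, giving Cost_one,mli = (2 + 1 + Ov)r, i.e. roughly 3 page reads when overflow chains are short.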

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [38/51]
❖ B-Trees

B-trees are MSTs with the properties:

B-tree insertion and deletion methods

Advantages of B-trees over general MSTs
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [39/51]
❖ B-Trees (cont)

Example B-tree (depth=3, n=3)   (actually B+ tree)

[Diagram:Pics/file-struct/btree0.png]

(Note: in DBs, nodes are pages ⇒ large branching factor, e.g. n=500)

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [40/51]
❖ B-Tree Depth

Depth depends on effective branching factor  (i.e. how full nodes are).

Simulation studies show typical B-tree nodes are 69% full.

Gives   load L_i = 0.69 × c_i   and   depth of tree ≈ ceil( log_Li r ).

Example: c_i = 128,    L_i = 88

Level   #nodes    #keys
root    1         87
1       88        7656
2       7744      673728
3       681472    59288064

Note: c_i is generally larger than 128 for a real B-tree.

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [41/51]
❖ Insertion into B-Trees

Overview of the method:

  1. find leaf node and position in node where entry would be stored
  2. if node is not full, insert entry into appropriate spot
  3. if node is full, split node into two half-full nodes
                           and promote middle element to parent
  4. if parent full, split and promote upwards
  5. if we reach the root and it is full, split it and create a new root above

Note: if duplicates not allowed and key exists, may stop after step 1.
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [42/51]
❖ Example: B-tree Insertion

Starting from this tree:

[Diagram:Pics/file-struct/btree1.png]

insert the following keys in the given order   12 15 30 10

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [43/51]
❖ Example: B-tree Insertion (cont)


[Diagram:Pics/file-struct/btree1a.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [44/51]
❖ Example: B-tree Insertion (cont)

[Diagram:Pics/file-struct/btree1b.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [45/51]
❖ B-Tree Insertion Cost

Insertion cost = Cost_treeSearch + Cost_treeInsert + Cost_dataInsert

Best case: write one page (most of the time)

Cost_insert  =  Dr + 1w + 1r + 1w

Common case: 3 node writes (rearrange 2 leaves + parent)

Cost_insert  =  Dr + 3w + 1r + 1w
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [46/51]
❖ B-Tree Insertion Cost (cont)

Worst case: 2D-1 node writes (propagate to root)

Cost_insert  =  Dr + (2D-1)w + 1r + 1w
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [47/51]
❖ Selection with B-Trees

For one queries:

N = B-tree root node
while (N is not a leaf node)
   N = scanToFindChild(N,K)
tid = scanToFindEntry(N,K)
access tuple T using tid

Cost_one  =  (D + 1)r

[Diagram:Pics/file-struct/btree00.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [48/51]
❖ Selection with B-Trees (cont)

For range queries (assume sorted on index attribute):

search index to find leaf node for Lo
for each leaf node entry until Hi found {
	access tuple T using tid from entry
}

Cost_range  =  (D + b_i + b_q)r
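
For example (illustrative numbers): with D = 3, b_i = 2 leaf pages scanned and b_q = 10 matching data pages, Cost_range = (3 + 2 + 10)r = 15 page reads.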

[Diagram:Pics/file-struct/btree00.png]

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [49/51]
❖ B-trees in PostgreSQL

PostgreSQL implements ≅ Lehman/Yao-style B-trees

B-tree implementation: backend/access/nbtree

Notes:
COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [50/51]
❖ B-trees in PostgreSQL (cont)

Interface functions for B-trees

// build Btree index on relation
Datum btbuild(rel,index,...)
// insert index entry into Btree
Datum btinsert(rel,key,tupleid,index,...)
// start scan on Btree index
Datum btbeginscan(rel,key,scandesc,...)
// get next tuple in a scan
Datum btgettuple(scandesc,scandir,...)
// close down a scan
Datum btendscan(scandesc)
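
These functions are invoked via the index access-method interface rather than called directly; at the SQL level a B-tree index is created with, e.g. (index name is illustrative):

create index Rk_ix on R (k);   -- btree is the default index type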

COMP9315 24T1 ♢ Week 4 Thursday Lecture ♢ [51/51]


Produced: 7 Mar 2024