C: Implementing Selection (single-attribute)

COMP9315 24T1 ♢ Lectures Part C ♢ [0/90]
❖ Varieties of Selection

Selection:   select * from R where C

We consider three distinct styles of selection (one, pmr, range). Each style has several possible file-structures/techniques.
COMP9315 24T1 ♢ Lectures Part C ♢ [1/90]
❖ Varieties of Selection (cont)

Examples of different selection types:

COMP9315 24T1 ♢ Lectures Part C ♢ [2/90]
❖ Implementing Select Efficiently

Two basic approaches:

Our analyses assume: 1 input buffer available for each relation.

If more buffers are available, most methods benefit.

COMP9315 24T1 ♢ Lectures Part C ♢ [3/90]
❖ Heap Files

Note: this is not "heap" as in the heap-ordered tree used for priority queues.
It simply means an unordered collection of tuples in a file.
COMP9315 24T1 ♢ Lectures Part C ♢ [4/90]
❖ Selection in Heaps

For all selection queries, the only possible strategy is:

// select * from R where C
for each page P in file of relation R {
    for each tuple t in page P {
        if (t satisfies C)
            add tuple t to result set
    }
}

i.e. linear scan through file searching for matching tuples
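The scan above can be sketched in C using a minimal in-memory model of a heap file (the Page/Tuple structs, PAGE_CAP and the integer key k are illustrative assumptions, not the real page layout):

```c
#include <assert.h>

#define PAGE_CAP 4   /* tuples per page (assumed, for illustration) */

typedef struct { int k; } Tuple;
typedef struct { int ntuples; Tuple tuples[PAGE_CAP]; } Page;

/* Linear scan: collect every tuple in R satisfying k == val.
   Returns the number of matches written into result[]. */
int heapSelect(Page *pages, int npages, int val, Tuple *result)
{
    int nres = 0;
    for (int p = 0; p < npages; p++)                /* each page in file  */
        for (int i = 0; i < pages[p].ntuples; i++)  /* each tuple in page */
            if (pages[p].tuples[i].k == val)
                result[nres++] = pages[p].tuples[i];
    return nres;
}
```

Every page is read regardless of where (or whether) matches occur, which is why Costrange = Costpmr = b on the next slide.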

COMP9315 24T1 ♢ Lectures Part C ♢ [5/90]
❖ Selection in Heaps (cont)

The heap is scanned from the first to the last page:

[Diagram:Pics/file-struct/heapscan.png]

Costrange  =  Costpmr  =  b

If we know that only one tuple can match the query (a one query),
a simple optimisation is to stop the scan once that tuple is found.

Costone :      Best = 1      Average = b/2      Worst = b

COMP9315 24T1 ♢ Lectures Part C ♢ [6/90]
❖ Insertion in Heaps

Insertion: new tuple is appended to file (in last page).

rel = openRelation("R", READ|WRITE);
pid = nPages(rel)-1;              // last page in the file
get_page(rel, pid, buf);
if (size(newTup) > size(buf))
   { deal with oversize tuple }
else {
   if (!hasSpace(buf,newTup))     // last page full; start a new page
      { pid++; nPages(rel)++; clear(buf); }
   insert_record(buf,newTup);
   put_page(rel, pid, buf);
}

Costinsert  =  1r + 1w

Plus possible extra writes for oversize tuples, e.g. PostgreSQL's TOAST

COMP9315 24T1 ♢ Lectures Part C ♢ [7/90]
❖ Insertion in Heaps (cont)

Alternative strategy:

PostgreSQL's strategy:
COMP9315 24T1 ♢ Lectures Part C ♢ [8/90]
❖ Insertion in Heaps (cont)

PostgreSQL's tuple insertion:

heap_insert(Relation relation,    // relation desc
            HeapTuple newtup,     // new tuple data
            CommandId cid, ...)   // SQL statement

COMP9315 24T1 ♢ Lectures Part C ♢ [9/90]
❖ Deletion in Heaps

SQL:  delete from R  where Condition

Implementation of deletion:

rel = openRelation("R",READ|WRITE);
for (p = 0; p < nPages(rel); p++) {
    get_page(rel, p, buf);
    ndels = 0;
    for (i = 0; i < nTuples(buf); i++) {
        tup = get_record(buf,i);
        if (tup satisfies Condition)
            { ndels++; delete_record(buf,i); }
    }
    if (ndels > 0) put_page(rel, p, buf);
    if (ndels > 0 && unique) break;
}

COMP9315 24T1 ♢ Lectures Part C ♢ [10/90]
❖ Deletion in PostgreSQL

PostgreSQL tuple deletion:

heap_delete(Relation relation,    // relation desc
            ItemPointer tid, ..., // tupleID
            CommandId cid, ...)   // SQL statement


Vacuuming eventually compacts space in each page.
COMP9315 24T1 ♢ Lectures Part C ♢ [11/90]
❖ Updates in Heaps

SQL:  update R  set F = val  where Condition

Analysis for updates is similar to that for deletion

Costupdate  =  br + bqw

Complication: new tuple larger than old version  (too big for page)

Solution:   delete, re-organise free space, then insert

COMP9315 24T1 ♢ Lectures Part C ♢ [12/90]
❖ Updates in Heaps (cont)

PostgreSQL tuple update:

heap_update(Relation relation,     // relation desc
            ItemPointer otid,      // old tupleID
            HeapTuple newtup, ..., // new tuple data
            CommandId cid, ...)    // SQL statement

COMP9315 24T1 ♢ Lectures Part C ♢ [13/90]
❖ Heaps in PostgreSQL

PostgreSQL stores all table data in heap files (by default).

Typically there are also associated index files.

If a file is more useful in some other form:

Heap file implementation:   src/backend/access/heap
COMP9315 24T1 ♢ Lectures Part C ♢ [14/90]
❖ Heaps in PostgreSQL (cont)

PostgreSQL "heap file" may use multiple physical files

COMP9315 24T1 ♢ Lectures Part C ♢ [15/90]
❖ Sorted Files

Records stored in file in order of some field k (the sort key).

Makes searching more efficient; makes insertion less efficient

E.g. assume c = 4

[Diagram:Pics/file-struct/insert-in-sorted.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [16/90]
❖ Sorted Files (cont)

In order to mitigate insertion costs, use overflow blocks.

[Diagram:Pics/file-struct/sfile1.png]

Total number of overflow blocks = bov.

Average overflow chain length = Ov = bov / b.

Bucket = data page + its overflow page(s)

COMP9315 24T1 ♢ Lectures Part C ♢ [17/90]
❖ Selection in Sorted Files

For one queries on sort key, use binary search.

// select * from R where k = val  (sorted on R.k)
lo = 0; hi = b-1;
while (lo <= hi) {
    mid = (lo+hi) / 2;  // int division with truncation
    (tup,loVal,hiVal) = searchBucket(f,mid,k,val);
    if (tup != NULL) return tup;
    else if (val < loVal) hi = mid - 1;
    else if (val > hiVal) lo = mid + 1;
    else return NOT_FOUND;
}
return NOT_FOUND;

where  f is file for relation,  mid,lo,hi are page indexes,
                k is a field/attr,  val,loVal,hiVal are values for k
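A compact C sketch of this binary search, modelling each bucket as a single page of sorted int keys (the Page struct and searchPage signature are assumptions; overflow chains are omitted):

```c
#include <assert.h>

#define NOT_FOUND (-1)

/* One "page" of a sorted file; tuples are just int keys here. */
typedef struct { int ntuples; int keys[4]; } Page;

/* searchBucket stand-in: scan one page for val, reporting the page's
   lo/hi key values so the caller can steer the binary search. */
static int searchPage(Page *p, int val, int *loVal, int *hiVal)
{
    *loVal = p->keys[0];
    *hiVal = p->keys[p->ntuples - 1];
    for (int i = 0; i < p->ntuples; i++)
        if (p->keys[i] == val) return val;
    return NOT_FOUND;
}

/* Binary search over b sorted pages, as in the slide. */
int sortedFileSearch(Page *file, int b, int val)
{
    int lo = 0, hi = b - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;   /* int division with truncation */
        int loVal, hiVal;
        int res = searchPage(&file[mid], val, &loVal, &hiVal);
        if (res != NOT_FOUND) return res;
        else if (val < loVal) hi = mid - 1;
        else if (val > hiVal) lo = mid + 1;
        else return NOT_FOUND;     /* val in this page's range but absent */
    }
    return NOT_FOUND;
}
```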

COMP9315 24T1 ♢ Lectures Part C ♢ [18/90]
❖ Selection in Sorted Files (cont)

Search a page and its overflow chain for a key value

searchBucket(f,pid,k,val)
{
    buf = getPage(f,pid);
    (tup,min,max) = searchPage(buf,k,val,+INF,-INF);
    if (tup != NULL) return (tup,min,max);
    ovf = openOvFile(f);
    ovp = ovflow(buf);
    while (tup == NULL && ovp != NO_PAGE) {
        buf = getPage(ovf,ovp);
        (tup,min,max) = searchPage(buf,k,val,min,max);
        ovp = ovflow(buf);
    }
    return (tup,min,max);
}

Assumes each page contains index of next page in Ov chain

Note: getPage(f,pid) = { read_page(f,pid,buf); return buf; }

COMP9315 24T1 ♢ Lectures Part C ♢ [19/90]
❖ Selection in Sorted Files (cont)

Search within a page for key; also find min/max key values

searchPage(buf,k,val,min,max)
{
    res = NULL;
    for (i = 0; i < nTuples(buf); i++) {
        tup = getTuple(buf,i);
        if (tup.k == val) res = tup;
        if (tup.k < min) min = tup.k;
        if (tup.k > max) max = tup.k;
    }
    return (res,min,max);
}

COMP9315 24T1 ♢ Lectures Part C ♢ [20/90]
❖ Selection in Sorted Files (cont)

The above method treats each bucket like a single large page.

Cases:

Costone :     Best = 1      Worst = log2 b + bov

Average case cost analysis needs assumptions  (e.g. data distribution)

COMP9315 24T1 ♢ Lectures Part C ♢ [21/90]
❖ Search with pmr and range Queries

For pmr query, on non-unique attribute k, where file is sorted on k

E.g. select * from R where k = 2

[Diagram:Pics/file-struct/sfile-pmr.png]

Begin by locating a page p containing k=val   (as for one query).

Scan backwards and forwards from p to find matches.

Thus, Costpmr  =  Costone + (bq-1).(1+Ov)

COMP9315 24T1 ♢ Lectures Part C ♢ [22/90]
❖ Search with pmr and range Queries (cont)

For range queries on unique sort key (e.g. primary key):

E.g. select * from R where k >= 5 and k <= 13

[Diagram:Pics/file-struct/sfile-range-pk.png]

Costrange  =  Costone + (bq-1).(1+Ov)

COMP9315 24T1 ♢ Lectures Part C ♢ [23/90]
❖ Search with pmr and range Queries (cont)

For range queries on non-unique sort key, similar method to pmr.

E.g. select * from R where k >= 2 and k <= 6

[Diagram:Pics/file-struct/sfile-range.png]

Costrange = Costone + (bq-1).(1+Ov)

COMP9315 24T1 ♢ Lectures Part C ♢ [24/90]
❖ Search with pmr and range Queries (cont)

So far, have assumed query condition involves sort key k.

But what about   select * from R where j = 100.0 ?

If condition contains attribute j, not the sort key


Costone,   Costrange,   Costpmr   as for heap files

COMP9315 24T1 ♢ Lectures Part C ♢ [25/90]
❖ Insertion into Sorted Files

Insertion approach:

Thus, Costinsert  =  Costone + δw         (where δw = 1 or 2)

Consider insertions of k=33, k=25, k=99 into:

[Diagram:Pics/file-struct/sorted-file1.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [26/90]
❖ Deletion from Sorted Files

E.g.   delete from R where k = 2

Deletion strategy:

Cost depends on selectivity of selection condition

Recall: selectivity determines bq   (# pages with matches)

Thus, Costdelete  =  Costselect + bqw

COMP9315 24T1 ♢ Lectures Part C ♢ [27/90]
❖ Hashing

Basic idea: use key value to compute page address of tuple.

[Diagram:Pics/file-struct/hash.png]

e.g. tuple with key = v  is stored in page i

Requires: hash function h(v) that maps KeyDomain → [0..b-1].

COMP9315 24T1 ♢ Lectures Part C ♢ [28/90]
❖ Hashing (cont)

PostgreSQL hash function (simplified):


Datum hash_any(unsigned char *k, register int keylen)
{
   register uint32 a, b, c, len;
   register uint32 *ka = (uint32 *)k;
   /* Set up the internal state */
   len = keylen;  a = b = c = 0x9e3779b9 + len + 3923095;
   /* handle most of the key */
   while (len >= 12) {
      a += ka[0]; b += ka[1]; c += ka[2];
      mix(a, b, c);
      ka += 3;  len -= 12;
   }
   /* collect any data from last 11 bytes into a,b,c */
   mix(a, b, c);
   return UInt32GetDatum(c);
}

See backend/access/hash/hashfunc.c for details  (incl mix())

COMP9315 24T1 ♢ Lectures Part C ♢ [29/90]
❖ Hashing (cont)

hash_any() gives hash value as 32-bit quantity (uint32).

Two ways to map raw hash value into a page address:
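The bullet points for the two mappings are not reproduced on this slide; two standard choices (sketched below, with hypothetical function names) are taking the hash modulo b, or masking off the d lower-order bits when b = 2^d:

```c
#include <assert.h>
#include <stdint.h>

/* 1. modulo the file size: works for any b */
uint32_t pageViaMod(uint32_t hval, uint32_t b)  { return hval % b; }

/* 2. mask off the d lower-order bits: requires b = 2^d */
uint32_t pageViaBits(uint32_t hval, int d)      { return hval & ((1u << d) - 1); }
```

When b = 2^d the two methods agree, since h mod 2^d is exactly the d lower-order bits of h.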

COMP9315 24T1 ♢ Lectures Part C ♢ [30/90]
❖ Hashing Performance

Aims:

Note: if data distribution not uniform, address distribution can't be uniform.

Best case: every bucket contains same number of tuples.

Worst case: every tuple hashes to same bucket.

Average case: some buckets have more tuples than others.

Use overflow pages to handle "overfull" buckets  (cf. sorted files)

All tuples in each bucket must have same hash value.

COMP9315 24T1 ♢ Lectures Part C ♢ [31/90]
❖ Hashing Performance (cont)

Two important measures for hash files:

Three cases for distribution of tuples in a hashed file:

Case      L       Ov
Best      ≅ 1     0
Worst     ≫ 1     **
Average   < 1     0 < Ov < 1

(** performance is same as Heap File)

To achieve average case, aim for   0.75  ≤  L  ≤  0.9.

COMP9315 24T1 ♢ Lectures Part C ♢ [32/90]
❖ Selection with Hashing

Select via hashing on unique key k (one)

// select * from R where k = val
(pid,P) = getPageViaHash(val,R)
for each tuple t in page P {
    if (t.k == val) return t
}
for each overflow page Q of P {
    for each tuple t in page Q {
        if (t.k == val) return t
}   }

Costone :   Best = 1,   Avg = 1 + Ov/2,   Worst = 1 + max(OvLen)

COMP9315 24T1 ♢ Lectures Part C ♢ [33/90]
❖ Selection with Hashing (cont)

Select via hashing on non-unique hash key nk (pmr)

// select * from R where nk = val
(pid,P) = getPageViaHash(val,R) 
for each tuple t in page P {
    if (t.nk == val) add t to results
}
for each overflow page Q of P {
    for each tuple t in page Q {
        if (t.nk == val) add t to results
}   }
return results

Costpmr  =  1 + Ov

COMP9315 24T1 ♢ Lectures Part C ♢ [34/90]
❖ Selection with Hashing (cont)

Hashing does not help with range queries** ...

Costrange = b + bov


Selection on attribute j which is not hash key ...

Costone,   Costrange,   Costpmr  =  b + bov


** unless the hash function is order-preserving (and most aren't)

COMP9315 24T1 ♢ Lectures Part C ♢ [35/90]
❖ Insertion with Hashing

Insertion uses similar process to one queries.

// insert tuple t with key=val into rel R
(pid,P) = getPageViaHash(val,R) 
if room in page P {
    insert t into P; return
}
for each overflow page Q of P {
    if room in page Q {
        insert t into Q; return
}   }
add new overflow page Q
link Q to previous page
insert t into Q

Costinsert :    Best: 1r + 1w    Worst: (1+max(OvLen))r + 2w

COMP9315 24T1 ♢ Lectures Part C ♢ [36/90]
❖ Deletion with Hashing

Similar performance to select on non-unique key:

// delete from R where k = val
// f = data file ... ovf = ovflow file
(pid,P) = getPageViaHash(val,R)
ndel = delTuples(P,k,val)
if (ndel > 0) putPage(f,P,pid)
for each overflow page qid,Q of P {
    ndel = delTuples(Q,k,val)
    if (ndel > 0) putPage(ovf,Q,qid)
}

Extra cost over select is cost of writing back modified blocks.

Method works for both unique and non-unique hash keys.

COMP9315 24T1 ♢ Lectures Part C ♢ [37/90]
❖ Problem with Hashing...

So far, discussion of hashing has assumed a fixed file size (b).

What size file to use?

Change file size ⇒ change hash function ⇒ rebuild file

Methods for hashing with dynamic files:

COMP9315 24T1 ♢ Lectures Part C ♢ [38/90]
❖ Problem with Hashing... (cont)

All flexible hashing methods ...

Start with hash function to convert value to bit-string:

uint32 hash(unsigned char *val)

Require a function to extract d  bits from bit-string:

uint32 bits(int d, uint32 val)

Use result of bits() as page address.
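A minimal implementation of bits() under the usual convention that the d lower-order bits are extracted (the guards for d ≤ 0 and d ≥ 32 are an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* Extract the d lower-order bits of a 32-bit hash value;
   flexible hashing schemes use the result as a page address. */
uint32_t bits(int d, uint32_t val)
{
    if (d <= 0)  return 0;
    if (d >= 32) return val;
    return val & ((UINT32_C(1) << d) - 1);
}
```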

COMP9315 24T1 ♢ Lectures Part C ♢ [39/90]
❖ Page Splitting

Important concept for flexible hashing: splitting


Result: expandable data file, never requiring a complete file rebuild
COMP9315 24T1 ♢ Lectures Part C ♢ [40/90]
❖ Page Splitting (cont)

Example of splitting:

[Diagram:Pics/file-struct/split.png]

Tuples only show key value; assume h(val) = val

COMP9315 24T1 ♢ Lectures Part C ♢ [41/90]
❖ Linear Hashing

File organisation:

Uses systematic method of growing data file ...

Advantage: does not require auxiliary storage for a directory

Disadvantage: requires overflow pages   (don't split on full pages)

COMP9315 24T1 ♢ Lectures Part C ♢ [42/90]
❖ Linear Hashing (cont)

File grows linearly (one block at a time, at regular intervals).

Has "phases" of expansion; over each phase, b doubles.

[Diagram:Pics/file-struct/linhash1.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [43/90]
❖ Selection with Lin.Hashing

If b = 2^d, the file behaves exactly like standard hashing.

Use d  bits of hash to compute block address.

// select * from R where k = val
h = hash(val);
pid = bits(d,h);  // lower-order bits
P = getPage(f, pid)
for each tuple t in page P
         and its overflow pages {
    if (t.k == val) add t to Result;
}

Average Costone  =  1+Ov

COMP9315 24T1 ♢ Lectures Part C ♢ [44/90]
❖ Selection with Lin.Hashing (cont)

If b != 2^d, treat different parts of the file differently.

[Diagram:Pics/file-struct/linhash2.png]

Parts A and C are treated as if part of a file of size 2^(d+1).

Part B is treated as if part of a file of size 2^d.

Part D does not yet exist (tuples in B may eventually move into it).

COMP9315 24T1 ♢ Lectures Part C ♢ [45/90]
❖ Selection with Lin.Hashing (cont)

Modified search algorithm:

// select * from R where k = val
h = hash(val);
pid = bits(d,h);
if (pid < sp) { pid = bits(d+1,h); }
P = getPage(f, pid)
for each tuple t in page P
         and its overflow blocks {
    if (t.k == val) add t to Result;
}

COMP9315 24T1 ♢ Lectures Part C ♢ [46/90]
❖ File Expansion with Lin.Hashing

[Diagram:Pics/file-struct/linhash3.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [47/90]
❖ Insertion with Lin.Hashing

Abstract view:

pid = bits(d,hash(val));
if (pid < sp) pid = bits(d+1,hash(val));
// bucket[pid] = page pid + its overflow pages
for each page P in bucket[pid] {
    if (space in P) {
        insert tuple into P
        break
    }
}
if (no insertion) {
    add new ovflow page to bucket[pid]
    insert tuple into new page
}
if (need to split) {
    partition tuples from bucket[sp]
          into bucket[sp] and bucket[sp+2^d]
    sp++;
    if (sp == 2^d) { d++; sp = 0; }
}

COMP9315 24T1 ♢ Lectures Part C ♢ [51/90]
❖ Splitting

How to decide that we "need to split"?

Two approaches to triggering a split:

Note: always split block sp, even if it is neither full nor the "current" (overflowing) block

Systematic splitting like this ...

COMP9315 24T1 ♢ Lectures Part C ♢ [52/90]
❖ Splitting (cont)

Splitting process for block sp=01:

[Diagram:Pics/file-struct/linhash4.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [53/90]
❖ Exercise: Insertion into Linear Hashed File

Consider a file with b=4, c=3, d=2, sp=0, hash(x) as above

Insert tuples in alpha order with the following keys and hashes:

k  hash(k)    k  hash(k)    k  hash(k)    k  hash(k)
a  10001      g  00000      m  11001      s  01110
b  11010      h  00000      n  01000      t  10011
c  01111      i  10010      o  00110      u  00010
d  01111      j  10110      p  11101      v  11111
e  01100      k  00101      q  00010      w  10000
f  00010      l  00101      r  00000      x  00111

The hash values are the 5 lower-order bits from the full 32-bit hash.

COMP9315 24T1 ♢ Lectures Part C ♢ [54/90]
❖ Exercise: Insertion into Linear Hashed File (cont)

Splitting algorithm:


// partition tuples between two buckets
newp = sp + 2^d; oldp = sp;
for all tuples t in bucket[oldp] {
    pid = bits(d+1,hash(t.k));
    if (pid == newp) 
        add tuple t to bucket[newp]
    else
        add tuple t to bucket[oldp]
}
sp++;
if (sp == 2^d) { d++; sp = 0; }
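The partitioning step can be sketched in C, treating each bucket as a plain array of hash values (splitBucket and the array representation are illustrative assumptions, not the real page layout):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bits(int d, uint32_t val) { return val & ((1u << d) - 1); }

/* Partition the hash values in bucket sp between buckets sp and sp + 2^d,
   using one extra bit of hash, as in the splitting algorithm above.
   On return, *nold/*nnew give the number of values in each bucket. */
void splitBucket(uint32_t *old, int *nold, uint32_t *new_, int *nnew,
                 int sp, int d)
{
    int kept = 0;
    *nnew = 0;
    for (int i = 0; i < *nold; i++) {
        if ((int)bits(d + 1, old[i]) == sp + (1 << d))
            new_[(*nnew)++] = old[i];   /* moves to bucket sp + 2^d */
        else
            old[kept++]    = old[i];    /* stays in bucket sp       */
    }
    *nold = kept;
}
```

E.g. with d = 2 and sp = 1, hash values whose 3 lower-order bits are 101 move to bucket 5, while those ending in 001 stay in bucket 1.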

COMP9315 24T1 ♢ Lectures Part C ♢ [55/90]
❖ Insertion Cost

If no split required, cost same as for standard hashing:

Costinsert:    Best: 1r + 1w,    Avg: (1+Ov)r + 1w,    Worst: (1+max(Ov))r + 2w

If split occurs, incur Costinsert  plus cost of splitting:

On average, Costsplit  =  (1+Ov)r + (2+Ov)w
COMP9315 24T1 ♢ Lectures Part C ♢ [56/90]
❖ Deletion with Lin.Hashing

Deletion is similar to ordinary static hash file.

But might wish to contract file when enough tuples removed.

Rationale: r  shrinks, b  stays large ⇒ wasted space.

Method:

COMP9315 24T1 ♢ Lectures Part C ♢ [57/90]
❖ Hash Files in PostgreSQL

PostgreSQL uses linear hashing on tables which have been:

create index Ix on R using hash (k);

Hash file implementation: backend/access/hash


Based on "A New Hashing Package for Unix", Margo Seltzer, Winter Usenix 1991
COMP9315 24T1 ♢ Lectures Part C ♢ [58/90]
❖ Hash Files in PostgreSQL (cont)

PostgreSQL uses slightly different file organisation ...

If overflow pages become empty, add to free list and re-use.
COMP9315 24T1 ♢ Lectures Part C ♢ [59/90]
❖ Hash Files in PostgreSQL (cont)

PostgreSQL hash file structure:

[Diagram:Pics/file-struct/pgsql-hashfile.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [60/90]
❖ Indexing

COMP9315 24T1 ♢ Lectures Part C ♢ [61/90]
❖ Indexing

An index is a file of (keyVal,tupleID) pairs, e.g.

[Diagram:Pics/file-struct/index.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [62/90]
❖ Indexes

A 1-d index is based on the value of a single attribute A.

Some possible properties of A:

Taxonomy of index types, based on properties of index attribute:

primary     index on unique field, may be sorted on A
clustering  index on non-unique field, file sorted on A
secondary   file not sorted on A

A given table may have indexes on several attributes.

COMP9315 24T1 ♢ Lectures Part C ♢ [63/90]
❖ Indexes (cont)

Indexes themselves may be structured in several ways:

dense         every tuple is referenced by an entry in the index file
sparse        only some tuples are referenced by index file entries
single-level  tuples are accessed directly from the index file
multi-level   may need to access several index pages to reach tuple

Index file has total i pages   (where typically i ≪ b)

Index file has page capacity ci   (where typically ci ≫ c)

Dense index:  i = ceil( r/ci )     Sparse index:  i = ceil( b/ci )
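These two formulas as C helpers, using the usual integer-arithmetic ceiling (the function names and the parameter values in the note below are made-up illustrations):

```c
#include <assert.h>

/* i = ceil(r/ci) for a dense index (one entry per tuple) */
int densePages(int r, int ci)  { return (r + ci - 1) / ci; }

/* i = ceil(b/ci) for a sparse index (one entry per data page) */
int sparsePages(int b, int ci) { return (b + ci - 1) / ci; }
```

E.g. with r = 100000 tuples on b = 5000 pages and ci = 500 index entries per page, a dense index needs 200 pages while a sparse index needs only 10.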

COMP9315 24T1 ♢ Lectures Part C ♢ [64/90]
❖ Dense Primary Index


[Diagram:Pics/file-struct/dense-primary-index.png]


Data file unsorted; one index entry for each tuple

COMP9315 24T1 ♢ Lectures Part C ♢ [65/90]
❖ Sparse Primary Index


[Diagram:Pics/file-struct/sparse-primary-index.png]


Data file sorted; one index entry for each page

COMP9315 24T1 ♢ Lectures Part C ♢ [66/90]
❖ Exercise: Index Storage Overheads

Consider a relation with the following storage parameters:

How many pages are needed to hold a dense index?

How many pages are needed to hold a sparse index?

COMP9315 24T1 ♢ Lectures Part C ♢ [67/90]
❖ Selection with Primary Index

For one queries:

ix = binary search index for entry with key K
if nothing found { return NotFound }
P = getPage(pageOf(ix.tid))
t = getTuple(P,offsetOf(ix.tid))
   -- may require reading overflow pages
return t

Worst case:   read log2 i  index pages  +  read 1+Ov data pages.

Thus, Costone,prim  =  log2 i + 1 + Ov

Assume: index pages are same size as data pages ⇒ same reading cost

COMP9315 24T1 ♢ Lectures Part C ♢ [68/90]
❖ Selection with Primary Index (cont)

For range queries on primary key:

For pmr queries involving primary key: same as for one queries, since the primary key is unique.

For queries not involving primary key, index gives no help.
COMP9315 24T1 ♢ Lectures Part C ♢ [69/90]
❖ Selection with Primary Index (cont)

Method for range queries (when data file is not sorted)

// e.g. select * from R where a between lo and hi
pages = {}   results = {}
ixPage = findIndexPage(R.ixf,lo)
while (ixTup = getNextIndexTuple(R.ixf)) {
   if (ixTup.key > hi) break;
   pages = pages ∪ pageOf(ixTup.tid)
}
foreach pid in pages {
   // scan data page plus ovflow chain
   while (buf = getPage(R.datf,pid)) {
      foreach tuple T in buf {
         if (lo<=T.a && T.a<=hi)
            results = results ∪ T
}  }  }

COMP9315 24T1 ♢ Lectures Part C ♢ [70/90]
❖ Insertion with Primary Index

Overview:

tid = insert tuple into page P at position p
find location for new entry in index file
insert new index entry (k,tid) into index file 

Problem: order of index entries must be maintained

Reorganisation requires, on average, reading/writing half of the index file:

Costinsert,prim  =  (log2 i)r + i/2.(1r+1w) + (1+Ov)r + (1+δ)w

COMP9315 24T1 ♢ Lectures Part C ♢ [71/90]
❖ Deletion with Primary Index

Overview:

find tuple using index
mark tuple as deleted
delete index entry for tuple

If we delete index entries by marking ...

If we delete index entry by index file reorganisation ...
COMP9315 24T1 ♢ Lectures Part C ♢ [72/90]
❖ Clustering Index

Data file sorted; one index entry for each key value

[Diagram:Pics/file-struct/clustering-index.png]

Cost penalty: maintaining both index and data file as sorted

(Note: can't mark index entry for value X until all X tuples are deleted)

COMP9315 24T1 ♢ Lectures Part C ♢ [73/90]
❖ Secondary Index

Data file not sorted; want one index entry for each key value

[Diagram:Pics/file-struct/sec-index.png]

Costpmr  =  ( log2 iix1 + aix2 + bq.(1 + Ov) )

COMP9315 24T1 ♢ Lectures Part C ♢ [74/90]
❖ Multi-level Indexes

Above Secondary Index used two index files to speed up search

Could improve further by

Ultimately, reduce top-level of index hierarchy to one page.

COMP9315 24T1 ♢ Lectures Part C ♢ [75/90]
❖ Multi-level Indexes (cont)

Example data file with three-levels of index:

[Diagram:Pics/file-struct/multi-level-index.png]

Assume:  not primary key,  c = 100,  ci = 3

COMP9315 24T1 ♢ Lectures Part C ♢ [76/90]
❖ Select with Multi-level Index

For one query on indexed key field:

xpid = top level index page
for level = 1 to d {
    read index entry xpid
    search index page for J'th entry
        where index[J].key <= K < index[J+1].key
    if (J == -1) { return NotFound }
    xpid = index[J].page
}
pid = xpid  // pid is data page index
search page pid and its overflow pages

Costone,mli  =  (d + 1 + Ov)r

(Note that d = ceil( logci r ) and ci is large because index entries are small)

COMP9315 24T1 ♢ Lectures Part C ♢ [77/90]
❖ B-Trees

B-trees are MSTs with the properties:

B-tree insertion and deletion methods

Advantages of B-trees over general MSTs
COMP9315 24T1 ♢ Lectures Part C ♢ [78/90]
❖ B-Trees (cont)

Example B-tree (depth=3, n=3)   (actually B+ tree)

[Diagram:Pics/file-struct/btree0.png]

(Note: in DBs, nodes are pages ⇒ large branching factor, e.g. n=500)

COMP9315 24T1 ♢ Lectures Part C ♢ [79/90]
❖ B-Tree Depth

Depth depends on effective branching factor  (i.e. how full nodes are).

Simulation studies show typical B-tree nodes are 69% full.

Gives   load Li = 0.69 × ci   and   depth of tree ~ ceil( logLi r ).

Example: ci=128,    Li=88

Level   #nodes   #keys
root    1        87
1       88       7656
2       7744     673728
3       681472   59288064

Note: ci is generally larger than 128 for a real B-tree.
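The table above can be reproduced programmatically: level j has Li^j nodes of up to Li-1 keys each, and the depth is the smallest D with Li^D ≥ r (the helper names below are illustrative):

```c
#include <assert.h>

/* #nodes at level j of a tree with effective branching factor Li */
long nodesAtLevel(long Li, int j) { long n = 1; while (j-- > 0) n *= Li; return n; }

/* #keys at level j: each node holds up to Li-1 keys */
long keysAtLevel(long Li, int j)  { return nodesAtLevel(Li, j) * (Li - 1); }

/* depth ~ ceil(log_Li r): smallest D such that Li^D >= r */
int btreeDepth(long Li, long r)
{
    int D = 0; long cap = 1;
    while (cap < r) { cap *= Li; D++; }
    return D;
}
```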

COMP9315 24T1 ♢ Lectures Part C ♢ [80/90]
❖ Insertion into B-Trees

Overview of the method:

  1. find leaf node and position in node where entry would be stored
  2. if node is not full, insert entry into appropriate spot
  3. if node is full, split node into two half-full nodes
                           and promote middle element to parent
  4. if parent full, split and promote upwards
  5. if reach root, and root is full, split it and make a new root

Note: if duplicates not allowed and key exists, may stop after step 1.
COMP9315 24T1 ♢ Lectures Part C ♢ [81/90]
❖ Example: B-tree Insertion

Starting from this tree:

[Diagram:Pics/file-struct/btree1.png]

insert the following keys in the given order   12 15 30 10

COMP9315 24T1 ♢ Lectures Part C ♢ [82/90]
❖ Example: B-tree Insertion (cont)


[Diagram:Pics/file-struct/btree1a.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [83/90]
❖ Example: B-tree Insertion (cont)

[Diagram:Pics/file-struct/btree1b.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [84/90]
❖ B-Tree Insertion Cost

Insertion cost = CosttreeSearch + CosttreeInsert + CostdataInsert

Best case: write one page (most of time)

Costinsert  =  Dr + 1w + 1r + 1w

Common case: 3 node writes (rearrange 2 leaves + parent)

Costinsert  =  Dr + 3w + 1r + 1w
COMP9315 24T1 ♢ Lectures Part C ♢ [85/90]
❖ B-Tree Insertion Cost (cont)

Worst case: 2D-1 node writes (propagate to root)

Costinsert  =  Dr + (2D-1)w + 1r + 1w
COMP9315 24T1 ♢ Lectures Part C ♢ [86/90]
❖ Selection with B-Trees

For one queries:

N = B-tree root node
while (N is not a leaf node)
   N = scanToFindChild(N,K)
tid = scanToFindEntry(N,K)
access tuple T using tid

Costone  =  (D + 1)r

[Diagram:Pics/file-struct/btree00.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [87/90]
❖ Selection with B-Trees (cont)

For range queries (assume sorted on index attribute):

search index to find leaf node for Lo
for each leaf node entry until Hi found {
    access tuple T using tid from entry
}

Costrange  =  (D + bi + bq)r

[Diagram:Pics/file-struct/btree00.png]

COMP9315 24T1 ♢ Lectures Part C ♢ [88/90]
❖ B-trees in PostgreSQL

PostgreSQL implements ≅ Lehman/Yao-style B-trees

B-tree implementation: backend/access/nbtree

Notes:
COMP9315 24T1 ♢ Lectures Part C ♢ [89/90]
❖ B-trees in PostgreSQL (cont)

Interface functions for B-trees

// build Btree index on relation
Datum btbuild(rel,index,...)
// insert index entry into Btree
Datum btinsert(rel,key,tupleid,index,...)
// start scan on Btree index
Datum btbeginscan(rel,key,scandesc,...)
// get next tuple in a scan
Datum btgettuple(scandesc,scandir,...)
// close down a scan
Datum btendscan(scandesc)

COMP9315 24T1 ♢ Lectures Part C ♢ [90/90]


Produced: 30 Apr 2024