Indexing

COMP9315 23T1 ♢ Indexing ♢ [0/15]
❖ Indexing

An index is a file of (keyVal,tupleID) pairs, e.g.

[Diagram:Pics/file-struct/index.png]

COMP9315 23T1 ♢ Indexing ♢ [1/15]
❖ Indexes

A 1-d index is based on the value of a single attribute A.

Some possible properties of A:

Taxonomy of index types, based on properties of index attribute:

primary index on unique field, may be sorted on A
clustering index on non-unique field, file sorted on A
secondary file not sorted on A

A given table may have indexes on several attributes.

COMP9315 23T1 ♢ Indexing ♢ [2/15]
❖ Indexes (cont)

Indexes themselves may be structured in several ways:

dense every tuple is referenced by an entry in the index file
sparse only some tuples are referenced by index file entries
single-level tuples are accessed directly from the index file
multi-level may need to access several index pages to reach tuple

Index file has total i pages   (where typically i ≪ b)

Index file has page capacity ci   (where typically ci ≫ c)

Dense index:  i = ceil( r/ci )     Sparse index:  i = ceil( b/ci )

COMP9315 23T1 ♢ Indexing ♢ [3/15]
❖ Dense Primary Index

Data file unsorted; one index entry for each tuple


[Diagram:Pics/file-struct/dense-primary-index.png]

COMP9315 23T1 ♢ Indexing ♢ [4/15]
❖ Sparse Primary Index

Data file sorted; one index entry for each page


[Diagram:Pics/file-struct/sparse-primary-index.png]

COMP9315 23T1 ♢ Indexing ♢ [5/15]
❖ Selection with Primary Index

For one queries:

ix = binary search index for entry with key K
if nothing found { return NotFound }
b = getPage(pageOf(ix.tid))
t = getTuple(b,offsetOf(ix.tid))
   -- may require reading overflow pages
return t

Worst case:   read log2i  index pages  +  read 1+Ov data pages.

Thus, Costone,prim  =  log2 i + 1 + Ov

Assume: index pages are same size as data pages ⇒ same reading cost

COMP9315 23T1 ♢ Indexing ♢ [6/15]
❖ Selection with Primary Index (cont)

For range queries on primary key:

For pmr queries involving primary key: For queries not involving primary key, index gives no help.
COMP9315 23T1 ♢ Indexing ♢ [7/15]
❖ Selection with Primary Index (cont)

Method for range queries (when data file is not sorted)

// e.g. select * from R where a between lo and hi
pages = {}   results = {}
ixPage = findIndexPage(R.ixf,lo)
while (ixTup = getNextIndexTuple(R.ixf)) {
   if (ixTup.key > hi) break;
   pages = pages ∪ pageOf(ixTup.tid)
}
foreach pid in pages {
   // scan data page plus ovflow chain
   while (buf = getPage(R.datf,pid)) {
      foreach tuple T in buf {
         if (lo<=T.a && T.a<=hi)
            results = results ∪ T
}  }  }

COMP9315 23T1 ♢ Indexing ♢ [8/15]
❖ Insertion with Primary Index

Overview:

tid = insert tuple into page P at position p
find location for new entry in index file
insert new index entry (k,tid) into index file 

Problem: order of index entries must be maintained

Reorganisation requires, on average, read/write half of index file:

Costinsert,prim  =  (log2i)r + i/2.(1r+1w) + (1+Ov)r + (1+δ)w

COMP9315 23T1 ♢ Indexing ♢ [9/15]
❖ Deletion with Primary Index

Overview:

find tuple using index
mark tuple as deleted
delete index entry for tuple

If we delete index entries by marking ...

If we delete index entry by index file reorganisation ...
COMP9315 23T1 ♢ Indexing ♢ [10/15]
❖ Clustering Index

Data file sorted; one index entry for each key value

[Diagram:Pics/file-struct/clustering-index.png]

Cost penalty: maintaining both index and data file as sorted

(Note: can't mark index entry for value X until all X tuples are deleted)

COMP9315 23T1 ♢ Indexing ♢ [11/15]
❖ Secondary Index

Data file not sorted; want one index entry for each key value

[Diagram:Pics/file-struct/sec-index.png]

Costpmr  =  (log2iix1 + aix2 + bq.(1 + Ov))

COMP9315 23T1 ♢ Indexing ♢ [12/15]
❖ Multi-level Indexes

Secondary Index used two index files to speed up search

Could improve further by

Ultimately, reduce top-level of index hierarchy to one page.

COMP9315 23T1 ♢ Indexing ♢ [13/15]
❖ Multi-level Indexes (cont)

Example data file with three-levels of index:

[Diagram:Pics/file-struct/multi-level-index.png]

Assume:  not primary key,   c = 20,   ci = 3


In reality, more likely   c = 100,   ci = 1000

COMP9315 23T1 ♢ Indexing ♢ [14/15]
❖ Select with Multi-level Index

For one query on indexed key field:

xpid = top level index page
for level = 1 to d {
    read index entry xpid
    search index page for J'th entry
        where index[J].key <= K < index[J+1].key
    if (J == -1) { return NotFound }
    xpid = index[J].page
}
pid = xpid  // pid is data page index
search page pid and its overflow pages

Costone,mli  =  (d + 1 + Ov)r

(Note that d = ceil( logci r ) and ci is large because index entries are small)

COMP9315 23T1 ♢ Indexing ♢ [15/15]


Produced: 15 Mar 2023