Scan, Sort, Project


Implementing Relational Operations


Relational Operations2/93

DBMS core = relational engine, with implementations of

In this part of the course:


... Relational Operations3/93

Implementation of relational operations in DBMS:

[Diagram:Pics/scansortproj/dbmsarch-small.png]


... Relational Operations4/93

All relational operations return a set of tuples.

Can represent a typical operation programmatically as:

ResultSet = {}  // initially an empty set
while (t = nextRelevantTuple()) {
   // format tuple according to projection
   t' = formatResultTuple(t,Projection)
   // add next relevant tuple to result set
   ResultSet = ResultSet  t'
}
return ResultSet

All of the hard work is in the nextRelevantTuple() function.


... Relational Operations5/93

nextRelevantTuple() for selection operator:

nextRelevantTuple() for join operator: Two ways to handle the ResultSet


... Relational Operations6/93

There are three "dimensions of variation" in this system:

We consider combinations of these, e.g. Also consider updates (insert/delete) on file structures.


Query Types7/93

Queries fall into a number of classes:

Type SQL RelAlg a.k.a.
Scan select * from R R -
Proj select x,y from R Proj[x,y]R -
Sort select * from R
order by x
Sort[x]R ord

Different query classes exhibit different query processing behaviours.


... Query Types8/93

Type SQL RelAlg a.k.a.
Sel1 select * from R
where id = k
Sel[id=k]R one
Seln select * from R
where a = k
Sel[a=k]R -
Selpmr select * from R
where a=j and b=k
Sel[a=j ∧ b=k]R pmr
Range1d select * from R
where a>j and a<k
Sel[a>j ∧ a<k]R rng
Rangend select * from R
where a>j and a<k
    and b>m and b<n
Sel[...]R space


... Query Types9/93

Type SQL RelAlg a.k.a.
Join1 select * from R,S
where R.id = S.r
R Join[id=r] S -
EquiJoin select * from R,S
where R.v=S.w and R.x=S.y
R Join[v=w ∧ x=y] S -
ThetaJoin select * from R,S
where R.x op S.y
R Join[...] S -
Similar select * from R
where R.* ≅ Object
R ≅ Obj sim


Cost Models


Cost Models11/93

An important aspect of this course is

Won't be using asymptotic complexity (O(n)) for this

Rather, we attempt to develop cost models

Cost is measured in terms of number of page reads/writes.


... Cost Models12/93

Assumptions in our cost models:


... Cost Models13/93

In developing cost models, we also assume:

[Diagram:Pics/scansortproj/file-struct0-small.png]


... Cost Models14/93

Typical values for measures used in cost models:

Quantity Symbol E.g. Value
total # tuples r 106
record size R 128 bytes
total # pages b 105
page size B 8192 bytes
# tuples per page c 60
page read/write time Tr,Tw 10 msec
process page in memory - ≅ 0
# pages containing
answers for query q
bq ≥ 0


... Cost Models15/93

With buffer pool, request_page() does not necessarily involve reading

Instead, we assume no buffer pool (worst-case cost analysis)

Use either readPage() or get_page() to get data

// Assume data types for Relation, Page

get_page(Relation r, int pid, Page buf)
{
    buf = readPage(r.file, pid);
}

Page readPage(File f, int pid)
{
	Page buf = newPageBuffer();
	lseek(f, pid*PAGE_SIZE, SEEK_SET);
	read(f, buf, PAGE_SIZE);
	return buf;
}


Example file structures16/93

When describing file structures

[Diagram:Pics/scansortproj/one-page-small.png]


... Example file structures17/93

Consider three simple file structures:

All files are composed of b primary blocks/pages

[Diagram:Pics/scansortproj/file-struct0-small.png]

Some records in each page may be marked as "deleted".


... Example file structures18/93

Heap file with b = 4, c = 4:

[Diagram:Pics/scansortproj/file-heap-small.png]


... Example file structures19/93

Sorted file with b = 4, c = 4:

[Diagram:Pics/scansortproj/file-sort1-small.png]


... Example file structures20/93

Hashed file with b = 3, c = 4, h(k) = k%3

[Diagram:Pics/scansortproj/file-hash-small.png]


... Example file structures21/93

Indexed file with b = 4, c = 4, bi = 2, ci = 8:

[Diagram:Pics/scansortproj/file-index-small.png]


Scanning


Scanning23/93

Consider the query:

select * from T;

Conceptually:

for each tuple t in relation T {
   add tuple t to result set
}

[Diagram:Pics/scansortproj/file-struct0-small.png]


... Scanning24/93

Implemented via iteration over file containing T:

for each page P in file of relation T {
   for each tuple t in page P {
      add tuple t to result set
   }
}

Cost: read every data page once

Cost = b.Tr


... Scanning25/93

In terms of file operations:

// implementation of "select * from T"

File inf;   // data file handle
int p;      // input file page number
Buffer buf; // input file buffer
int i;      // current record in input buf
Tuple t;    // data for current record

inf = openFile(fileName("T"), READ)
for (p = 0; p < nPages(inf); p++) {
    buf = readPage(inf,p);
    for (i = 0; i < nTuples(buf); i++) {
        t = getTuple(buf,i);
        add t to result set
}   }


... Scanning26/93

Scan implementation when file has overflow pages, e.g.

[Diagram:Pics/scansortproj/file-struct1-small.png]


... Scanning27/93

In this case, the implementation changes to:

for each page P in file of relation T {
    for each tuple t in page P {
        add tuple t to result set
    }
    for each overflow page V of page P {
        for each tuple t in page V {
            add tuple t to result set
}   }   }

Cost: read each data and overflow page once

Cost = (b + bOv).Tr

where bOv = total number of overflow pages


... Scanning28/93

In terms of file operations:

// implementation of "select * from T"

File inf;   // data file handle
File ovf;   // overflow file handle
int p;      // input file page number
int ovp;    // overflow file page number
Buffer buf; // input file buffer
int i;      // current record in input buf
Tuple t;    // data for current record

inf = openFile(fileName("T"), READ)
ovf = openFile(ovFileName("T"), READ)
for (p = 0; p < nPages(inf); p++) {
    buf = readPage(inf,p);
    for (i = 0; i < nTuples(buf); i++) {
        t = getTuple(buf,i);
        add t to result set
    }
    ovp = ovflow(buf);
    while (ovp != NO_PAGE) {
        buf = readPage(ovf,ovp);
        for (i = 0; i < nTuples(buf); i++) {
            t = getTuple(buf,i);
            add t to result set
        }
        ovp = ovflow(buf);
    }
}

Cost: read data+ovflow page    Cost = (b+bov).Tr


Selection via Scanning29/93

Consider a one query like:

select * from Employee where id = 762288;

In an unordered file, search for matching record requires:

[Diagram:Pics/scansortproj/file-search-small.png]

Guaranteed at most one answer; could be in any page.


... Selection via Scanning30/93

In terms of file operations (assuming var delcarations as before):

inf = openFile(fileName("Employee"), READ);
for (p = 0; p < nPages(inf); p++)
    buf = readPage(inf,p);
    for (i = 0; i < nTuples(buf); i++) {
        t = getTuple(buf,i);
        if (getField(t,"id") == 762288)
           return t;
}   }

For different selection condition, simply replace   (getField(t,"id")==762288)


... Selection via Scanning31/93

Cost analysis for one searching in unordered file

Assumptions: Costavg = Trb/2    Costmin = Tr    Costmax = Trb


File Copying32/93

Consider an SQL statement like:

create table T as (select * from S);

Effectively, copies data from one file to another.

[Diagram:Pics/scansortproj/file-copy-small.png]

Conceptually:

make empty relation T
for each tuple t in relation S {
    append tuple t to relation T
}


... File Copying33/93

In terms of previously defined relation/page/tuple operations:

Relation in;       // relation handle (incl. files)
Relation out;      // relation handle (incl. files)
int ipid,opid;     // input/output page indexes
int tid;           // record/tuple index on current page
Record rec;        // current record (tuple)
Page ibuf,obuf;    // input/output file buffers

in = openRelation("S", READ);
out = openRelation("T", NEW|WRITE);
clear(obuf);  opid = 0;
for (ipid = 0; ipid < nPages(in); ipid++) {
    get_page(in, ipid, ibuf);
    for (tid = 0; tid < nTuples(ibuf); tid++) {
        rec = get_record(ibuf, tid);
        if (!hasSpace(obuf,rec)) {
            put_page(out, opid++, obuf);
            clear(obuf);
        }
        insert_record(obuf,rec);
}   }
if (nTuples(obuf) > 0) put_page(out, opid, obuf);


Exercise 1: Cost of Relation Copy34/93

Analyse cost for relation copying:

  1. if both input and output are heap files
  2. if input is sorted and output is heap file
  3. if input is heap file and output is sorted
Assume ...

Give cost in terms of #pages read + #pages written


Iterators35/93

Higher-levels of DBMS are given a view of scanning as:

cursor = initScan(relName,condition);
while (tup = getNextTuple(cursor)) {
	process tup
}
endScan(cursor);

Also known as iterator.


... Iterators36/93

Implementation of simple scan iterator (via file operations):

typedef struct {
    File   inf;   // data file handle
    Buffer buf;   // input buffer
    int    curp;  // current page number
    int    curi;  // current record number
    Expr   cond;  // representation of condition
} Cursor;


... Iterators37/93

Implementation of simple scan iterator (continued):

Cursor *initScan(char *rel, char *cond)
{
    Cursor *c;
    c = malloc(sizeof(Cursor));
    c->inf = openFile(fileName(rel),READ);
    c->buf = readPage(c->inf,0);
    c->curp = 0;
    c->curi = 0;
    c->cond = makeTestableCondition(cond);
    return c;
}
void endScan(Course *c)
{
    closeFile(c->inf);
    freeExpr(c->cond);
    free(c);
}


... Iterators38/93

Implementation of simple scan iterator (continued):

Tuple getNextTuple(Cursor *c)
{
getNextTuple:
    if (c->curi < nTuples(c->buf))
        return getTuple(c->buf, c->curi++);
    else {
        // no more tuples in this page; get next page
        c->curp++;
        if (c->curp == nPages(c->inf))
            return NULL;  // no more pages
        else {
            c->buf = readPage(c->inf,c->curp);
            c->curi = 0;
            goto getNextTuple;
        }
    }
}


... Iterators39/93

Implementation of full iterator interface via file operations:

typedef struct {
    File   inf;   // data file handle
    File   ovf;   // overflow file handle
    Buffer buf;   // input buffer
    int    curp;  // current page number
    int    curop; // current ovflow page number
    int    curi;  // current record number
    Expr   cond;  // representation of condition
} Cursor;


... Iterators40/93

Implementation of full iterator interface (continued):

Cursor *initScan(char *rel, char *cond)
{
    Cursor *c;

    c = malloc(sizeof(Cursor));
    c->inf = openFile(fileName(rel),READ);
    c->ovf = openFile(ovFileName(rel),READ);
    c->buf = readPage(c->inf,0);
    c->curp = 0;
    c->curop = NO_PAGE;
    c->curi = 0;
    c->cond = makeTestableCondition(cond)
    return c;
}
void endScan(Course *c)
{
    closeFile(c->inf);
    if (c->ovf) closeFile(c->ovf);
    freeExpr(c->cond);
    free(c);
}


... Iterators41/93

Implementation of scanning interface (continued):


Tuple getNextTuple(Cursor *c)
{
getNextTuple:
    if (c->curi < nTuples(c->buf))
        return getTuple(c->buf, c->curi++);
    else {
        // no more tuples in this page; get next page
        if (c->curop == NO_PAGE) {
            c->curop = ovflow(c->buf);
            if (c->curop != NO_PAGE) {
                // start ovflow chain scan
getNextOvPage:
                c->buf = readPage(c->ovf,c->curop);
                c->curi = 0;
                goto getNextTuple;
            }
            else {
getNextDataPage:
                c->curp++;
                if (c->curp == nPages(c->inf))
                    return NULL;  // no more pages
                else {
                    c->buf = readPage(c->inf,c->curp);
                    c->curi = 0;
                    goto getNextTuple;
                }
            }
        }
        else {
            // continue ovflow chain scan
            c->curop = ovflow(c->buf);
            if (c->curop == NO_PAGE)
                goto getNextDataPage;
            else
                goto getNextOvPage;
        }
    }
}


Scanning in PostgreSQL42/93

Scanning defined in: /backend/access/heap/heapam.c

Implements iterator data/operations:


... Scanning in PostgreSQL43/93

typedef struct HeapScanDescData
{
  // scan parameters 
  Relation      rs_rd;        // heap relation descriptor 
  Snapshot      rs_snapshot;  // snapshot ... tuple visibility 
  int           rs_nkeys;     // number of scan keys 
  ScanKey       rs_key;       // array of scan key descriptors 
  ...
  // state set up at initscan time 
  PageNumber    rs_npages;    // number of pages to scan 
  PageNumber    rs_startpage; // page # to start at 
  ...
  // scan current state, initally set to invalid 
  HeapTupleData rs_ctup;      // current tuple in scan
  PageNumber    rs_cpage;     // current page # in scan
  Buffer        rs_cbuf;      // current buffer in scan
   ...
} HeapScanDescData;


Scanning in other File Structures44/93

Above examples are for heap files

Other access file structures in PostgreSQL:


Sorting


The Sort Operation46/93

Sorting is explicit in queries only in the order by clause

select * from Students order by name;

More important, sorting is used internally in other operations:


External Sorting47/93

Sort methods such as quicksort are designed for in-memory data.

For data on disks, need external sorting techniques.

The standard external sorting method (merge sort) works by


... External Sorting48/93

Sorting tuples within pages

[Diagram:Pics/scansortproj/sorted-page-small.png]


Two-way Merge Sort49/93

Requires three in-memory buffers:

[Diagram:Pics/scansortproj/two-way-buf-small.png]


Assumption: cost of merge on two buffers ≅ 0.


... Two-way Merge Sort50/93

Two-way merge-sort method:

read each page into buffer, sort it, write it
numberOfRuns = b; runLength = 1;
while (numberOfRuns > 1) {
    for each pair of adjacent runs {
        merge the pair of runs to output, by
          - read pages from runs into input
              buffers, one page at a time
          - apply merge algorithm to transfer
              tuples to output buffer
          - flush output buffer when full and
              when merge finished
   }
   numberOfRuns = numberOfRuns / 2
   runLength = runLength * 2
}


... Two-way Merge Sort51/93

Example:

[Diagram:Pics/scansortproj/two-way-ex-small.png]


... Two-way Merge Sort52/93

Two-way merge-sort method (improved):

numberOfRuns = b; runLength = 1;
while (numberOfRuns > 1) {
    for each pair of adjacent runs {
        merge the pair of runs to output, by
          - read pages from runs into input
              buffers, one page at a time
          - if (runLength == 1)
              sort contents of each input buffer
          - apply merge algorithm to transfer
              tuples to output buffer
          - flush output buffer when full and
              when merge finished
   }
   numberOfRuns = numberOfRuns / 2
   runLength = runLength * 2
}

Avoids first pass to sort contents of individual pages.


... Two-way Merge Sort53/93

Consider file where b = 2k:

Method also works ok when


... Two-way Merge Sort54/93

Example:

[Diagram:Pics/scansortproj/two-way-ex2-small.png]


Merging Two Sorted Pages55/93

Method using operations on files and buffers:


// Pre:  buffers B1,B2; outfile position op
// Post: tuples from B1,B2 output in order
i1 = i2 = 0; clear(Out);
R1 = getTuple(B1,i1); R2 = getTuple(B2,i2);
while (i1 < nTuples(B1) && i2 < nTuples(B2)) {
    if (lessThan(R1,R2))
        { addTuple(R1,Out); i1++; R1 = getTuple(B1,i1); }
    else
        { addTuple(R2,Out); i2++; R2 = getTuple(B2,i2); }
    if (isFull(Out))
        { writePage(outf,op++,Out); clear(Out); }
}
for (i1=i1; i1 < nTuples(B1); i1++) {
    addTuple(getTuple(B1,i1), Out);
    if (isFull(Out))
        { writePage(outf,op++,Out); clear(Out); }
}
for (i2=i2; i2 < nTuples(B2); i2++) {
    addTuple(getTuple(B2,i2), Out);
    if (isFull(Out))
        { writePage(outf,op++,Out); clear(Out); }
}
if (nTuples(Out) > 0) writePage(outf,op,Out);


Merging Runs vs Merging Pages56/93

In the above, we merged two input buffers.

In general, we need to merge sorted "runs" of pages.

The only difference that this makes to the above method:

R1 = getTuple(B1,i1);

becomes

if (i1 == nTuples(B1)) {
    B1 = readPage(inf,ip++); i1 = 0;
}
R1 = getTuple(B1,i1);


Comparison for Sorting57/93

Above assumes that we have a function to compare tuples.

Mechanism needs to be generic, to handle all of:

select * from Employee order by eid;
select * from Employee order by name;
select * from Employee order by age;

Envisage a function tupCompare(r1,r2,f) (cf. C's strcmp)


... Comparison for Sorting58/93

In reality, need to sort on multiple attributes and ASC/DESC, e.g.

-- example multi-attribute sort
select * from Students
order by age desc, year_enrolled

Sketch of multi-attribute sorting function

int tupCompare(r1,r2,criteria)
{
   foreach (f,ord) in criteria {
      if (ord == ASC) {
         if (r1.f < r2.f) return -1;
         if (r1.f > r2.f) return 1;
      }
      else {
         if (r1.f > r2.f) return -1;
         if (r1.f < r2.f) return 1;
      }
   }
   return 0;
}


Cost of Two-way Merge Sort59/93

For a file containing b data pages:

Gives total cost:   2.b.log2b

Example: Relation with r=105 and c=50     b=2000 pages.

Number of passes for sort:   log22000  =  11

Reads/writes entire file 11 times!    Can we do better?


n-Way Merge Sort60/93

Initial pass uses: B total buffers

[Diagram:Pics/scansortproj/n-way-buf-pass0-small.png]


... n-Way Merge Sort61/93

Merge passes use:   n input buffers,   1 output buffer

[Diagram:Pics/scansortproj/n-way-buf-small.png]


... n-Way Merge Sort62/93

Method:


// Produce B-page-long runs
for each group of B pages in Rel {
    read pages into memory buffers
    sort group in memory
    write pages out to Temp
}
// Merge runs until everything sorted
// n-way merge, where n=B-1
numberOfRuns = b/B
while (numberOfRuns > 1) {
    for each group of n runs in Temp {
        merge into a single run via input buffers
        write run to newTemp via output buffer
    }
    numberOfRuns = numberOfRuns/n
    Temp = newTemp // swap input/output files
}


... n-Way Merge Sort63/93

Method for merging n runs (n input buffers, 1 output buffer):


for i = 1..n {
   read first page of run[i] into a buffer[i]
   set current tuple cur[i] to first tuple in buffer[i]
}
while (more than 1 run still has tuples) {
   s = find buffer with smallest tuple as cur[i]
   copy tuple cur[i] to output buffer
   if (output buffer full) { write it and clear it}
   advance cur[i] to next tuple
   if (no more tuples in buffer[i]) {
      if (no more pages in run[i])
         mark run[i] as complete
      else {
         read next page of run[i] into buffer[i]
         set cur[i] to first tuple in buffer[i]
}  }  }
copy tuples in non-empty buffer to output


Cost of n-Way Merge Sort64/93

Consider file where b = 4096, B = 16 total buffers:

(cf. two-way merge sort which needs 11 passes)


... Cost of n-Way Merge Sort65/93

Generalising from previous example ...

For b data pages and B buffers


Cost = 2.b.(1 + lognb0),   where b0 = b/B


... Cost of n-Way Merge Sort66/93

Costs (number of passes) for varying b and B (n=B-1):

b B=3 B=16 B=128
100 7 2 1
1000 10 3 2
10,00 13 4 2
100,000 17 5 3
1,000,000 20 5 3
 

In the above, we assume that

Elapsed time could be reduced by double-buffering


Sorting in PostgreSQL67/93

Sort uses a merge-sort (from Knuth) similar to above:

Tuples are mapped to SortTuple structs for sorting: If all data fits into memory, sort using qsort().

If memory fills while reading, form "runs" and do disk-based sort.


... Sorting in PostgreSQL68/93

Disk-based sort has phases:

Many references to "tapes" since Knuth's original algorithm was described in terms of merging data from magnetic tapes.

Effectively, a "tape" is a sorted run.

Implementation of "tapes": backend/utils/sort/logtape.c


... Sorting in PostgreSQL69/93

Sorting comparison operators are obtained via catalog (in Type.o):

// gets pointer to function via pg_operator
struct Tuplesortstate { ... SortTupleComparator ... };

// returns negative, zero, positive
ApplySortComparator(Datum datum1, bool isnull1,
                    Datum datum2, bool isnull2,
                    SortSupport sort_helper);

Flags indicate: ascending/descending, nulls-first/last.

ApplySortComparator() is PostgreSQL's version of tupCompare()


Implementing Projection


The Projection Operation71/93

Consider the query:

select distinct name,age from Employee;

If the Employee relation has four tuples such as:

(94002, John, Sales, Manager,   32)
(95212, Jane, Admin, Manager,   39)
(96341, John, Admin, Secretary, 32)
(91234, Jane, Admin, Secretary, 21)

then the result of the projection is:

(Jane, 21)   (Jane, 39)   (John, 32)

Note that duplicate tuples (e.g. (John,32)) are eliminated.


... The Projection Operation72/93

The projection operation needs to:

  1. scan the entire relation as input

    (straightforward, whichever file organisation is used)

  2. remove unwanted attributes in output

    (straightforward, manipulating internal record structure)

  3. eliminate any duplicates produced

    (not as simple as other operations ...)

There are two approaches for task 3: sorting or hashing.


Removing Attributes73/93

Projecting attributes involves creating a new tuple, using only some values from the original tuple.

Precisely how to achieve this depends on tuple internals.

Removing attributes from fixed-length tuples:

[Diagram:Pics/scansortproj/fixed-len-small.png]


... Removing Attributes74/93

Removing attributes from variable-length tuples:

[Diagram:Pics/scansortproj/var-len-small.png]


Sort-based Projection75/93

Overview of the method:

  1. Scan input relation Rel and produce a file of tuples containing only the projected attributes
  2. Sort this file of tuples using the combination of all attributes as the sort key
  3. Scan the sorted result, comparing adjacent tuples, and discard duplicates
Requires a temporary file/relation (Temp)


... Sort-based Projection76/93

The method, in detail:

// Inputs: relName, attrList
inf = openFile(fileName(relName),READ);
tempf = openFile(tmpName,CREATE);
clear(outbuf); j = 0;
for (p = 0; p < nPages(inf); p++) {
    buf = readPage(inf,p);
    for (i = 0; i < nTuples(buf); i++) {
        tup = getTuple(buf,i);
        newtup = project(tup,attrList);
        addTuple(newtup,outbuf);
        if (isFull(outbuf)) {
            writePage(tempf,j++,outbuf);
            clear(outbuf);
        }
    }
}
mergeSort(tempf);

(continued ...)


... Sort-based Projection77/93

(... continued)

tempf = openFile(tmpName,READ);
outf = openFile(result,CREATE);
clear(outbuf); prev = EMPTY; j = 0;
for (p = 0; p < nPages(tempf); p++) {
    buf = readPage(tempf,p);
    for (i = 0; i < nTuples(buf); i++) {
        tup = getTuple(buf,i);
        if (tupCompare(tup,prev) != 0) {
            addTuple(tup,outbuf);
            if (isFull(outbuf)) {
                writePage(outf,j++,outbuf);
                clear(outbuf);
            }
            prev = tup;
        }
    }
}


Cost of Sort-based Projection78/93

The costs involved are (assuming B=n+1 buffers for sort):

Total cost = sum of above = bR + 2.bT + 2.bT(1 + logBb0 ) + bOut

Note that we often ignore cost of writing the result; especially when comparing different algorithms for the same relational operation.


Improving Sort-based Projection79/93

Some approaches for improving the cost:


Hash-based Projection80/93

Overview of the method:

  1. Scan input relation Rel and produce a set of hash partitions based on the projected attributes
  2. Scan each hash partition looking for duplicates
  3. Once each partition is duplicate-free, write out the remaining tuples
The method requires:


Hash Functions81/93

Hash function h(tuple,range):

Implementation issues for hash functions:


... Hash Functions82/93

Usual approach in hash function:

Example hash function for character strings:


unsigned int hash(char *val, int b)
{
    char *cp;
    unsigned int v, sum = 0;
    for (c = val; *c != '\0'; c++) {
        v = *c + (*(c+1) << 8);
        sum += (sum + 2153*v) % 19937;
    }
    return(sum % b);
}


Hash-based Projection83/93

Partitioning phase:

[Diagram:Pics/scansortproj/hash-project-small.png]


... Hash-based Projection84/93

Algorithm for partitioning phase:

for each page P in relation Rel {
    for each tuple t in page P {
        t' = project(t, attrList)
        H = h1(t', B-1)
        write t' to partition[H]
}   }

Each partition could be implemented as a simple data file.


... Hash-based Projection85/93

Duplicate elimination phase:

[Diagram:Pics/scansortproj/hash-project2-small.png]


... Hash-based Projection86/93

Algorithm for duplicate elimination phase:

for each partition P in 0..B-2 {
    for each tuple t in partition P {
        H = h2(t, B-1)
        if (!(t occurs in buffer[H]))
            append t to buffer H
    }
    output contents of all buffers
    clear all buffers
}


Cost of Hash-based Projection87/93

The total cost is the sum of the following:

To ensure that B is larger than the largest partition ...


... Cost of Hash-based Projection88/93

If the largest partition had more than B-1 pages

This would potentially increase the cost by a large amount
(worst case is one additional page read for every record after hash bucket fills)


Index-only Projection89/93

Under the conditions:

can do projection without accessing data file.

Basic idea:


... Index-only Projection90/93

Method:

for each entry I in index file {
    tup = project(I.key, attrList)
    if (tupCompare(tup,prev) != 0) {
        addTuple(outbuf,tup)
        if (isFull(outbuf)) {
            writePage(outf,op++,outbuf);
            clear(outbuf);
        }
        prev = tup;
}   }

"for each index entry": loop over index pages and loop over entries in each page


Cost of Index-only Projection91/93

Assume that the index (see details later):

Costs involved in index-only projection: Total cost:   bi + bOut   ≪   bR + bOut


Comparison of Projection Methods92/93

Difficult to compare, since they make different assumptions:

Best case scenario for each (assuming B+1 in-memory buffers):


Projection in PostgreSQL93/93

Code for projection forms part of execution iterators:

Functions involved with projection:


Produced: 24 Jun 2019