Implementing Projection

COMP9315 21T1 ♢ Implementing Projection ♢ [0/15]
❖ The Projection Operation

Consider the query:

select distinct name,age from Employee;

If the Employee relation has four tuples such as:

(94002, John, Sales, Manager,   32)
(95212, Jane, Admin, Manager,   39)
(96341, John, Admin, Secretary, 32)
(91234, Jane, Admin, Secretary, 21)

then the result of the projection is:

(Jane, 21)   (Jane, 39)   (John, 32)

Note that duplicate tuples (e.g. (John,32)) are eliminated.

COMP9315 21T1 ♢ Implementing Projection ♢ [1/15]
❖ The Projection Operation (cont)

Relies on function Tuple projTuple(AttrList, Tuple)

Examples, using tuples of type (id:int, name:text, degree:int)

projTuple([id], (1234,'John',3778))
   returns (id=1234)

projTuple([name,degree]), (1234,'John',3778))
   returns (name='John',degree=3778)

COMP9315 21T1 ♢ Implementing Projection ♢ [2/15]
❖ The Projection Operation (cont)

Without distinct, projection is straightforward

// attrs = [attr1,attr2,...]
bR = nPages(Rel)
for i in 0 .. bR-1 {
   P = read page i
   for j in 0 .. nTuples(P)-1 {
      T = getTuple(P,j)
      T' = projTuple(attrs, T)
      if (outBuf is full) write and clear
      append T' to outBuf
   }
}
if (nTuples(outBuf) > 0) write

Typically,   bOutFile  <  bInFile    (same number of tuples, but tuples are smaller)

COMP9315 21T1 ♢ Implementing Projection ♢ [3/15]
❖ The Projection Operation (cont)

With distinct, the projection operation needs to:

  1. scan the entire relation as input
    • already seen how to do scanning

  2. create output tuples containing only requested attributes
    • implementation depends on tuple internal structure
    • essentially, make a new tuple with fewer attributes
      and where the values may be computed from existing attributes

  3. eliminate any duplicates produced
    • two approaches: sorting or hashing
COMP9315 21T1 ♢ Implementing Projection ♢ [4/15]
❖ Sort-based Projection

[Diagram:Pics/scansortproj/sort-proj.png]

COMP9315 21T1 ♢ Implementing Projection ♢ [5/15]
❖ Sort-based Projection (cont)

Requires a temporary file/relation .

for each tuple T in RelFile {
   T' = projTuple([attr1,attr2,...],T)
   add T' to TempFile
}

sort TempFile on [attrs]

for each tuple T in TempFile {
    if (T == Prev) continue
    write T to Result
    Prev = T
}

Reminder: "for each tuple"   means page-by-page, tuple-by-tuple

COMP9315 21T1 ♢ Implementing Projection ♢ [6/15]
❖ Cost of Sort-based Projection

The costs involved are (assuming B=n+1 buffers for sort):

Cost = sum of above = bR + bT + 2.bT.ceil(lognb0) + bT + bOut
COMP9315 21T1 ♢ Implementing Projection ♢ [7/15]
❖ Hash-based Projection

Partitioning phase:

[Diagram:Pics/scansortproj/hash-project.png]

COMP9315 21T1 ♢ Implementing Projection ♢ [8/15]
❖ Hash-based Projection (cont)

Duplicate elimination phase:

[Diagram:Pics/scansortproj//hash-project2.png]

COMP9315 21T1 ♢ Implementing Projection ♢ [9/15]
❖ Hash-based Projection (cont)

Algorithm for both phases:

for each tuple T in relation Rel {
    T' = mkTuple(attrs,T)
    H = h1(T', n)
    B = buffer for partition[H]
    if (B full) write and clear B
    insert T' into B
}
for each partition P in 0..n-1 {
    for each tuple T in partition P {
        H = h2(T, n)
        B = buffer for hash value H
        if (T not in B) insert T into B
        // assumes B never gets full
    }
    write and clear all buffers
}

Reminder: "for each tuple"   means page-by-page, tuple-by-tuple

COMP9315 21T1 ♢ Implementing Projection ♢ [10/15]
❖ Cost of Hash-based Projection

The total cost is the sum of the following:

Cost = bR + 2bP + bOut

To ensure that n is larger than the largest partition ...

COMP9315 21T1 ♢ Implementing Projection ♢ [11/15]
❖ Projection on Primary Key

No duplicates, so simple approach from above works:

bR = nPages(Rel)
for i in 0 .. bR-1 {
   P = read page i
   for j in 0 .. nTuples(P) {
      T = getTuple(P,j)
      T' = projTuple([pk], T)
      if (outBuf is full) write and clear
      append T' to outBuf
   }
}
if (nTuples(outBuf) > 0) write

COMP9315 21T1 ♢ Implementing Projection ♢ [12/15]
❖ Index-only Projection

Can do projection without accessing data file iff ...

Basic idea:

Cost analysis ...
COMP9315 21T1 ♢ Implementing Projection ♢ [13/15]
❖ Comparison of Projection Methods

Difficult to compare, since they make different assumptions:

Best case scenario for each (assuming n+1 in-memory buffers):

We normally omit bOut ... each method produces the same result
COMP9315 21T1 ♢ Implementing Projection ♢ [14/15]
❖ Projection in PostgreSQL

Code for projection forms part of execution iterators:

Types: Functions:
COMP9315 21T1 ♢ Implementing Projection ♢ [15/15]


Produced: 6 Mar 2021