❖ The Projection Operation |
Consider the query:
select distinct name,age from Employee;
If the Employee
(94002, John, Sales, Manager, 32) (95212, Jane, Admin, Manager, 39) (96341, John, Admin, Secretary, 32) (91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21) (Jane, 39) (John, 32)
Note that duplicate tuples (e.g. (John,32)
❖ The Projection Operation (cont) |
Relies on function Tuple projTuple(AttrList, Tuple)
(id:int, name:text, degree:int)projTuple([id], (1234,'John',3778)) returns (id=1234) projTuple([name,degree]), (1234,'John',3778)) returns (name='John',degree=3778)
❖ The Projection Operation (cont) |
Without distinct
// attrs = [attr1,attr2,...]
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P)-1 {
T = getTuple(P,j)
T' = projTuple(attrs, T)
if (outBuf is full) write and clear
append T' to outBuf
}
}
if (nTuples(outBuf) > 0) write
Typically, bOutFile < bInFile (same number of tuples, but tuples are smaller)
❖ The Projection Operation (cont) |
With distinct
❖ Sort-based Projection (cont) |
Requires a temporary file/relation .
for each tuple T in RelFile {
T' = projTuple([attr1,attr2,...],T)
add T' to TempFile
}
sort TempFile on [attrs]
for each tuple T in TempFile {
if (T == Prev) continue
write T to Result
Prev = T
}
Reminder: "for each tuple
❖ Cost of Sort-based Projection |
The costs involved are (assuming B=n+1 buffers for sort):
RelTempTempTemp❖ Hash-based Projection (cont) |
Algorithm for both phases:
for each tuple T in relation Rel {
T' = mkTuple(attrs,T)
H = h1(T', n)
B = buffer for partition[H]
if (B full) write and clear B
insert T' into B
}
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B
// assumes B never gets full
}
write and clear all buffers
}
Reminder: "for each tuple
❖ Cost of Hash-based Projection |
The total cost is the sum of the following:
RTo ensure that n is larger than the largest partition ...
❖ Projection on Primary Key |
No duplicates, so simple approach from above works:
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P) {
T = getTuple(P,j)
T' = projTuple([pk], T)
if (outBuf is full) write and clear
append T' to outBuf
}
}
if (nTuples(outBuf) > 0) write
❖ Index-only Projection |
Can do projection without accessing data file iff ...
Basic idea:
❖ Comparison of Projection Methods |
Difficult to compare, since they make different assumptions:
Best case scenario for each (assuming n+1 in-memory buffers):
❖ Projection in PostgreSQL |
Code for projection forms part of execution iterators:
ProjectionInfo { type, pi_state, pi_exprContext }ExprState { tag, flags, resnull, resvalue, ... }ExecProject(projInfo,...)check_sql_fn_retval(...)Produced: 6 Mar 2021