❖ The Projection Operation |
Consider the query:
select distinct name,age from Employee;
If the Employee
(94002, John, Sales, Manager, 32) (95212, Jane, Admin, Manager, 39) (96341, John, Admin, Secretary, 32) (91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21) (Jane, 39) (John, 32)
Note that duplicate tuples (e.g. (John,32)
❖ The Projection Operation (cont) |
Relies on function Tuple projTuple(AttrList, Tuple)
(id:int, name:text, degree:int)
projTuple([id], (1234,'John',3778)) returns (id=1234) projTuple([name,degree]), (1234,'John',3778)) returns (name='John',degree=3778)
❖ The Projection Operation (cont) |
Without distinct
// attrs = [attr1,attr2,...]
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P)-1 {
T = getTuple(P,j)
T' = projTuple(attrs, T)
if (outBuf is full) write and clear
append T' to outBuf
}
}
if (nTuples(outBuf) > 0) write
Typically, bOutFile < bInFile (same number of tuples, but tuples are smaller)
❖ The Projection Operation (cont) |
With distinct
❖ Sort-based Projection (cont) |
Requires a temporary file/relation .
for each tuple T in RelFile { T' = projTuple([attr1,attr2,...],T) add T' to TempFile } sort TempFile on [attrs] for each tuple T in TempFile { if (T == Prev) continue write T to Result Prev = T }
Reminder: "for each tuple
❖ Cost of Sort-based Projection |
The costs involved are (assuming B=n+1 buffers for sort):
Rel
Temp
Temp
Temp
❖ Hash-based Projection (cont) |
Algorithm for both phases:
for each tuple T in relation Rel {
T' = mkTuple(attrs,T)
H = h1(T', n)
B = buffer for partition[H]
if (B full) write and clear B
insert T' into B
}
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B
// assumes B never gets full
}
write and clear all buffers
}
Reminder: "for each tuple
❖ Cost of Hash-based Projection |
The total cost is the sum of the following:
R
To ensure that n is larger than the largest partition ...
❖ Projection on Primary Key |
No duplicates, so simple approach from above works:
bR = nPages(Rel) for i in 0 .. bR-1 { P = read page i for j in 0 .. nTuples(P) { T = getTuple(P,j) T' = projTuple([pk], T) if (outBuf is full) write and clear append T' to outBuf } } if (nTuples(outBuf) > 0) write
❖ Index-only Projection |
Can do projection without accessing data file iff ...
Basic idea:
❖ Comparison of Projection Methods |
Difficult to compare, since they make different assumptions:
Best case scenario for each (assuming n+1 in-memory buffers):
❖ Projection in PostgreSQL |
Code for projection forms part of execution iterators:
ProjectionInfo { type, pi_state, pi_exprContext }
ExprState { tag, flags, resnull, resvalue, ... }
ExecProject(projInfo,...)
check_sql_fn_retval(...)
Produced: 6 Mar 2021