How many tuples from query select * from Parts where size = 'huge'; ?
- There are 5 distinct values for size, and a uniform distribution of values.
- So each value for size should appear in around 20% of the tuples.
- Since there are 6507 tuples, number of huge tuples = 6507/5 = 1301
For the query
select a,b,c,d,e,f from R, S, T
where R.id = S.rid and T.id = S.tid
which is not a valid join order?
- any join that involves R ⋈ S or S ⋈ T, in either order, is ok
- the schema has no direct conncetion between R and T
- any join order that includes R ⋈ T is therefore incorrect
- only the following has this problem: S ⋈ (T ⋈ R)
Number of pages read for block nested loop join with 35 buffers.
- b_R = 1000 and b_S = 100 and B = 35
- join requires 1 input buffer for inner relation and 1 output buffer
- leaving B-2 buffers for hold blocks of the outer relation
- cost = #outerPages + #outerBlocks * #innerPages
- if R is the outer relation, #outerBlocks = ceil(b_R/(B-2)) = ceil(1000/33) = 31
- cost with R as outer relation = 1000 + 31*100 = 4100
- if S is the outer relation, #outerBlocks = ceil(b_S/(B-2)) = ceil(100/33) = 4
- cost with S as outer relation = 100 + 4*1000 = 4100
Cost of hybrid hash join on R&bowties;S, with 17 memory buffers,
b_R = 60, b_S = 60, ...
- there are 8 partitions in total, one partition is held in memory
- phase1: partition R
- scan R (60 page reads), applying hash function to partition tuples
- partition 0 is held in disk, patritions 1-7 written to disk
- since each partition is 8 pages, 56 pages written to disk
- phase2: partition S
- scan S (40 page reads), applying same hash function to each tuple
- any tuple in partition 0 is matched with tuples in memory
- other partitions (1-7) are written to disk
- each disk parition in 5 pages long, so 35 pages written to disk
- phase 3: match remaining tuples
- read each of R's partitions 1-7 into 8 memory buffers, one by one
- for each partition, read corresponding S parition
- form join matches, and write to disk via output buffer
- pages read in this phase = 56 + 35 = 91 reads
- cost = phase1 cost + phase2 cost + phase3 cost + output cost
- phase1 cost = 60 reads + 56 writes
- phase2 cost = 40 reads + 35 writes (ignoring joined tuple output)
- phase3 cost = 56 reads + 35 reads (ignoring joined tuple output)
- output cost = 30 writes (occurs over phase2+phase3, given in question)
- total cost = 60 + 56 + 40 + 35 + 56 + 35 + 30 = 312