COMP9315 19T2 Quiz 5Sample Solutions DBMS Implementation

1. How many tuples from query select * from Parts where size = 'huge'; ?

• There are 5 distinct values for size, and a uniform distribution of values.
• So each value for size should appear in around 20% of the tuples.
• Since there are 6507 tuples, number of huge tuples = 6507/5 = 1301

2. For the query
select a,b,c,d,e,f from R, S, T where R.id = S.rid and T.id = S.tid
which is not a valid join order?

• any join that involves R ⋈ S or S ⋈ T, in either order, is ok
• the schema has no direct conncetion between R and T
• any join order that includes R ⋈ T is therefore incorrect
• only the following has this problem: S ⋈ (T ⋈ R)

3. Number of pages read for block nested loop join with 35 buffers.

• b_R = 1000 and b_S = 100 and B = 35
• join requires 1 input buffer for inner relation and 1 output buffer
• leaving B-2 buffers for hold blocks of the outer relation
• cost = #outerPages + #outerBlocks * #innerPages
• if R is the outer relation, #outerBlocks = ceil(b_R/(B-2)) = ceil(1000/33) = 31
• cost with R as outer relation = 1000 + 31*100 = 4100
• if S is the outer relation, #outerBlocks = ceil(b_S/(B-2)) = ceil(100/33) = 4
• cost with S as outer relation = 100 + 4*1000 = 4100

4. Cost of hybrid hash join on R&bowties;S, with 17 memory buffers, b_R = 60, b_S = 60, ...

• there are 8 partitions in total, one partition is held in memory
• phase1: partition R
• scan R (60 page reads), applying hash function to partition tuples
• partition 0 is held in disk, patritions 1-7 written to disk
• since each partition is 8 pages, 56 pages written to disk
• phase2: partition S
• scan S (40 page reads), applying same hash function to each tuple
• any tuple in partition 0 is matched with tuples in memory
• other partitions (1-7) are written to disk
• each disk parition in 5 pages long, so 35 pages written to disk
• phase 3: match remaining tuples
• read each of R's partitions 1-7 into 8 memory buffers, one by one
• for each partition, read corresponding S parition
• form join matches, and write to disk via output buffer
• pages read in this phase = 56 + 35 = 91 reads
• cost = phase1 cost + phase2 cost + phase3 cost + output cost
• phase1 cost = 60 reads + 56 writes
• phase2 cost = 40 reads + 35 writes (ignoring joined tuple output)
• phase3 cost = 56 reads + 35 reads (ignoring joined tuple output)
• output cost = 30 writes (occurs over phase2+phase3, given in question)
• total cost = 60 + 56 + 40 + 35 + 56 + 35 + 30 = 312