[prev] 29 [next]

Exercise 3: Cost of Hash-based Projection

Consider a table R(x,y,z) with tuples:

Page 0:  (1,1,'a')   (11,2,'a')  (3,3,'c')
Page 1:  (13,5,'c')  (2,6,'b')   (9,4,'a')
Page 2:  (6,2,'a')   (17,7,'a')  (7,3,'b')
Page 3:  (14,6,'a')  (8,4,'c')   (5,2,'b')
Page 4:  (10,1,'b')  (15,5,'b')  (12,6,'b')
Page 5:  (4,2,'a')   (16,9,'c')  (18,8,'c')
-- and then the same tuples repeated for pages 6-11

SQL:   create T as (select distinct y from R)

Assuming:

  • 4 memory buffers, one for input, 3 for partitioning
  • pages/buffers hold 3 R tuples (i.e. cR=3), 4 T tuples (i.e. cT=4)
  • hash functions:   h1(x) = x%3,   h2(x) = (x%4)%3
Show how hash-based projection would execute this statement.