COMP9315 24T1 Exercises 07
Implementing Join: Nested-loop, Sort-merge, Hash
DBMS Implementation

  1. Does the buffer replacement policy matter for sort-merge join? Explain your answer.


  2. Suppose that the relation R (with 150 pages) consists of one attribute a and S (with 90 pages) also consists of one attribute a. Determine the optimal join method for processing the following query:

    select * from R, S where R.a > S.a
    

    Assume there are 10 buffer pages available to process the query and there are no indexes available. Assume also that the DBMS only has available the following join methods: nested-loop, block nested loop and sort-merge. Determine the number of page I/Os required by each method to work out which is the cheapest.


  3. [Ramakrishnan, exercise 12.4] Consider the join Join[R.a=S.b](R,S) and the following information about the relations to be joined:

    • Relation R contains 10,000 tuples and has 10 tuples per page
    • Relation S contains 2,000 tuples and also has 10 tuples per page
    • Attribute b of relation S is the primary key for S
    • Both of the relations are stored as simple heap files
    • Neither relation has any indexes built on it
    • There are 52 buffer pages available

    Unless otherwise noted, compute all costs as number of page I/Os, except that the cost of writing out the result should be uniformly ignored (it's the same for all methods).

    1. What is the cost of joining R and S using page-oriented simple nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

    2. What is the cost of joining R and S using block nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

    3. What is the cost of joining R and S using sort-merge join? What is the minimum number of buffer pages required for this cost to remain unchanged?

    4. What is the cost of joining R and S using grace hash join? What is the minimum number of buffer pages required for this cost to remain unchanged?

    5. What would be the lowest possible I/O cost for joining R and S using any join algorithm? How much buffer space would be needed to achieve this cost?

    6. What is the maximum number of tuples that the join of R and S could produce, and how many pages would be required to store this result?

    7. How would your answers to the above questions change if you are told that R.a is a foreign key that refers to S.b?


  4. [Ramakrishnan, exercise 12.5] Consider the join of R and S described in the previous question:

    1. With 52 buffer pages, if unclustered B+ tree indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (e.g. using index nested loop join) than a block nested loops join? Explain.

      1. Would your answer change if only 5 buffer pages were available?
      2. Would your answer change if S contained only 10 tuples instead of 2,000 tuples?

    2. With 52 buffer pages, if clustered B+ tree indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (e.g. using index nested loop join) than a block nested loops join? Explain.

      1. Would your answer change if only 5 buffer pages were available?
      2. Would your answer change if S contained only 10 tuples instead of 2,000 tuples?

    3. If only 15 buffers were available, what would be the cost of sort-merge join? What would be the cost of hash join?

    4. If the size of S were increased to also be 10,000 tuples, but only 15 buffer pages were available, what would be the cost for sort-merge join? What would be the cost of hash join?

    5. If the size of S were increased to also be 10,000 tuples, and 52 buffer pages were available, what would be the cost for sort-merge join? What would be the cost of hash join?


  5. Consider performing the join:

    select * from R, S where R.x = S.y

    where we have bR = 1000, bS = 5000 and where R is used as the outer relation.

    Compute the page I/O cost of performing this join using hybrid hash join:

    1. if we have N=100 memory buffers available

    2. if we have N=512 memory buffers available

    3. if we have N=1024 memory buffers available