COMP9315 24T1

Exercises 07
Implementing Join: Nested-loop, Sort-merge, Hash

DBMS Implementation

Does the buffer replacement policy matter for sort-merge join? Explain your answer.

Suppose that the relation R (with 150 pages) consists of one attribute a and S (with 90 pages) also consists of one attribute a. Determine the optimal join method for processing the following query:
```
select * from R, S where R.a > S.a
```
Assume there are 10 buffer pages available to process the query and there are no indexes available. Assume also that the DBMS only has available the following join methods: nested-loop, block nested loop and sort-merge. Determine the number of page I/Os required by each method to work out which is the cheapest.

[Ramakrishnan, exercise 12.4] Consider the join Join[R.a=S.b](R,S) and the following information about the relations to be joined:
- Relation R contains 10,000 tuples and has 10 tuples per page
- Relation S contains 2,000 tuples and also has 10 tuples per page
- Attribute b of relation S is the primary key for S
- Both of the relations are stored as simple heap files
- Neither relation has any indexes built on it
- There are 52 buffer pages available
Unless otherwise noted, compute all costs as number of page I/Os, except that the cost of writing out the result should be uniformly ignored (it's the same for all methods).
1. What is the cost of joining R and S using page-oriented simple nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?
2. What is the cost of joining R and S using block nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?
3. What is the cost of joining R and S using sort-merge join? What is the minimum number of buffer pages required for this cost to remain unchanged?
4. What is the cost of joining R and S using grace hash join? What is the minimum number of buffer pages required for this cost to remain unchanged?
5. What would be the lowest possible I/O cost for joining R and S using any join algorithm? How much buffer space would be needed to achieve this cost?
6. What is the maximum number of tuples that the join of R and S could produce, and how many pages would be required to store this result?
7. How would your answers to the above questions change if you are told that R.a is a foreign key that refers to S.b?

[Ramakrishnan, exercise 12.5] Consider the join of R and S described in the previous question:
1. With 52 buffer pages, if unclustered B+ tree indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (e.g. using index nested loop join) than a block nested loops join? Explain.
  1. Would your answer change if only 5 buffer pages were available?
  2. Would your answer change if S contained only 10 tuples instead of 2,000 tuples?
2. With 52 buffer pages, if clustered B+ tree indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (e.g. using index nested loop join) than a block nested loops join? Explain.
  1. Would your answer change if only 5 buffer pages were available?
  2. Would your answer change if S contained only 10 tuples instead of 2,000 tuples?
3. If only 15 buffers were available, what would be the cost of sort-merge join? What would be the cost of hash join?
4. If the size of S were increased to also be 10,000 tuples, but only 15 buffer pages were available, what would be the cost for sort-merge join? What would be the cost of hash join?
5. If the size of S were increased to also be 10,000 tuples, and 52 buffer pages were available, what would be the cost for sort-merge join? What would be the cost of hash join?

Consider performing the join:
```
select * from R, S where R.x = S.y
```
where we have b_R = 1000, b_S = 5000 and where R is used as the outer relation.

Compute the page I/O cost of performing this join using hybrid hash join:
1. if we have N=100 memory buffers available
2. if we have N=512 memory buffers available
3. if we have N=1024 memory buffers available