Cost of Hash-based Projection
The total cost is the sum of the following:
- scanning original relation
Rel : bR
- writing partitions: bP (bR vs bP ?)
- re-reading partitions: bP
- writing the result relation: bOut
Cost = bR + 2bP + bOut
To ensure that n is larger than the largest partition ...
- use hash functions (h1,h2) with uniform spread
- allocate at least sqrt(bR) buffers
- if insufficient buffers, maybe significant re-reading overhead
|