[prev] 84 [next]

Cost of Hash-based Projection

The total cost is the sum of the following:
  • scanning original relation R:   bR
  • writing partitions:   bP   (bR vs bP ?)
  • re-reading partitions:   bP
  • writing the result relation:   bOut
Cost = bR + 2bP + bOut

To ensure that n is larger than the largest partition ...

  • use hash functions (h1,h2) with uniform spread
  • allocate at least sqrt(bR)+1 buffers
  • if insufficient buffers, significant re-reading overhead