[prev] 30 [next]

Cost of Hash-based Projection

The total cost is the sum of the following:
  • scanning original relation Rel:   bR
  • writing partitions:   bP   (bR vs bP ?)
  • re-reading partitions:   bP
  • writing the result relation:   bOut
Cost = bR + 2bP + bOut

To ensure that n is larger than the largest partition ...

  • use hash functions (h1,h2) with uniform spread
  • allocate at least sqrt(bR) buffers
  • if insufficient buffers, maybe significant re-reading overhead