Estimating Selection Result Size (cont)
How to handle non-uniform attribute value distributions?
- collect statistics about the values stored in the attribute/relation
- store these as e.g. a histogram in the meta-data for the relation
So, for the part colour example, we might have a distribution like:
White : 35%
Red : 30%
Blue : 25%
Silver : 10%
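Use the histogram as the basis for estimating the number of selected tuples
(e.g. a selection on colour = Red is expected to return about 30% of the tuples).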
Disadvantage: cost of storing/maintaining histograms.
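
A minimal sketch of a histogram-based estimate, using the colour fractions from the example above. The function name estimate_selected, the dictionary layout, and the relation size of 10,000 tuples are assumptions made for illustration; they are not taken from any particular DBMS.

```python
# Hypothetical histogram for the colour attribute of a parts relation,
# stored as value -> fraction of tuples (fractions from the example above).
colour_histogram = {
    "White": 0.35,
    "Red": 0.30,
    "Blue": 0.25,
    "Silver": 0.10,
}


def estimate_selected(histogram, num_tuples, values):
    """Estimate how many tuples have an attribute value in `values`.

    Values missing from the histogram are assumed to match no tuples.
    """
    fraction = sum(histogram.get(v, 0.0) for v in values)
    return round(fraction * num_tuples)


# Assumed relation size (illustrative only).
NUM_TUPLES = 10_000

red_estimate = estimate_selected(colour_histogram, NUM_TUPLES, ["Red"])
silver_estimate = estimate_selected(colour_histogram, NUM_TUPLES, ["Silver"])

# Under a uniform-distribution assumption, every one of the 4 colours
# would be estimated at the same size.
uniform_estimate = round(NUM_TUPLES / len(colour_histogram))
```

With these assumed numbers, the uniform assumption would estimate 2500 tuples for every colour, while the histogram gives 3000 for Red and 1000 for Silver.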