Parallelism in DB Operations (cont)
Parallel sorting
- scan in parallel, range-partition during scan
- pipeline into local sort on each processor
- merge sorted partitions in order (ranges are disjoint, so this is concatenation in partition order)
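
The three steps above can be sketched as follows. This is a minimal illustration, not a DBMS implementation: the names `range_partition` and `parallel_sort` are hypothetical, rows are plain comparable keys, and a thread pool stands in for separate processors (CPython threads will not give a real speedup for CPU-bound sorting).

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def range_partition(rows, split_points):
    """Route each row to the partition whose key range contains it.

    split_points must be sorted; k split points give k + 1 partitions.
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for row in rows:
        partitions[bisect_right(split_points, row)].append(row)
    return partitions


def parallel_sort(rows, split_points):
    # Step 1: range-partition while scanning the input.
    partitions = range_partition(rows, split_points)

    # Step 2: sort each partition locally, one worker per partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))

    # Step 3: partitions hold disjoint, ascending key ranges, so the
    # final merge is just concatenation in partition order.
    result = []
    for part in sorted_parts:
        result.extend(part)
    return result
```

Usage: `parallel_sort([5, 3, 9, 1, 7, 2, 8], [3, 6])` splits the input into three partitions (keys below 3, keys from 3 up to but not including 6, keys 6 and above) and returns the fully sorted list.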
Potential problem:
- data skew: badly chosen partition points give some processors far more tuples than others
- resolve by sampling the data first and choosing partition points from the sample
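
One way to pick partition points from a sample is to take evenly spaced quantiles of the sorted sample. The sketch below assumes this quantile approach; the function name `choose_split_points`, the `sample_size` default, and the fixed `seed` are illustrative choices, not part of the original notes.

```python
import random


def choose_split_points(rows, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 split points from a random sample of rows.

    Evenly spaced quantiles of the sample approximate boundaries that
    give each partition a similar number of rows, even for skewed keys.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    if not sample:
        return []  # nothing to sample, so everything goes to one partition
    return [
        sample[(i * len(sample)) // num_partitions]
        for i in range(1, num_partitions)
    ]
```

A larger sample tracks the true key distribution more closely but costs more to collect and sort before the main scan begins.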