❖ Linear Hashing |
File organisation:
❖ Linear Hashing (cont) |
Linear Hashing uses a systematic method of growing data file ...
Disadvantage: requires overflow pages (don't split on full pages)
❖ Linear Hashing (cont) |
File grows linearly (one page at a time, at regular intervals).
Has "phases" of expansion; over each phase, b doubles.
❖ Selection with Lin.Hashing |
If b=2d, the file behaves exactly like standard hashing.
Use d bits of hash to compute page address.
// select * from R where k = val h = hash(val); P = bits(d,h); // lower-order bits for each tuple t in page P and its overflow pages { if (t.k == val) return t; }
Average Costone = 1+Ov
❖ Selection with Lin.Hashing (cont) |
If b != 2d, treat different parts of the file differently.
Parts A and C are treated as if part of a file of size 2d+1.
Part B is treated as if part of a file of size 2d.
Part D does not yet exist (tuples in B may eventually move into it).
❖ Selection with Lin.Hashing (cont) |
Modified search algorithm:
// select * from R where k = val
h = hash(val);
pid = bits(d,h);
if (pid < sp) { pid = bits(d+1,h); }
P = getPage(f, pid)
for each tuple t in page P
and its overflow pages {
if (t.k == val) return R;
}
❖ Insertion with Lin.Hashing |
Abstract view:
pid = bits(d,hash(val));
if (pid < sp) pid = bits(d+1,hash(val));
// bucket P = page P + its overflow pages
P = getPage(f,pid)
for each page Q in bucket P {
if (space in Q) {
insert tuple into Q
break
}
}
if (no insertion) {
add new ovflow page to bucket P
insert tuple into new page
}
if (need to split) {
partition tuples from bucket sp
into buckets sp and sp+2^d
sp++;
if (sp == 2^d) { d++; sp = 0; }
}
❖ Splitting |
How to decide that we "need to split"?
Two approaches to triggering a split:
Note: always split page sp, even if not full or "current"
Systematic splitting like this ...
❖ Splitting (cont) |
Splitting algorithm:
// partition tuples between two buckets
newp = sp + 2^d; oldp = sp;
for all tuples t in P[oldp] and its overflows {
p = bits(d+1,hash(t.k));
if (p == newp)
add tuple t to bucket[newp]
else
add tuple t to bucket[oldp]
}
sp++;
if (sp == 2^d) { d++; sp = 0; }
❖ Insertion Cost |
If no split required, cost same as for standard hashing:
Costinsert: Best: 1r + 1w, Avg: (1+Ov)r + 1w, Worst: (1+max(Ov))r + 2w
If split occurs, incur Costinsert plus cost of splitting:
❖ Deletion with Lin.Hashing |
Deletion is similar to ordinary static hash file.
But might wish to contract file when enough tuples removed.
Rationale: r shrinks, b stays large ⇒ wasted space.
Method:
Produced: 23 Feb 2023