Linear Hashing

COMP9315 23T1 ♢ Linear Hashing ♢ [0/13]
❖ Linear Hashing

File organisation:

[Diagram:Pics/file-struct/hash-file1.png]

COMP9315 23T1 ♢ Linear Hashing ♢ [1/13]
❖ Linear Hashing (cont)


Linear Hashing uses a systematic method of growing data file ...

Advantage: does not require auxiliary storage for a directory

Disadvantage: requires overflow pages   (don't split on full pages)

COMP9315 23T1 ♢ Linear Hashing ♢ [2/13]
❖ Linear Hashing (cont)

File grows linearly (one page at a time, at regular intervals).

Has "phases" of expansion; over each phase, b doubles.

[Diagram:Pics/file-struct/linhash1.png]

COMP9315 23T1 ♢ Linear Hashing ♢ [3/13]
❖ Selection with Lin.Hashing

If b=2d, the file behaves exactly like standard hashing.

Use d bits of hash to compute page address.

// select * from R where k = val
h = hash(val);
P = bits(d,h);  // lower-order bits
for each tuple t in page P
         and its overflow pages {
    if (t.k == val) return t;
}

Average Costone  =  1+Ov

COMP9315 23T1 ♢ Linear Hashing ♢ [4/13]
❖ Selection with Lin.Hashing (cont)

If b != 2d, treat different parts of the file differently.

[Diagram:Pics/file-struct/linhash2.png]

Parts A and C are treated as if part of a file of size 2d+1.

Part B is treated as if part of a file of size 2d.

Part D does not yet exist (tuples in B may eventually move into it).

COMP9315 23T1 ♢ Linear Hashing ♢ [5/13]
❖ Selection with Lin.Hashing (cont)

Modified search algorithm:

// select * from R where k = val
h = hash(val);
pid = bits(d,h);
if (pid < sp) { pid = bits(d+1,h); }
P = getPage(f, pid)
for each tuple t in page P
         and its overflow pages {
    if (t.k == val) return R;
}

COMP9315 23T1 ♢ Linear Hashing ♢ [6/13]
❖ File Expansion with Lin.Hashing

[Diagram:Pics/file-struct/linhash3.png]

COMP9315 23T1 ♢ Linear Hashing ♢ [7/13]
❖ Insertion with Lin.Hashing

Abstract view:

pid = bits(d,hash(val));
if (pid < sp) pid = bits(d+1,hash(val));
// bucket P = page P + its overflow pages
P = getPage(f,pid)
for each page Q in bucket P {
    if (space in Q) {
        insert tuple into Q
        break
    }
}
if (no insertion) {
    add new ovflow page to bucket P
    insert tuple into new page
}
if (need to split) {
    partition tuples from bucket sp
          into buckets sp and sp+2^d
    sp++;
    if (sp == 2^d) { d++; sp = 0; }
}

COMP9315 23T1 ♢ Linear Hashing ♢ [8/13]
❖ Splitting

How to decide that we "need to split"?

Two approaches to triggering a split:

Note: always split page sp, even if not full or "current"

Systematic splitting like this ...

COMP9315 23T1 ♢ Linear Hashing ♢ [9/13]
❖ Splitting (cont)

Splitting process for page sp=01:

[Diagram:Pics/file-struct/linhash4.png]

COMP9315 23T1 ♢ Linear Hashing ♢ [10/13]
❖ Splitting (cont)

Splitting algorithm:

// partition tuples between two buckets
newp = sp + 2^d; oldp = sp;
for all tuples t in P[oldp] and its overflows {
    p = bits(d+1,hash(t.k));
    if (p == newp) 
        add tuple t to bucket[newp]
    else
        add tuple t to bucket[oldp]
}
sp++;
if (sp == 2^d) { d++; sp = 0; }

COMP9315 23T1 ♢ Linear Hashing ♢ [11/13]
❖ Insertion Cost

If no split required, cost same as for standard hashing:

Costinsert:    Best: 1r + 1w,    Avg: (1+Ov)r + 1w,    Worst: (1+max(Ov))r + 2w

If split occurs, incur Costinsert  plus cost of splitting:

On average, Costsplit  =  (1+Ov)r + (2+Ov)w
COMP9315 23T1 ♢ Linear Hashing ♢ [12/13]
❖ Deletion with Lin.Hashing

Deletion is similar to ordinary static hash file.

But might wish to contract file when enough tuples removed.

Rationale: r shrinks, b stays large wasted space.

Method:

COMP9315 23T1 ♢ Linear Hashing ♢ [13/13]


Produced: 23 Feb 2023