❖ Things To Note |
pg_size_pretty
❖ Implementing Select Efficiently |
Two basic approaches:
Our analyses assume: 1 input buffer available for each relation.
If more buffers are available, most methods benefit.
❖ Heap Files |
❖ Selection in Heaps |
For all selection queries, the only possible strategy is:
// select * from R where C
for each page P in file of relation R {
for each tuple t in page P {
if (t satisfies C)
add tuple t to result set
}
}
i.e. linear scan through file searching for matching tuples
❖ Selection in Heaps (cont) |
The heap is scanned from the first to the last page:
Costrange = Costpmr = b
If we know that only one tuple matches the query (one query),
a simple optimisation is to stop the scan once that tuple is found.
Costone : Best = 1 Average = b/2 Worst = b
❖ Insertion in Heaps |
Insertion: new tuple is appended to file (in last page).
rel = openRelation("R", READ|WRITE); pid = nPages(rel)-1; get_page(rel, pid, buf); if (size(newTup) > size(buf)) { deal with oversize tuple } else { if (!hasSpace(buf,newTup)) { pid++; nPages(rel)++; clear(buf); } insert_record(buf,newTup); put_page(rel, pid, buf); }
Costinsert = 1r + 1w
Plus possible extra writes for oversize tuples, e.g. PostgreSQL's TOAST
❖ Insertion in Heaps (cont) |
Alternative strategy:
R
R
backend/access/heap/{heapam.c,hio.c}
❖ Insertion in Heaps (cont) |
PostgreSQL's tuple insertion:
(Relation relation, // relation desc HeapTuple newtup, // new tuple data CommandId cid, ...) // SQL statement heap_insert
newtup
xmin
❖ Deletion in Heaps |
SQL: delete from
where
Implementation of deletion:
rel = openRelation("R",READ|WRITE); for (p = 0; p < nPages(rel); p++) { get_page(rel, p, buf); ndels = 0; for (i = 0; i < nTuples(buf); i++) { tup = get_record(buf,i); if (tup satisfies Condition) { ndels++; delete_record(buf,i); } } if (ndels > 0) put_page(rel, p, buf); if (ndels > 0 && unique) break; }
❖ Exercise: Cost of Deletion in Heaps |
Consider the following queries ...
delete from Employees where id = 12345 -- one delete from Employees where dept = 'Marketing' -- pmr delete from Employees where 40 <= age and age < 50 -- range
Show how each will be executed and estimate the cost, assuming:
Generalise the cost models for each query type.
❖ Deletion in PostgreSQL |
PostgreSQL tuple deletion:
(Relation relation, // relation desc ItemPointer tid, ..., // tupleID CommandId cid, ...) // SQL statement heap_delete
xmax
❖ Updates in Heaps |
SQL: update
set
=
where
Analysis for updates is similar to that for deletion
Complication: new tuple larger than old version (too big for page)
Solution: delete, re-organise free space, then insert
❖ Updates in Heaps (cont) |
PostgreSQL tuple update:
(Relation relation, // relation desc ItemPointer otid, // old tupleID HeapTuple newtup, ..., // new tuple data CommandId cid, ...) // SQL statement heap_update
delete(otid)
insert(newtup)
ctid
❖ Heaps in PostgreSQL |
PostgreSQL stores all table data in heap files (by default).
Typically there are also associated index files.
If a file is more useful in some other form:
create index...using hash
❖ Heaps in PostgreSQL (cont) |
PostgreSQL "heap file" may use multiple physical files
OID
OID.1
OID_fsm
OID_vm
❖ Sorted Files |
Records stored in file in order of some field k (the sort key).
Makes searching more efficient; makes insertion less efficient
E.g. assume c = 4
❖ Sorted Files (cont) |
In order to mitigate insertion costs, use overflow blocks.
Total number of overflow blocks = bov.
Average overflow chain length = Ov = bov / b.
Bucket = data page + its overflow page(s)
❖ Selection in Sorted Files |
For one queries on sort key, use binary search.
// select * from R where k = val (sorted on R.k) lo = 0; hi = b-1 while (lo <= hi) { mid = (lo+hi) / 2; // int division with truncation (tup,loVal,hiVal) = searchBucket(f,mid,k,val); if (tup != NULL) return tup; else if (val < loVal) hi = mid - 1; else if (val > hiVal) lo = mid + 1; else return NOT_FOUND; } return NOT_FOUND;
where f
mid,lo,hi
k
val,loVal,hiVal
k
❖ Selection in Sorted Files (cont) |
Search a page and its overflow chain for a key value
searchBucket(f,pid,k,val) { buf = getPage(f,pid); (tup,min,max) = searchPage(buf,k,val,+INF,-INF) if (tup != NULL) return(tup,min,max); ovf = openOvFile(f); ovp = ovflow(buf); while (tup == NULL && ovp != NO_PAGE) { buf = getPage(ovf,ovp); (tup,min,max) = searchPage(buf,k,val,min,max) ovp = ovflow(buf); } return (tup,min,max); }
Assumes each page contains index of next page in Ov chain
getPage(f,pid) = { read_page(f,pid,buf); return buf; }
❖ Selection in Sorted Files (cont) |
Search within a page for key; also find min/max key values
searchPage(buf,k,val,min,max) { res = NULL; for (i = 0; i < nTuples(buf); i++) { tup = getTuple(buf,i); if (tup.k == val) res = tup; if (tup.k < min) min = tup.k; if (tup.k > max) max = tup.k; } return (res,min,max); }
❖ Selection in Sorted Files (cont) |
The above method treats each bucket like a single large page.
Cases:
Costone : Best = 1 Worst = log2 b + bov
Average case cost analysis needs assumptions (e.g. data distribution)
❖ Exercise: Searching in Sorted File |
Consider this sorted file with overflows (b=5, c=4):
Compute the cost for answering each of the following:
select * from R where k = 24
select * from R where k = 3
select * from R where k = 14
select max(k) from R
❖ Exercise: Optimising Sorted-file Search |
The searchBucket(f,pid,k,val)
searchBucket()
❖ Search with pmr and range Queries |
For pmr query, on non-unique attribute k, where file is sorted on k
select * from R where k = 2
Begin by locating a page p containing k=
Scan backwards and forwards from p to find matches.
Thus, Costpmr = Costone + (bq-1).(1+Ov)
❖ Search with pmr and range Queries (cont) |
For range queries on unique sort key (e.g. primary key):
select * from R where k >= 5 and k <= 13
Costrange = Costone + (bq-1).(1+Ov)
❖ Search with pmr and range Queries (cont) |
For range queries on non-unique sort key, similar method to pmr.
select * from R where k >= 2 and k <= 6
Costrange = Costone + (bq-1).(1+Ov)
❖ Search with pmr and range Queries (cont) |
So far, have assumed query condition involves sort key k.
But what about select * from R where j = 100.0
If condition contains attribute j, not the sort key
Costone,
Costrange,
Costpmr
as for heap files
❖ Insertion into Sorted Files |
Insertion approach:
Consider insertions of k=33, k=25, k=99 into:
❖ Deletion from Sorted Files |
E.g. delete from R where k = 2
Deletion strategy:
Recall: selectivity determines bq (# pages with matches)
Thus, Costdelete = Costselect + bqw
❖ Hashing |
Basic idea: use key value to compute page address of tuple.
e.g. tuple with key = v is stored in page i
Requires: hash function h(v) that maps KeyDomain → [0..b-1].
❖ Hashing (cont) |
PostgreSQL hash function (simplified):
Datum hash_any(unsigned char *k, register int keylen) { register uint32 a, b, c, len; /* Set up the internal state */ len = keylen; a = b = c = 0x9e3779b9 + len + 3923095; /* handle most of the key */ while (len >= 12) { a += ka[0]; b += ka[1]; c += ka[2]; mix(a, b, c); ka += 3; len -= 12; } /* collect any data from last 11 bytes into a,b,c */ mix(a, b, c); return UInt32GetDatum(c); }
See backend/access/hash/hashfunc.c
mix()
❖ Hashing (cont) |
hash_any()
uint32
Two ways to map raw hash value into a page address:
uint32 hashToPageNum(uint32 hval) { uint32 mask = 0xFFFFFFFF; return (hval & (mask >> (32-k))); }
uint32 hashToPageNum(uint32 hval) { return (hval % b); }
❖ Hashing Performance |
Aims:
Best case: every bucket contains same number of tuples.
Worst case: every tuple hashes to same bucket.
Average case: some buckets have more tuples than others.
Use overflow pages to handle "overfull" buckets (cf. sorted files)
All tuples in each bucket must have same hash value.
❖ Hashing Performance (cont) |
Two important measures for hash files:
Case | L | Ov |
Best | ≅ 1 | 0 |
Worst | ≫ 1 | ** |
Average | < 1 | 0<Ov<1 |
(** performance is same as Heap File)
To achieve average case, aim for 0.75 ≤ L ≤ 0.9.
❖ Selection with Hashing |
Select via hashing on unique key k (one)
// select * from R where k = val
(pid,P) = getPageViaHash(val,R)
for each tuple t in page P {
if (t.k == val) return t
}
for each overflow page Q of P {
for each tuple t in page Q {
if (t.k == val) return t
} }
Costone : Best = 1, Avg = 1+Ov/2 Worst = 1+max(OvLen)
❖ Selection with Hashing (cont) |
Select via hashing on non-unique hash key nk (pmr)
// select * from R where nk = val
(pid,P) = getPageViaHash(val,R)
for each tuple t in page P {
if (t.nk == val) add t to results
}
for each overflow page Q of P {
for each tuple t in page Q {
if (t.nk == val) add t to results
} }
return results
Costpmr = 1 + Ov
❖ Selection with Hashing (cont) |
Hashing does not help with range queries** ...
Costrange = b + bov
Selection on attribute j which is not hash key ...
Costone, Costrange, Costpmr = b + bov
** unless the hash function is order-preserving (and most aren't)
❖ Insertion with Hashing |
Insertion uses similar process to one queries.
// insert tuple t with key=val into rel R
(pid,P) = getPageViaHash(val,R)
if room in page P {
insert t into P; return
}
for each overflow page Q of P {
if room in page Q {
insert t into Q; return
} }
add new overflow page Q
link Q to previous page
insert t into Q
Costinsert : Best: 1r + 1w Worst: 1+max(OvLen))r + 2w
❖ Exercise: Insertion into Static Hashed File |
Consider a file with b=4, c=3, d=2, h(x) = bits(d,hash(x))
Insert tuples in alpha order with the following keys and hashes:
k | hash(k) | k | hash(k) | k | hash(k) | k | hash(k) | |||
a | 10001 |
g | 00000 |
m | 11001 |
s | 01110 |
|||
b | 11010 |
h | 00000 |
n | 01000 |
t | 10011 |
|||
c | 01111 |
i | 10010 |
o | 00110 |
u | 00010 |
|||
d | 01111 |
j | 10110 |
p | 11101 |
v | 11111 |
|||
e | 01100 |
k | 00101 |
q | 00010 |
w | 10000 |
|||
f | 00010 |
l | 00101 |
r | 00000 |
x | 00111 |
The hash values are the 5 lower-order bits from the full 32-bit hash.
❖ Deletion with Hashing |
Similar performance to select on non-unique key:
// delete from R where k = val // f = data file ... ovf = ovflow file (pid,P) = getPageViaHash(val,R) ndel = delTuples(P,k,val) if (ndel > 0) putPage(f,P,pid) for each overflow page qid,Q of P { ndel = delTuples(Q,k,val) if (ndel > 0) putPage(ovf,Q,qid) }
Extra cost over select is cost of writing back modified blocks.
Method works for both unique and non-unique hash keys.
❖ Problem with Hashing... |
So far, discussion of hashing has assumed a fixed file size (b).
What size file to use?
Methods for hashing with dynamic files:
❖ Problem with Hashing... (cont) |
All flexible hashing methods ...
uint32 hash(unsigned char *val)
Require a function to extract d bits from bit-string:
unit32 bits(int d, uint32 val)
Use result of bits()
❖ Exercise: Bit Manipulation |
uint32
01010110...
char *showBits(uint32 val, char *buf);
Analogous to gets()
uint32
uint32 bits(int d, uint32 val);
If d > 0, gives low-order bits; if d < 0, gives high-order bits
❖ Page Splitting |
Important concept for flexible hashing: splitting
101
0101
1101
0101
101
1101
101
❖ Page Splitting (cont) |
Example of splitting:
Tuples only show key value; assume h(val) = val
❖ Linear Hashing |
File organisation:
Disadvantage: requires overflow pages (don't split on full pages)
❖ Linear Hashing (cont) |
File grows linearly (one block at a time, at regular intervals).
Has "phases" of expansion; over each phase, b doubles.
❖ Selection with Lin.Hashing |
If b=2d, the file behaves exactly like standard hashing.
Use d bits of hash to compute block address.
// select * from R where k = val h = hash(val); pid = bits(d,h); // lower-order bits P = getPage(f, pid) for each tuple t in page P and its overflow pages { if (t.k == val) add t to Result; }
Average Costone = 1+Ov
❖ Selection with Lin.Hashing (cont) |
If b != 2d, treat different parts of the file differently.
Parts A and C are treated as if part of a file of size 2d+1.
Part B is treated as if part of a file of size 2d.
Part D does not yet exist (tuples in B may eventually move into it).
❖ Selection with Lin.Hashing (cont) |
Modified search algorithm:
// select * from R where k = val
h = hash(val);
pid = bits(d,h);
if (pid < sp) { pid = bits(d+1,h); }
P = getPage(f, pid)
for each tuple t in page P
and its overflow blocks {
if (t.k == val) add t to Result;
}
Produced: 4 Mar 2024