❖ DBMSs |
DBMS = DataBase Management System
Our view of the DBMS so far ...
A machine to process SQL queries.
❖ DBMSs (cont) |
One view of DB engine: "relational algebra virtual machine"
Machine code for such a machine:
projection (π) | join (⋈, ×) | |||
intersection (∩) | difference (-) | |||
insert | delete |
For each of these operations:
❖ Mapping SQL to RA |
Mapping SQL to relational algebra, e.g.
-- schema: R(a,b) S(c,d) select a as x from R join S on (b=c) where d = 100 -- could be mapped to Tmp1(a,b,c,d) = R Join[b=c] S Tmp2(a,b,c,d) = Sel[d=100](Tmp1) Tmp3(a) = Proj[a](Tmp2) Res(x) = Rename[Res(x)](Tmp3)
In general:
SELECT
WHERE
FROM
❖ Mapping Example |
Consider the database schema:
Person(pid, name, address, ...) Subject(sid, code, title, uoc, ...) Terms(tid, code, start, end, ...) Courses(cid, sid, tid, ...) Enrolments(cid, pid, mark, ..)
and the query: Courses with more than 100 students in them?
which can be expressed in SQL as
select s.sid, s.code from Course c join Subject s on (c.sid=s.sid) join Enrolment e on (c.cid=e.cid) group by s.sid, s.code having count(*) > 100;
❖ Mapping Example (cont) |
The SQL might be compiled to
Tmp1(cid,sid,pid) = Course Join[c.cid = e.cid] Enrolment Tmp2(cid,code,pid) = Tmp1 Join[t1.sid = s.sid] Subject Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2) Res(cid,code) = Sel[nstu > 100](Tmp3)
or, equivalently
Tmp1(cid,code) = Course Join[c.sid = s.sid] Subject Tmp2(cid,code,pid) = Tmp1 Join[t1.cid = e.cid] Enrolment Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2) Res(cid,code) = Sel[nstu > 100](Tmp3)
Which is better?
❖ Query Cost Estimation |
The cost of evaluating a query is determined by
❖ Query Cost Estimation (cont) |
Consider execution plans for: σc (R ⋈d S ⋈e T) where R(c,d), S(d,e), T(e)
Tmp1(c,d,e) := hash_join[d](R,S) Tmp2(c,d,e) := sort_merge_join[e](tmp1,T) Res(c,d,e) := binary_search[c](Tmp2)
or
Tmp1(d,e) := sort_merge_join[e](S,T) Tmp2(c,d,e) := hash_join[d](R,Tmp1) Res(c,d,e) := linear_search[c](Tmp2)
or
Tmp1(c,d) := btree_search[c](R) Tmp2(c,d,e) := hash_join[d](Tmp1,S) Res(c,d,e) := sort_merge_join[e](Tmp2,T)
All produce same result, but have different costs.
❖ Implementations of RA Ops |
Sorting (quicksort, etc. are not applicable)
Produced: 29 Jan 2023