Query Processing
| Query Processing Overview |
| Query Evaluation | 2/154 |
| ... Query Evaluation | 3/154 |
The most common form of interaction with a DBMS involves:
Overall approach clearly applies to selectdeleteupdate
| ... Query Evaluation | 4/154 |
A query in SQL:
Some DBMSs can save query plans for later re-use
(because the planning step is potentially quite expensive).
| ... Query Evaluation | 5/154 |
Internals of the query evaluation "black-box":
| ... Query Evaluation | 6/154 |
Three phases of query evaluation:
| Intermediate Representations | 7/154 |
SQL query text is not easy to manipulate/transform.
Need a query representation formalism that ...
| ... Intermediate Representations | 8/154 |
In principle, could base RA engine on { σ, π, ∪, -, ×} (completeness).
In practice, having only these operators makes execution inefficient.
The standard set of RA operations in a DBMS includes:
| ... Intermediate Representations | 9/154 |
In practice, DBMSs provide several versions of each RA operation.
For example:
select * from R where id = 100-- hashing select * from S-- Btree index where age > 18 and age < 35 select * from T-- grid file where a = 1 and b = 'a' and c = 1.4
Similarly, π and ⋈ have versions to match specific query types.
| ... Intermediate Representations | 10/154 |
We call these specialised version of RA operations RelOps.
One major task of the query processor:
| ... Intermediate Representations | 11/154 |
Terminology variations ...
Relational algebra expression of SQL query
| Query Translation |
| ... Intermediate Representations | 13/154 |
Query translation: SQL statement text → RA expression
| Query Translation | 14/154 |
Translation step is a mapping
Mapping from SQL to RA may including some optimisations.
Mapping processes: lexer/parser, mapping rules, rewriting rules.
Example:
SQL: select name from Students where id=7654321;-- is translated to RA: Proj[name](Sel[id=7654321]Students)
| Parsing SQL | 15/154 |
Parsing task is similar to that for programming languages.
Language elements:
createselectfromwhereStudentsnameidCoursCode+-=<>ANDORNOTIN'a string'12319.99'01-jan-1970'One difference to parsing PLs:
*.h
| ... Parsing SQL | 16/154 |
PostgreSQL dialect of SQL ...
selectsrc/backend/parsersrc/backend/catalogpg_classpg_typepg_proc
| Catalog and Parsing | 17/154 |
Consider the following simple University schema:
Staff(id, name, position, office, phone, ...) Students(id, name, birthday, degree, avg, ...) Subjects(id, title, prereqs, syllabus, ...) Enrolments(sid, subj, session, mark, ...)
DBMS catalog stores following kinds of information:
StaffStudentsidnameStaffidINTEGERStaffStudentsEnrolments.sidStudents.idStudentSubjectsEnrolments
| ... Catalog and Parsing | 18/154 |
The schema/type information is used for
idStudentsSubjectsid='John'select * from Students where id=12345select * from Students where degree='BSc'
| Query Blocks | 19/154 |
A query block is an SQL query with
SELECTFROMWHEREGROUP-BYHAVING⇒ SQL compilers need to decompose queries into blocks
Interesting kinds of blocks:
| ... Query Blocks | 20/154 |
Consider the following example query:
SELECT s.name FROM Students s WHERE s.avg = (SELECT MAX(avg) FROM Students)
which consists of two blocks:
Block1: SELECT MAX(avg) FROM Students
Block2: SELECT s.name FROM Students s
WHERE s.avg = <<Block1>>>
Query processor arranges execution of each block separately and transfers result of Block1 to Block2.
| Mapping SQL to Relational Algebra | 21/154 |
A naive query compiler might use the following translation scheme:
SELECTFROMWHERESELECT s.name, e.subj FROM Students s, Enrolments e WHERE s.id = e.sid AND e.mark > 50;
is translated to
| ... Mapping SQL to Relational Algebra | 22/154 |
A better translation scheme would be something like:
SELECTWHEREWHERESELECT s.name, e.subj FROM Students s, Enrolments e WHERE s.id = e.sid AND e.mark > 50;
is translated to
| ... Mapping SQL to Relational Algebra | 23/154 |
In fact, many SQL compilers ...
| Mapping Rules | 24/154 |
Mapping from an SQL query to an RA expression requires:
| ... Mapping Rules | 25/154 |
Projection:
SELECTFROM
⇒ Project[f1, f2, ... fn](...)
SQL projection extends RA projection with renaming and assignment:
SELECTASASFROM
⇒ Project[x ← a+b, y ← c](R)
| ... Mapping Rules | 26/154 |
Join: (e.g. on R(a,b,c,d) and S(c,d,e))
SELECTFROMWHERE
SELECTFROMJOINONWHERE
⇒ Join[R.a op S.e](R,S)
SELECTFROMNATURAL JOIN
SELECTFROMJOINUSINGWHERE
⇒ Proj[a,b,e](Join[R.c=S.c ∧ R.d=S.d](R,S))
| ... Mapping Rules | 27/154 |
Selection:
SELECTFROMWHERE
⇒ Select[R.f op val](R)
SELECTFROMWHEREAND
⇒ Select[Cond1,R ∧ Cond2,R](R)
or
⇒ Select[Cond1,R](Select[Cond2,R](R))
or
⇒ Select[Cond2,R](Select[Cond1,R](R))
| ... Mapping Rules | 28/154 |
Aggregation operators (e.g. MAXSUM
e.g. SELECT MAX(age) FROM ...
e.g. SELECT MAX(age) FROM ...
e.g. SELECT MAX(age) FROM ...
| ... Mapping Rules | 29/154 |
Sorting (ORDER BY
DISTINCTGROUP BYHAVING
| ... Mapping Rules | 30/154 |
View definitions produce:
create view OldEmps as select * from Employees where birthdate < '01-01-1960';
yields
OldEmps = Select[birthdate<'01-01-1960'](Employees)
| ... Mapping Rules | 31/154 |
General case:
CREATE VIEWAS
⇒ V = mappingOf(SQLstatement)
Special case: views with attribute renaming:
CREATE VIEWASSELECTFROMWHERE
⇒ V = Proj[a ← h, b ← j, c ← k](Select[C](R))
| ... Mapping Rules | 32/154 |
Views used in queries:
select name from OldEmps;
⇒ Projname(mappingOf(OldEmps
⇒ Projname(Select[birthdate<'01-01-1960'](Employees))
| Mapping Example | 33/154 |
The query
can be expressed in SQL as
select distinct s.code from Course c, Subject s, Enrolment e where c.id = e.course and c.subject = s.id group by s.id having count(*) > 100;
| ... Mapping Example | 34/154 |
In the SQL compiler, the query
select distinct s.code from Course c, Subject s, Enrolment e where c.id = e.course and c.subject = s.id group by s.id having count(*) > 100;
might be translated to the relational algebra expression
Uniq(Project[code](
GroupSelect[groupSize>100](
GroupBy[id] (
Enrolment ⋈ Course ⋈ Subjects
))))
| ... Mapping Example | 35/154 |
The join operations could be done in two different ways:
The query optimiser determines which has lower cost.
Note: for a join involving n tables, there are O(n!) possible trees.
| Expression Rewriting Rules | 36/154 |
Since RA is a well-defined formal system
| Relational Algebra Laws | 37/154 |
Commutative and Associative Laws:
Cannot rewrite as R Join[R.b>S.b] (S Join[a<d] T) because neither S.a nor T.a exists.
| ... Relational Algebra Laws | 38/154 |
Selection commutativity (where c is a condition):
| ... Relational Algebra Laws | 39/154 |
Selection pushing with join, cross-product and intersection ...
If c refers only to attributes from R:
| ... Relational Algebra Laws | 40/154 |
Rewrite rules for projection ...
All but last projection can be ignored:
Projections can be pushed into joins:
| Mapping Subqueries | 41/154 |
Two varieties of sub-query: (sample schema R(a,b), S(c,d))
select * from R where a in (select c from S where d>5) select * from R where a = (select max(c) from S)
select * from R where a in (select c from S where d=R.b)
| ... Mapping Subqueries | 42/154 |
Example of mapping independent subquery ...
select * from R where a in (select c from S where d>5)
is mapped as
⇒ Sel[a in Proj[c](Sel[d>5]S)](R)
⇒ Sel[a=c](R × Proj[c](Sel[d>5](S)))
⇒ R Join[a=c] Proj[c](Sel[d>5](S))
| Query Optimisation |
| Query Optimisation | 44/154 |
Query optimiser: RA expression → efficient evaluation plan
| ... Query Optimisation | 45/154 |
Query optimisation is a critical step in query evaluation.
The query optimiser
"Optimisation" is necessary because there can be enormous differences in costs between different plans for evaluating a given RA expression.
E.g. tmp ← A × B; res ← σx(tmp) vs res ← A ⋈x B
| Query Evaluation Example | 46/154 |
Example database schema (multi-campus university):
Employee(eid, ename, status, city) Department(dname, city, address) Subject(sid, sname, syllabus) Lecture(subj, dept, empl, time)
Example database instance statistics:
| r | R | C | b | |||||
Employee |
1000 | 100 | 10 | 100 | ||||
Department |
100 | 200 | 5 | 20 | ||||
Subject |
500 | 95 | 10 | 100 | ||||
Lecture |
2000 | 100 | 10 | 200 |
| ... Query Evaluation Example | 47/154 |
Query: Which depts in Sydney offer database subjects?
select dname
from Department D, Subject S, Lecture L
where D.city = 'Sydney' and S.sname like '%Database%'
and D.dname = L.dept and S.sid = L.subj
Additional information (needed to determine query costs):
| ... Query Evaluation Example | 48/154 |
Consider total I/O costs for five evaluation strategies.
Assumptions in computing costs:
| ... Query Evaluation Example | 49/154 |
Strategy #1 for answering query:
check = (city='Sydney' & sname='Databases' & dname=dept & sid=subj)
| ... Query Evaluation Example | 50/154 |
Costs involved in using strategy #1:
| Reln | r | R | C | b |
TMP1 |
1000000 | 195 | 5 | 20000 |
TMP2 |
100000000 | 395 | 2 | 50000000 |
TMP3 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+20000) + (20+20*20000+5*106) + (5*106+2) + 2
= 100,440,124
| ... Query Evaluation Example | 51/154 |
Strategy #2 for answering query:
| ... Query Evaluation Example | 52/154 |
Costs involved in using strategy #2:
| Reln | r | R | C | b |
TMP1 |
2000 | 195 | 5 | 400 |
TMP2 |
2000 | 395 | 2 | 1000 |
TMP3 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+400) + (20+20*400+1000) + (1000+2) + 2
= 30,524
| ... Query Evaluation Example | 53/154 |
Strategy #3 for answering query:
| ... Query Evaluation Example | 54/154 |
Costs involved in using strategy #3:
| Reln | r | R | C | b |
TMP1 |
2000 | 195 | 5 | 400 |
TMP2 |
20 | 195 | 5 | 4 |
TMP3 |
20 | 395 | 2 | 10 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4 + CostStep5
= (100+100*200+400) + (400+4) + (4+4*20+10) + (10+2) + 1
= 21,011
| ... Query Evaluation Example | 55/154 |
Strategy #4 for answering query:
| ... Query Evaluation Example | 56/154 |
Costs involved in using strategy #4:
| Reln | r | R | C | b |
TMP1 |
80 | 95 | 5 | 16 |
TMP2 |
5 | 200 | 5 | 1 |
TMP3 |
300 | 195 | 5 | 60 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
| ... Query Evaluation Example | 57/154 |
Strategy #5 for answering query:
| ... Query Evaluation Example | 58/154 |
Costs involved in using strategy #5:
| Reln | r | R | C | b |
TMP1 |
80 | 95 | 5 | 16 |
TMP2 |
5 | 200 | 5 | 1 |
TMP3 |
100 | 295 | 3 | 34 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
| Query Optimisation Problem | 59/154 |
Given:
| ... Query Optimisation Problem | 60/154 |
Why do we not generate optimal query execution plans?
Finding an optimal query plan ...
| Cost Models and Analysis | 61/154 |
The cost of evaluating a query is determined by:
| Approaches to Optimisation | 62/154 |
Three main classes of techniques developed:
Real query optimisers use a combination of algrebraic+physical.
Semantic QO is good idea, but expensive/difficult to implement.
| Optimisation Process | 63/154 |
Start with RA tree representation of query, then ...
| ... Optimisation Process | 64/154 |
Example of optimisation transformations:
For join, may also consider sort/merge join and hash join.
| Algebraic Optimisation | 65/154 |
Make use of algebraic equivalences:
Rationale: minimises size of intermediate relations.
Can potentially be done during the SQL→RA mapping phase.
Algebraic optimisation cannot assist with finding good join order.
| Physical Optimisation | 66/154 |
Makes use of execution cost analysis:
| Semantic Optimisation | 67/154 |
Make use of application-specific properties:
Basis: exploit meta-data and other semantic info about relations.
(E.g. this field is a primary key, so we know there'll only be one matching tuple)
| Stages in Algebraic Optimisation | 68/154 |
Start with relational algebra (RA) expression.
| Standardisation | 69/154 |
Many query optimisers assume RA expression is in a "standard form".
E.g. select conditions may be stated in
(A1 AND ... AND An) OR ... OR (Z1 AND ... AND Zm)(A1 OR ... OR An) AND ... AND (Z1 OR ... OR Zm)
| Simplification | 70/154 |
Main aim of simplification is to reduce redundancy.
Makes use of tranformation rules based on laws of boolean algebra.
A AND B |
↔ | B AND A |
A OR (B AND C) |
↔ | (A OR B) AND (A OR C) |
A AND A |
↔ | A |
A OR NOT(A) |
↔ | True |
A AND NOT(A) |
↔ | False |
A OR (A AND B) |
↔ | A |
NOT(A AND B) |
↔ | NOT(A) OR NOT(B) |
NOT(NOT(A)) |
↔ | A |
(Programmers don't produce redundant expressions; but much SQL is produced by programs)
| Simplification Example | 71/154 |
Consider the query:
select Title from Employee
where (not (Title = 'Programmer')
and (Title = 'Programmer'
or Title = 'Analyst')
and not (Title = 'Analyst'))
or Name = 'Joe'
Denote the atomic conditions as follows:
(Title = 'Programmer')(Title = 'Analyst')(Name = 'Joe')
| ... Simplification Example | 72/154 |
Selection condition can then be simplified via:
| Cond | = | ( |
| = | (( |
|
| = | ( |
|
| = | False ∨ J | |
| = | J |
Giving the equivalent simplified query:
select Title from Employee where Name = "Joe"
| Amelioration | 73/154 |
Aim of amelioration is to improve efficiency.
Transform query to equivalent form which is known to be more efficient.
Achieve this via:
| Amelioration Process | 74/154 |
| Amelioration Example | 75/154 |
Consider school information database:
CSG(Course, StudentId, grade) SNAP(StudentId, name, address, phone) CDH(Course, Day, Hour) CR(Course, Room)
And the query:
| ... Amelioration Example | 76/154 |
An obvious translation of the query into SQL
select Room from CSG,SNAP,CDH,CR where name='Brown' and day='Mon' and hour='9am'
and an obvious mapping of the SQL into relational algebra gives
RoomCSGSNAPCDHCR
name='Brown'Day='Mon'Hour='9am'
| ... Amelioration Example | 77/154 |
RoomCSGSNAPCDHCRPush selection:
RoomCSGSNAPCDHCRSplit selection:
RoomCSGSNAPCDHCRPush two selections:
RoomCSGSNAPCDHCRPush selection:
RoomCSGSNAPCDHCRCan we show that the final version is more efficient?
| ... Amelioration Example | 78/154 |
Could use an argument based on size of intermediate results, as above.
But run into problems with expressions like:
CSGSNAPCDHCROrder of joins can make a significant difference?
How to decide "best" order?
Need understanding of physical characteristics of relations
(for example, selectivity and likelihood of join matches across tables)
| Physical (Cost-Based) Optimisation | 79/154 |
Need to determine:
Do this for all possible execution plans.
Choose cheapest plan.
| Execution Plans | 80/154 |
Consider query execution plans for the RA expression:
Plan #1
tmp1 := HashJoin[d](R,S) tmp2 := SortMergeJoin[e](tmp1,T) result := BinarySearch[c](tmp2)
| ... Execution Plans | 81/154 |
Plan #2
tmp1 := SortMergeJoin[e](S,T) tmp2 := HashJoin[d](R,tmp1) result := LinearSearch[c](tmp2)
Plan #3
tmp1 := BtreeSearch[c](R) tmp2 := HashJoin[d](tmp1,S) result := SortMergeJoin[e](tmp2)
All plans produce same result, but have quite different costs.
| Cost-based Query Optimiser | 82/154 |
Overview of cost-based query optimisation process:
| ... Cost-based Query Optimiser | 83/154 |
Approximate algorithm for cost-based optimisation:
translate SQL query to RAexp
for all transformations RA' of RAexp {
for each node e of RA' (recursively) {
select access method for e
plan[i++] = access method for e
}
cost = 0
for each op in plan[]
{ cost += Cost(op) }
if (cost < MinCost) {
MinCost = cost
BestPlan = plan
} }
As noted above, we do not consider *all* transformations.
Heuristics: push selections down, consider only left-deep join trees.
| Choosing Access Methods | 84/154 |
Inputs:
| ... Choosing Access Methods | 85/154 |
Example:
Studentnametmp[i] := BtreeSearch[name='John'](Student) tmp[i+1] := LinearSearch[age>21](tmp[i])
Where possible, use pipelining to avoid storing tmp[i]
| ... Choosing Access Methods | 86/154 |
Rules for choosing σ access methods:
RAindexSearch[A=c](R)RAhashSearch[A=c](R)RAbinarySearch[A=c](R)RAindexSearch[A=c+1](R)RAlinearSearch[A>=c](R)
| ... Choosing Access Methods | 87/154 |
Rules for choosing ⋈ access methods:
RbnlJoin(R,S)RSsmJoin(R,S)RinlJoin(S,R)hashJoin(R,S)bnlinlsm
| Example Plan Enumeration | 88/154 |
Consider the query:
select name from Student s, Enrol e where s.sid = e.sid AND e.cid = 'COMP9315';
where
| Relation | r | b | PrimKey | Clustered | Indexes |
Student |
10000 | 500 | sid |
No | B-tree on sid |
Enrol |
40000 | 400 | cid |
Yes | B-tree on cid |
SE |
40000 | 800 | sid,cid |
No | - |
| ... Example Plan Enumeration | 89/154 |
First step is to map SQL to RA expression
StudentEnrolthen devise an execution plan based on this expression
Student.sidEnrol.cidtmp1 := inlJoin[s.sid=e.sid](Enrol,Student) result := LinearSearch[e.id='COMP9315'](tmp1)
| ... Example Plan Enumeration | 90/154 |
Estimated cost for this plan:
| Cost | = | inl-join on (Enrol,Studenttmp1 |
| = | Tr(bS + rS.btreeE) + Twbtmp1 + Trbtmp1 | |
| = | 500 + 10,000.4 + 800 + 800 | |
| = | 42,100 |
Notes:
tmp1Enrol
| ... Example Plan Enumeration | 91/154 |
Apply rule for pushing select into join:
and devise a plan based on
Enrol.cidtmp1 := BtreeSearch[e.cid='COMP9315'](Enrol) result := bnlJoin[s.sid=e.sid](tmp1,Student)
| ... Example Plan Enumeration | 92/154 |
Estimated cost for this plan:
| Cost | = | btree-search on Enroltmp1,Student |
| = | search+scan on Enrol |
|
| = | 3Tr + 3Tr + Tr(1 + 500) | |
| = | 3 + 3 + 1 + 500 = 507 |
Notes:
Enrol
| Cost Estimation | 93/154 |
Without actually executing it, cannot always know the cost of plan precisely.
(E.g. how big is the select result that feeds into the join?)
Thus, query optimisers need to estimate costs.
Two aspects to cost estimation:
| Cost Estimation Information | 94/154 |
Cost estimation uses statistical information about relations:
| rS | cardinality of relation S |
| RS | avg size of tuple in relation S |
| V(A,S) | # distinct values of attribute A |
| min(A,S) | min value of attribute A |
| max(A,S) | max value of attribute A |
| Estimating Projection Result Size | 95/154 |
Straightforward, since we know:
If pipelining through buffers of size B, then
#buffers-full = rS × ⌈B/Rout⌉
| Estimating Selection Result Size | 96/154 |
Selectivity factor = fraction of tuples expected to satisfy a condition.
Common assumption: attribute values uniformly distributed.
Example: Consider the query
select * from Parts where colour='Red'
and assume that there are only four possible colours.
If we have 1000 PartsRed
In other words,
In general, |σA=c(R)| ≅ rR/V(A,R)
| ... Estimating Selection Result Size | 97/154 |
Estimating size of result for e.g.
select * from Enrolment where year > 2003;
Could estimate by using:
Simpler heuristic used by some systems: selected tuples = r/3
| ... Estimating Selection Result Size | 98/154 |
Estimating size of result for e.g.
select * from Enrolment where course <> 'COMP9315';
Could estimate by using:
| ... Estimating Selection Result Size | 99/154 |
Uniform distribution assumption is convenient, but often not realistic.
How to handle non-uniform attribute value distributions?
WhiteRedBlueSilverUse histogram as basis for determining # selected tuples.
Disadvantage: cost of storing/maintaining histograms.
| Estimating Join Result Size | 100/154 |
Analysis is not as simple as select.
Relies on semantic knowledge about data/relations.
Consider equijoin on common attr: R ⋈A S
Case 1: |R ⋈A S| = 0
Useful if we know that dom(A,R) ∩ dom(A,S) = {}
Case 2: A is unique key in both RS
⇒ can't be more tuples in join than in smaller
of RS
| ... Estimating Join Result Size | 101/154 |
Case 3: A is primary key in RS
⇒ every SR
Example:
select name from Students s, Enrol e where e.cid = 'COMP9315' AND e.sid = s.sid
| Cost Estimation: Postscript | 102/154 |
Inaccurate cost estimation can lead to poor evaluation plans.
Above methods can (sometimes) give inaccurate estimates.
To get more accurate estimates costs:
Trade-off between optimiser performance and query performance.
| Semantic Query Optimisation (SQO) | 103/154 |
Makes use of semantics of data to assist query optimisation process.
The discussion of join cost estimation above gives an example of this.
Kinds of information used:
Two examples of semantic transformation operations:
| Semantic Equivalence | 104/154 |
Semantic equivalence does not require syntactic equivalence.
Consider the two queries:
select * from Emp where sal > 80K select * from Emp where sal > 80K and job = 'Prof'
They are equivalent under the semantic rule:
Emp.job'Prof'Emp.sal > 80K(i.e. Profs are the only people earning more than $80,000)
Could use this to transform query to exploit index on salary
| Restriction Elimination | 105/154 |
Query:
select dept from Storage where material = 'Benzene' and qty > 400
Use rule: material'Benzene'qty > 500
select dept from Storage where material = 'Benzene'
Result: Unnecessary restriction on the attribute qty
| Index Introduction | 106/154 |
Query: Find all the employees who make more than 42K
select name from Employees where salary > 42K
Use rule: salary > 42Kjob'manager'
select name from Employees where salary > 42K and job = 'manager'
Result: A new constraint is obtained on the
indexed attribute job
| Query Execution |
| Query Execution | 108/154 |
Query execution: applies evaluation plan → set of result tuples
| ... Query Execution | 109/154 |
Query execution
| ... Query Execution | 110/154 |
Example of query translation:
select s.name, s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2';
maps to
maps to
Temp1 = BtreeSelect[semester=05s2](Enr) Temp2 = HashJoin[e.student=s.id](Stu,Temp1) Result = Project[name,id,course,mark](Temp2)
| ... Query Execution | 111/154 |
A query execution plan:
| ... Query Execution | 112/154 |
Two ways of communicating results between query blocks ...
Materialization
| Materialization | 113/154 |
Steps in materialization between two operators
| ... Materialization | 114/154 |
Example:
select s.name, s.id, e.course, e.mark
from Student s, Enrolment e
where e.student = s.id and
e.semester = '05s2' and s.name = 'John';
might be executed as
Temp1 = BtreeSelect[semester=05s2](Enrolment)
Temp2 = BtreeSelect[name=John](Student)
-- indexes on name and semester
-- produce sorted Temp1 and Temp2
Temp3 = SortMergeJoin[e.student=s.id](Temp1,Temp2)
-- SMJoin especially effective, since
-- Temp1 and Temp2 are already sorted
Result = Project[name,id,course,mark](Temp3)
| Pipelining | 115/154 |
How pipelining is organised between two operators:
| Iterators (reminder) | 116/154 |
Iterators provide a "stream" of results:
iter = open()tuple = next(iter)close(iter)
| ... Iterators (reminder) | 117/154 |
Implementation of single-relation selection iterator:
typedef struct {
File inf; // input file
Cond cond; // selection condition
Buffer buf; // buffer holding current page
int curp; // current page during scan
int curr; // index of current record in page
} Iterator;
Iterator structure contains information:
condcurp, curr
| ... Iterators (reminder) | 118/154 |
Implementation of single-relation selection iterator (cont):
Iterator *open(char *relName, Condition cond) {
Iterator *iter = malloc(sizeof(Iterator));
iter->inf = openFile(fileName(relName),READ);
iter->cond = cond;
iter->curp = 0;
iter->curr = -1;
readBlock(iter->inf, iter->curp, iter->buf);
return iter;
}
void close(Iterator *iter) {
closeFile(iter->inf);
free(iter);
}
| ... Iterators (reminder) | 119/154 |
Implementation of single-relation selection iterator (cont):
Tuple next(Iterator *iter) {
Tuple rec;
do {
// check if reached end of current page
if (iter->curr == nRecs(iter->buf)-1) {
// check if reached end of data file
if (iter->curp == nBlocks(iter->inf)-1)
return NULL;
iter->curp++;
iter->buf = readBlock(iter->inf, iter->curp);
iter->curr = -1;
}
iter->curr++;
rec = getRec(iter->buf, iter->curr);
} while (!matches(rec, iter->cond));
return rec;
}
// curp and curr hold indexes of most recently read page/record
| Pipelining Example | 120/154 |
Consider the query:
select s.id, e.course, e.mark
from Student s, Enrolment e
where e.student = s.id and
e.semester = '05s2' and s.name = 'John';
which maps to the RA expression
which could represented by the RA expression tree
| ... Pipelining Example | 121/154 |
Modelled as communication between RA tree nodes:
| ... Pipelining Example | 122/154 |
This query might be executed as
System:
iter0 = open(Result)
while (Tup = next(iter0)) { display Tup }
close(iter0)
Result:
iter1 = open(Join)
while (T = next(iter1))
{ T' = project(T); return T' }
close(iter1)
Sel1:
iter4 = open(Btree(Enrolment,'semester=05s2'))
while (A = next(iter4)) { return A }
close(iter4)
...
| ... Pipelining Example | 123/154 |
... Join:-- nested-loop join iter2 = open(Sel1) while (R = next(iter2) { iter3 = open(Sel2) while (S = next(iter3)) { if (matches(R,S) return (RS) } close(iter3)// better to reset(iter3) } close(iter2) Sel2: iter5 = open(Btree(Student,'name=John')) while (B = next(iter5)) { return B } close(iter5)
| Pipeline Execution | 124/154 |
Piplines can be executed as ...
Demand-driven
In parallel-processing systems, iterators could run concurrently.
| Disk Accesses | 125/154 |
Pipelining cannot avoid all disk accesses.
Some operations use multiple passes (e.g. merge-sort, hash-join).
| ... Disk Accesses | 126/154 |
Sophisticated query optimisers might realise e.g.
In this case, it could materialize X's output in an S-file.
Produces a pipeline/materialization hybrid query execution.
Example:
| ... Disk Accesses | 127/154 |
Example: (pipeline/materialization hybrid)
select s.id, e.course, e.mark
from Student s, Enrolment e
where e.student = s.id and
e.semester = '05s2' and s.name = 'John';
might be executed as
System:
exec(Sel2) -- creates Temp1
iter0 = open(Result)
while (Tup = next(iter0)) { display Tup }
close(iter0)
Result:
iter1 = open(Join)
while (T = next(iter1))
{ T' = project(T); return T' }
close(iter1)
...
| ... Disk Accesses | 128/154 |
... Join:-- index join iter2 = open(Sel1) while (R = next(iter2) { iter3 = open(Btree(Temp1,'id=R.student')) while (S = next(iter3)) { return (RS) } close(iter3) } close(iter2) Sel1: iter4 = open(Btree(Enrolment,'semester=05s2')) while (A = next(iter4)) { return A } close(iter4) Sel2: iter5 = open(Btree(Student,'name=John')) createBtree(Temp1,'id') while (B = next(iter5)) { insert(B,Temp1) } close(iter5)
| PostgreSQL Execution | 129/154 |
Defs: src/include/executorsrc/include/nodes
Code: src/backend/executor
PostgreSQL uses pipelining ...
PlanScanGroupIndexscanSortHashJoinPlanState
| ... PostgreSQL Execution | 130/154 |
Modules in src/backend/executor
execXXX
nodeXXXExecInitXXXExecXXXExecEndXXX
| ... PostgreSQL Execution | 131/154 |
Top-level data/functions for executor ...
QueryDesc
ExecutorStart(QueryDesc *, ...)ExecutorRun(QueryDesc *, ScanDirection, ...)ExecutorEnd(QueryDesc *)
| ... PostgreSQL Execution | 132/154 |
Overview of query processing:
CreateQueryDesc
ExecutorStart
CreateExecutorState -- creates per-query context
switch to per-query context to run ExecInitNode
ExecInitNode -- recursively scans plan tree
CreateExprContext -- creates per-tuple context
ExecInitExpr
ExecutorRun
ExecutePlan -- invoke iterators from root
ExecProcNode -- recursively called in per-query context
ExecEvalExpr -- called in per-tuple context
ResetExprContext -- to free memory
ExecutorEnd
ExecEndNode -- recursively releases resources
FreeExecutorState -- frees per-query and child contexts
FreeQueryDesc
| ... PostgreSQL Execution | 133/154 |
More detailed view of plan execution (but still much simplified)
ExecutePlan(execState, planStateNode, ...) {
process "before each statement" triggers
for (;;) {
tuple = ExecProcNode(planStateNode)
check tuple validity // MVCC
if (got a tuple) break
}
process "after each statement" triggers
return tuple
}
ExecProcNode(node) {
switch (nodeType(node)) {
case SeqScan:
result = ExecSeqScan(node); break;
case NestLoop:
result = ExecNestLoop(node); break;
...
}
return result;
}
| ... PostgreSQL Execution | 134/154 |
Generic iterator interface is provided by ...
ExecInitNode
ExecProcNodeExecEndNode
| Example PostgreSQL Execution | 135/154 |
Consider the query:
-- get manager's age and # employees in Shoe department select e.age, d.nemps from Departments d, Employees e where e.name = d.manager and d.name ='Shoe'
and its execution plan tree
| ... Example PostgreSQL Execution | 136/154 |
This produces a tree with three nodes:
NestedLoopIndexScanSeqScan
| ... Example PostgreSQL Execution | 137/154 |
Initially InitPlan() invokes ExecInitNode() on plan tree root.
ExecInitNode() sees a NestedLoop node ...
so dispatches to ExecInitNestLoop() to set up iterator
and then invokes ExecInitNode() on left and right sub-plans
in left subPlan, ExecInitNode() sees an IndexScan node
so dispatches to ExecInitIndexScan() to set up iterator
in right sub-plan, ExecInitNode() sees aSeqScan node
so dispatches to ExecInitSeqScan() to set up iterator
Result: a plan state tree with same structure as plan tree.
| ... Example PostgreSQL Execution | 138/154 |
Execution: ExecutePlan() repeatedly invokes ExecProcNode().
ExecProcNode() sees a NestedLoop node ...
so dispatches to ExecNestedLoop() to get next tuple
which invokes ExecProcNode() on its sub-plans
in the left sub-plan, ExecProcNode() sees an IndexScan node
so dispatches to ExecIndexScan() to get next tuple
if no more tuples, return END
for this tuple, invoke ExecProcNode() on right sub-plan
ExecProcNode() sees a SeqScan node
so dispatches to ExecSeqScan() to get next tuple
check for match and return joined tuples if found
reset right sub-plan iterator
Result: stream of result tuples returned via ExecutePlan()
| Performance Tuning |
| Performance Tuning | 140/154 |
Schema design:
| ... Performance Tuning | 141/154 |
Tuning requires us to consider the following:
| ... Performance Tuning | 142/154 |
Performance can be considered at two times:
| Denormalisation | 143/154 |
Normalisation minimises storage redundancy.
| ... Denormalisation | 144/154 |
Example ... consider the following normalised schema:
create table Subject ( id serial primary key, code char(8),-- e.g. COMP9315 title varchar(60), syllabus text, ... ); create table Term ( id serial primary key, name char(4),-- e.g. 09s2 starting date, ... ); create table Course ( subject integer references Subject(id), term integer references Term(id), lic integer references Staff(id), ... );
| ... Denormalisation | 145/154 |
Example: Courses = Course ⋈ Subject ⋈ Term
If we often need to refer to "standard" name (e.g. COMP9315 09s2)
courseNameCourseCourseCourse-- can now replace a query like: select s.code||' '||t.name, avg(e.mark) from Course c, Subject s, Term t where c.subject = s.id and c.term = t.id and s.code='COMP9315' and t.name='09s2'-- by a query like: select c.courseName, e.grade, e.mark from Course c where c.courseName = 'COMP9315 09s2'
| Indexes | 146/154 |
Indexes provide efficient content-based access to tuples.
Can build indexes on any (combination of) attributes.
Definining indexes:
CREATE INDEX name ON table ( attr1, attr2, ... )
attri can be an arbitrary expression (e.g. upper(name)
CREATE INDEX
UNIQUEUSING
| ... Indexes | 147/154 |
Indexes can significantly improve query costs.
Considerations in applying indexes:
-- use hashing for (unique) attributes in equality tests, e.g. select * from Employee where id = 12345-- use B-tree for attributes in range tests, e.g. select * from Employee where age > 60
| Query Tuning | 148/154 |
Sometimes, a query can be re-phrased to affect performance:
select name from Employee where salary/365 > 100
-- fix by re-phrasing condition to (salary > 36500)
select name from Employee where name like '%ith%'
select name from Employee where birthday is null
-- above two are difficult to "fix"
select name from Employee
where dept in (select id from Dept where ...)
-- fix by using Employee join Dept on (e.dept=d.id)
| ... Query Tuning | 149/154 |
Other factors to consider in query tuning:
select distinctdistinctselect ... Employee join Customer on (s.name = p.name)vs select ... Employee join Customer on (s.ssn = p.ssn)
orororselect name from Employee where dept=1 or dept=2vs (select name from Employee where dept=1) union (select name from Employee where dept=2)
| PostgreSQL Query Tuning | 150/154 |
PostgreSQL provides the explain
EXPLAIN [ANALYZE] Query
Without ANALYZEEXPLAIN
With ANALYZEEXPLAIN
Note that runtimes may show considerable variation due to buffering.
| EXPLAIN Examples | 151/154 |
Example: Select on indexed attribute
ass2=# explain select * from Students where id=100250;
QUERY PLAN
-----------------------------------------------------------------
Index Scan using student_pkey on student
(cost=0.00..5.94 rows=1 width=17)
Index Cond: (id = 100250)
ass2=# explain analyze select * from Students where id=100250;
QUERY PLAN
-----------------------------------------------------------------
Index Scan using student_pkey on student
(cost=0.00..5.94 rows=1 width=17)
(actual time=31.209..31.212 rows=1 loops=1)
Index Cond: (id = 100250)
Total runtime: 31.252 ms
| ... EXPLAIN Examples | 152/154 |
Example: Select on non-indexed attribute
ass2=# explain select * from Students where stype='local';
QUERY PLAN
----------------------------------------------------------
Seq Scan on student (cost=0.00..70.33 rows=18 width=17)
Filter: ((stype)::text = 'local'::text)
ass2=# explain analyze select * from Students
ass2-# where stype='local';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on student (cost=0.00..70.33 rows=18 width=17)
(actual time=0.061..4.784 rows=2512 loops=1)
Filter: ((stype)::text = 'local'::text)
Total runtime: 7.554 ms
| ... EXPLAIN Examples | 153/154 |
Example: Join on a primary key (indexed) attribute
ass2=# explain
ass2-# select s.sid,p.name
ass2-# from Students s, People p where s.id=p.id;
QUERY PLAN
-----------------------------------------------------------
Hash Join (cost=70.33..305.86 rows=3626 width=52)
Hash Cond: ("outer".id = "inner".id)
-> Seq Scan on person p
(cost=0.00..153.01 rows=3701 width=52)
-> Hash (cost=61.26..61.26 rows=3626 width=8)
-> Seq Scan on student s
(cost=0.00..61.26 rows=3626 width=8)
| ... EXPLAIN Examples | 154/154 |
Example: Join on a non-indexed attribute
ass3=> explain select s1.code, s2.code
ass2-# from Subjects s1, Subjects s2 where s1.offerer=s2.offerer;
QUERY PLAN
----------------------------------------------------------------
Merge Join (cost=2744.12..18397.14 rows=1100342 width=18)
Merge Cond: (s1.offerer = s2.offerer)
-> Sort (cost=1372.06..1398.33 rows=10509 width=13)
Sort Key: s1.offerer
-> Seq Scan on subjects s1
(cost=0.00..670.09 rows=10509 width=13)
-> Sort (cost=1372.06..1398.33 rows=10509 width=13)
Sort Key: s2.offerer
-> Seq Scan on subjects s2
(cost=0.00..670.09 rows=10509 width=13)
(8 rows)
Produced: 28 Jul 2019