Query Processing
Query Processing Overview |
Query Evaluation | 2/154 |
... Query Evaluation | 3/154 |
The most common form of interaction with a DBMS involves:
Overall approach clearly applies to select
delete
update
... Query Evaluation | 4/154 |
A query in SQL:
Some DBMSs can save query plans for later re-use
(because the planning step is potentially quite expensive).
... Query Evaluation | 5/154 |
Internals of the query evaluation "black-box":
... Query Evaluation | 6/154 |
Three phases of query evaluation:
Intermediate Representations | 7/154 |
SQL query text is not easy to manipulate/transform.
Need a query representation formalism that ...
... Intermediate Representations | 8/154 |
In principle, could base RA engine on { σ, π, ∪, -, ×} (completeness).
In practice, having only these operators makes execution inefficient.
The standard set of RA operations in a DBMS includes:
... Intermediate Representations | 9/154 |
In practice, DBMSs provide several versions of each RA operation.
For example:
select * from R where id = 100-- hashing select * from S-- Btree index where age > 18 and age < 35 select * from T-- grid file where a = 1 and b = 'a' and c = 1.4
Similarly, π and ⋈ have versions to match specific query types.
... Intermediate Representations | 10/154 |
We call these specialised version of RA operations RelOps.
One major task of the query processor:
... Intermediate Representations | 11/154 |
Terminology variations ...
Relational algebra expression of SQL query
Query Translation |
... Intermediate Representations | 13/154 |
Query translation: SQL statement text → RA expression
Query Translation | 14/154 |
Translation step is a mapping
Mapping from SQL to RA may including some optimisations.
Mapping processes: lexer/parser, mapping rules, rewriting rules.
Example:
SQL: select name from Students where id=7654321;-- is translated to RA: Proj[name](Sel[id=7654321]Students)
Parsing SQL | 15/154 |
Parsing task is similar to that for programming languages.
Language elements:
create
select
from
where
Students
name
id
CoursCode
+
-
=
<
>
AND
OR
NOT
IN
'a string'
123
19.99
'01-jan-1970'
One difference to parsing PLs:
*.h
... Parsing SQL | 16/154 |
PostgreSQL dialect of SQL ...
select
src/backend/parser
src/backend/catalog
pg_class
pg_type
pg_proc
Catalog and Parsing | 17/154 |
Consider the following simple University schema:
Staff(id, name, position, office, phone, ...) Students(id, name, birthday, degree, avg, ...) Subjects(id, title, prereqs, syllabus, ...) Enrolments(sid, subj, session, mark, ...)
DBMS catalog stores following kinds of information:
Staff
Students
id
name
Staff
id
INTEGER
Staff
Students
Enrolments.sid
Students.id
Student
Subjects
Enrolments
... Catalog and Parsing | 18/154 |
The schema/type information is used for
id
Students
Subjects
id='John'
select * from Students where id=12345
select * from Students where degree='BSc'
Query Blocks | 19/154 |
A query block is an SQL query with
SELECT
FROM
WHERE
GROUP-BY
HAVING
⇒ SQL compilers need to decompose queries into blocks
Interesting kinds of blocks:
... Query Blocks | 20/154 |
Consider the following example query:
SELECT s.name FROM Students s WHERE s.avg = (SELECT MAX(avg) FROM Students)
which consists of two blocks:
Block1: SELECT MAX(avg) FROM Students Block2: SELECT s.name FROM Students s WHERE s.avg = <<Block1>>>
Query processor arranges execution of each block separately and transfers result of Block1 to Block2.
Mapping SQL to Relational Algebra | 21/154 |
A naive query compiler might use the following translation scheme:
SELECT
FROM
WHERE
SELECT s.name, e.subj FROM Students s, Enrolments e WHERE s.id = e.sid AND e.mark > 50;
is translated to
... Mapping SQL to Relational Algebra | 22/154 |
A better translation scheme would be something like:
SELECT
WHERE
WHERE
SELECT s.name, e.subj FROM Students s, Enrolments e WHERE s.id = e.sid AND e.mark > 50;
is translated to
... Mapping SQL to Relational Algebra | 23/154 |
In fact, many SQL compilers ...
Mapping Rules | 24/154 |
Mapping from an SQL query to an RA expression requires:
... Mapping Rules | 25/154 |
Projection:
SELECT
FROM
⇒ Project[f1, f2, ... fn](...)
SQL projection extends RA projection with renaming and assignment:
SELECT
AS
AS
FROM
⇒ Project[x ← a+b, y ← c](R)
... Mapping Rules | 26/154 |
Join: (e.g. on R(a,b,c,d) and S(c,d,e))
SELECT
FROM
WHERE
SELECT
FROM
JOIN
ON
WHERE
⇒ Join[R.a op S.e](R,S)
SELECT
FROM
NATURAL JOIN
SELECT
FROM
JOIN
USING
WHERE
⇒ Proj[a,b,e](Join[R.c=S.c ∧ R.d=S.d](R,S))
... Mapping Rules | 27/154 |
Selection:
SELECT
FROM
WHERE
⇒ Select[R.f op val](R)
SELECT
FROM
WHERE
AND
⇒ Select[Cond1,R ∧ Cond2,R](R)
or
⇒ Select[Cond1,R](Select[Cond2,R](R))
or
⇒ Select[Cond2,R](Select[Cond1,R](R))
... Mapping Rules | 28/154 |
Aggregation operators (e.g. MAX
SUM
e.g. SELECT MAX(age) FROM ...
e.g. SELECT MAX(age) FROM ...
e.g. SELECT MAX(age) FROM ...
... Mapping Rules | 29/154 |
Sorting (ORDER BY
DISTINCT
GROUP BY
HAVING
... Mapping Rules | 30/154 |
View definitions produce:
create view OldEmps as select * from Employees where birthdate < '01-01-1960';
yields
OldEmps = Select[birthdate<'01-01-1960'](Employees)
... Mapping Rules | 31/154 |
General case:
CREATE VIEW
AS
⇒ V = mappingOf(SQLstatement)
Special case: views with attribute renaming:
CREATE VIEW
AS
SELECT
FROM
WHERE
⇒ V = Proj[a ← h, b ← j, c ← k](Select[C](R))
... Mapping Rules | 32/154 |
Views used in queries:
select name from OldEmps;
⇒ Projname(mappingOf(OldEmps
⇒ Projname(Select[birthdate<'01-01-1960'](Employees))
Mapping Example | 33/154 |
The query
can be expressed in SQL as
select distinct s.code from Course c, Subject s, Enrolment e where c.id = e.course and c.subject = s.id group by s.id having count(*) > 100;
... Mapping Example | 34/154 |
In the SQL compiler, the query
select distinct s.code from Course c, Subject s, Enrolment e where c.id = e.course and c.subject = s.id group by s.id having count(*) > 100;
might be translated to the relational algebra expression
Uniq(Project[code]( GroupSelect[groupSize>100]( GroupBy[id] ( Enrolment ⋈ Course ⋈ Subjects ))))
... Mapping Example | 35/154 |
The join operations could be done in two different ways:
The query optimiser determines which has lower cost.
Note: for a join involving n tables, there are O(n!) possible trees.
Expression Rewriting Rules | 36/154 |
Since RA is a well-defined formal system
Relational Algebra Laws | 37/154 |
Commutative and Associative Laws:
Cannot rewrite as R Join[R.b>S.b] (S Join[a<d] T) because neither S.a nor T.a exists.
... Relational Algebra Laws | 38/154 |
Selection commutativity (where c is a condition):
... Relational Algebra Laws | 39/154 |
Selection pushing with join, cross-product and intersection ...
If c refers only to attributes from R:
... Relational Algebra Laws | 40/154 |
Rewrite rules for projection ...
All but last projection can be ignored:
Projections can be pushed into joins:
Mapping Subqueries | 41/154 |
Two varieties of sub-query: (sample schema R(a,b), S(c,d))
select * from R where a in (select c from S where d>5) select * from R where a = (select max(c) from S)
select * from R where a in (select c from S where d=R.b)
... Mapping Subqueries | 42/154 |
Example of mapping independent subquery ...
select * from R where a in (select c from S where d>5)
is mapped as
⇒ Sel[a in Proj[c](Sel[d>5]S)](R)
⇒ Sel[a=c](R × Proj[c](Sel[d>5](S)))
⇒ R Join[a=c] Proj[c](Sel[d>5](S))
Query Optimisation |
Query Optimisation | 44/154 |
Query optimiser: RA expression → efficient evaluation plan
... Query Optimisation | 45/154 |
Query optimisation is a critical step in query evaluation.
The query optimiser
"Optimisation" is necessary because there can be enormous differences in costs between different plans for evaluating a given RA expression.
E.g. tmp ← A × B; res ← σx(tmp) vs res ← A ⋈x B
Query Evaluation Example | 46/154 |
Example database schema (multi-campus university):
Employee(eid, ename, status, city) Department(dname, city, address) Subject(sid, sname, syllabus) Lecture(subj, dept, empl, time)
Example database instance statistics:
r | R | C | b | |||||
Employee |
1000 | 100 | 10 | 100 | ||||
Department |
100 | 200 | 5 | 20 | ||||
Subject |
500 | 95 | 10 | 100 | ||||
Lecture |
2000 | 100 | 10 | 200 |
... Query Evaluation Example | 47/154 |
Query: Which depts in Sydney offer database subjects?
select dname from Department D, Subject S, Lecture L where D.city = 'Sydney' and S.sname like '%Database%' and D.dname = L.dept and S.sid = L.subj
Additional information (needed to determine query costs):
... Query Evaluation Example | 48/154 |
Consider total I/O costs for five evaluation strategies.
Assumptions in computing costs:
... Query Evaluation Example | 49/154 |
Strategy #1 for answering query:
check = (city='Sydney' & sname='Databases' & dname=dept & sid=subj)
... Query Evaluation Example | 50/154 |
Costs involved in using strategy #1:
Reln | r | R | C | b |
TMP1 |
1000000 | 195 | 5 | 20000 |
TMP2 |
100000000 | 395 | 2 | 50000000 |
TMP3 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+20000) + (20+20*20000+5*106) + (5*106+2) + 2
= 100,440,124
... Query Evaluation Example | 51/154 |
Strategy #2 for answering query:
... Query Evaluation Example | 52/154 |
Costs involved in using strategy #2:
Reln | r | R | C | b |
TMP1 |
2000 | 195 | 5 | 400 |
TMP2 |
2000 | 395 | 2 | 1000 |
TMP3 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+400) + (20+20*400+1000) + (1000+2) + 2
= 30,524
... Query Evaluation Example | 53/154 |
Strategy #3 for answering query:
... Query Evaluation Example | 54/154 |
Costs involved in using strategy #3:
Reln | r | R | C | b |
TMP1 |
2000 | 195 | 5 | 400 |
TMP2 |
20 | 195 | 5 | 4 |
TMP3 |
20 | 395 | 2 | 10 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4 + CostStep5
= (100+100*200+400) + (400+4) + (4+4*20+10) + (10+2) + 1
= 21,011
... Query Evaluation Example | 55/154 |
Strategy #4 for answering query:
... Query Evaluation Example | 56/154 |
Costs involved in using strategy #4:
Reln | r | R | C | b |
TMP1 |
80 | 95 | 5 | 16 |
TMP2 |
5 | 200 | 5 | 1 |
TMP3 |
300 | 195 | 5 | 60 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
... Query Evaluation Example | 57/154 |
Strategy #5 for answering query:
... Query Evaluation Example | 58/154 |
Costs involved in using strategy #5:
Reln | r | R | C | b |
TMP1 |
80 | 95 | 5 | 16 |
TMP2 |
5 | 200 | 5 | 1 |
TMP3 |
100 | 295 | 3 | 34 |
TMP4 |
3 | 395 | 2 | 2 |
RESULT |
3 | 100 | 10 | 1 |
Query Optimisation Problem | 59/154 |
Given:
... Query Optimisation Problem | 60/154 |
Why do we not generate optimal query execution plans?
Finding an optimal query plan ...
Cost Models and Analysis | 61/154 |
The cost of evaluating a query is determined by:
Approaches to Optimisation | 62/154 |
Three main classes of techniques developed:
Real query optimisers use a combination of algrebraic+physical.
Semantic QO is good idea, but expensive/difficult to implement.
Optimisation Process | 63/154 |
Start with RA tree representation of query, then ...
... Optimisation Process | 64/154 |
Example of optimisation transformations:
For join, may also consider sort/merge join and hash join.
Algebraic Optimisation | 65/154 |
Make use of algebraic equivalences:
Rationale: minimises size of intermediate relations.
Can potentially be done during the SQL→RA mapping phase.
Algebraic optimisation cannot assist with finding good join order.
Physical Optimisation | 66/154 |
Makes use of execution cost analysis:
Semantic Optimisation | 67/154 |
Make use of application-specific properties:
Basis: exploit meta-data and other semantic info about relations.
(E.g. this field is a primary key, so we know there'll only be one matching tuple)
Stages in Algebraic Optimisation | 68/154 |
Start with relational algebra (RA) expression.
Standardisation | 69/154 |
Many query optimisers assume RA expression is in a "standard form".
E.g. select conditions may be stated in
(A1 AND ... AND An) OR ... OR (Z1 AND ... AND Zm)
(A1 OR ... OR An) AND ... AND (Z1 OR ... OR Zm)
Simplification | 70/154 |
Main aim of simplification is to reduce redundancy.
Makes use of tranformation rules based on laws of boolean algebra.
A AND B |
↔ | B AND A |
A OR (B AND C) |
↔ | (A OR B) AND (A OR C) |
A AND A |
↔ | A |
A OR NOT(A) |
↔ | True |
A AND NOT(A) |
↔ | False |
A OR (A AND B) |
↔ | A |
NOT(A AND B) |
↔ | NOT(A) OR NOT(B) |
NOT(NOT(A)) |
↔ | A |
(Programmers don't produce redundant expressions; but much SQL is produced by programs)
Simplification Example | 71/154 |
Consider the query:
select Title from Employee where (not (Title = 'Programmer') and (Title = 'Programmer' or Title = 'Analyst') and not (Title = 'Analyst')) or Name = 'Joe'
Denote the atomic conditions as follows:
(Title = 'Programmer')
(Title = 'Analyst')
(Name = 'Joe')
... Simplification Example | 72/154 |
Selection condition can then be simplified via:
Cond | = | ( |
= | (( |
|
= | ( |
|
= | False ∨ J | |
= | J |
Giving the equivalent simplified query:
select Title from Employee where Name = "Joe"
Amelioration | 73/154 |
Aim of amelioration is to improve efficiency.
Transform query to equivalent form which is known to be more efficient.
Achieve this via:
Amelioration Process | 74/154 |
Amelioration Example | 75/154 |
Consider school information database:
CSG(Course, StudentId, grade) SNAP(StudentId, name, address, phone) CDH(Course, Day, Hour) CR(Course, Room)
And the query:
... Amelioration Example | 76/154 |
An obvious translation of the query into SQL
select Room from CSG,SNAP,CDH,CR where name='Brown' and day='Mon' and hour='9am'
and an obvious mapping of the SQL into relational algebra gives
Room
CSG
SNAP
CDH
CR
name='Brown'
Day='Mon'
Hour='9am'
... Amelioration Example | 77/154 |
Room
CSG
SNAP
CDH
CR
Push selection:
Room
CSG
SNAP
CDH
CR
Split selection:
Room
CSG
SNAP
CDH
CR
Push two selections:
Room
CSG
SNAP
CDH
CR
Push selection:
Room
CSG
SNAP
CDH
CR
Can we show that the final version is more efficient?
... Amelioration Example | 78/154 |
Could use an argument based on size of intermediate results, as above.
But run into problems with expressions like:
CSG
SNAP
CDH
CR
Order of joins can make a significant difference?
How to decide "best" order?
Need understanding of physical characteristics of relations
(for example, selectivity and likelihood of join matches across tables)
Physical (Cost-Based) Optimisation | 79/154 |
Need to determine:
Do this for all possible execution plans.
Choose cheapest plan.
Execution Plans | 80/154 |
Consider query execution plans for the RA expression:
Plan #1
tmp1 := HashJoin[d](R,S) tmp2 := SortMergeJoin[e](tmp1,T) result := BinarySearch[c](tmp2)
... Execution Plans | 81/154 |
Plan #2
tmp1 := SortMergeJoin[e](S,T) tmp2 := HashJoin[d](R,tmp1) result := LinearSearch[c](tmp2)
Plan #3
tmp1 := BtreeSearch[c](R) tmp2 := HashJoin[d](tmp1,S) result := SortMergeJoin[e](tmp2)
All plans produce same result, but have quite different costs.
Cost-based Query Optimiser | 82/154 |
Overview of cost-based query optimisation process:
... Cost-based Query Optimiser | 83/154 |
Approximate algorithm for cost-based optimisation:
translate SQL query to RAexp for all transformations RA' of RAexp { for each node e of RA' (recursively) { select access method for e plan[i++] = access method for e } cost = 0 for each op in plan[] { cost += Cost(op) } if (cost < MinCost) { MinCost = cost BestPlan = plan } }
As noted above, we do not consider *all* transformations.
Heuristics: push selections down, consider only left-deep join trees.
Choosing Access Methods | 84/154 |
Inputs:
... Choosing Access Methods | 85/154 |
Example:
Student
name
tmp[i] := BtreeSearch[name='John'](Student) tmp[i+1] := LinearSearch[age>21](tmp[i])
Where possible, use pipelining to avoid storing tmp[i]
... Choosing Access Methods | 86/154 |
Rules for choosing σ access methods:
R
A
indexSearch[A=c](R)
R
A
hashSearch[A=c](R)
R
A
binarySearch[A=c](R)
R
A
indexSearch[A=c+1](R)
R
A
linearSearch[A>=c](R)
... Choosing Access Methods | 87/154 |
Rules for choosing ⋈ access methods:
R
bnlJoin(R,S)
R
S
smJoin(R,S)
R
inlJoin(S,R)
hashJoin(R,S)
bnl
inl
sm
Example Plan Enumeration | 88/154 |
Consider the query:
select name from Student s, Enrol e where s.sid = e.sid AND e.cid = 'COMP9315';
where
Relation | r | b | PrimKey | Clustered | Indexes |
Student |
10000 | 500 | sid |
No | B-tree on sid |
Enrol |
40000 | 400 | cid |
Yes | B-tree on cid |
S E |
40000 | 800 | sid,cid |
No | - |
... Example Plan Enumeration | 89/154 |
First step is to map SQL to RA expression
Student
Enrol
then devise an execution plan based on this expression
Student.sid
Enrol.cid
tmp1 := inlJoin[s.sid=e.sid](Enrol,Student) result := LinearSearch[e.id='COMP9315'](tmp1)
... Example Plan Enumeration | 90/154 |
Estimated cost for this plan:
Cost | = | inl-join on (Enrol,Student tmp1 |
= | Tr(bS + rS.btreeE) + Twbtmp1 + Trbtmp1 | |
= | 500 + 10,000.4 + 800 + 800 | |
= | 42,100 |
Notes:
tmp1
Enrol
... Example Plan Enumeration | 91/154 |
Apply rule for pushing select into join:
and devise a plan based on
Enrol.cid
tmp1 := BtreeSearch[e.cid='COMP9315'](Enrol) result := bnlJoin[s.sid=e.sid](tmp1,Student)
... Example Plan Enumeration | 92/154 |
Estimated cost for this plan:
Cost | = | btree-search on Enrol tmp1,Student |
= | search+scan on Enrol |
|
= | 3Tr + 3Tr + Tr(1 + 500) | |
= | 3 + 3 + 1 + 500 = 507 |
Notes:
Enrol
Cost Estimation | 93/154 |
Without actually executing it, cannot always know the cost of plan precisely.
(E.g. how big is the select result that feeds into the join?)
Thus, query optimisers need to estimate costs.
Two aspects to cost estimation:
Cost Estimation Information | 94/154 |
Cost estimation uses statistical information about relations:
rS | cardinality of relation S |
RS | avg size of tuple in relation S |
V(A,S) | # distinct values of attribute A |
min(A,S) | min value of attribute A |
max(A,S) | max value of attribute A |
Estimating Projection Result Size | 95/154 |
Straightforward, since we know:
If pipelining through buffers of size B, then
#buffers-full = rS × ⌈B/Rout⌉
Estimating Selection Result Size | 96/154 |
Selectivity factor = fraction of tuples expected to satisfy a condition.
Common assumption: attribute values uniformly distributed.
Example: Consider the query
select * from Parts where colour='Red'
and assume that there are only four possible colours.
If we have 1000 Parts
Red
In other words,
In general, |σA=c(R)| ≅ rR/V(A,R)
... Estimating Selection Result Size | 97/154 |
Estimating size of result for e.g.
select * from Enrolment where year > 2003;
Could estimate by using:
Simpler heuristic used by some systems: selected tuples = r/3
... Estimating Selection Result Size | 98/154 |
Estimating size of result for e.g.
select * from Enrolment where course <> 'COMP9315';
Could estimate by using:
... Estimating Selection Result Size | 99/154 |
Uniform distribution assumption is convenient, but often not realistic.
How to handle non-uniform attribute value distributions?
White
Red
Blue
Silver
Use histogram as basis for determining # selected tuples.
Disadvantage: cost of storing/maintaining histograms.
Estimating Join Result Size | 100/154 |
Analysis is not as simple as select.
Relies on semantic knowledge about data/relations.
Consider equijoin on common attr: R ⋈A S
Case 1: |R ⋈A S| = 0
Useful if we know that dom(A,R) ∩ dom(A,S) = {}
Case 2: A is unique key in both R
S
⇒ can't be more tuples in join than in smaller
of R
S
... Estimating Join Result Size | 101/154 |
Case 3: A is primary key in R
S
⇒ every S
R
Example:
select name from Students s, Enrol e where e.cid = 'COMP9315' AND e.sid = s.sid
Cost Estimation: Postscript | 102/154 |
Inaccurate cost estimation can lead to poor evaluation plans.
Above methods can (sometimes) give inaccurate estimates.
To get more accurate estimates costs:
Trade-off between optimiser performance and query performance.
Semantic Query Optimisation (SQO) | 103/154 |
Makes use of semantics of data to assist query optimisation process.
The discussion of join cost estimation above gives an example of this.
Kinds of information used:
Two examples of semantic transformation operations:
Semantic Equivalence | 104/154 |
Semantic equivalence does not require syntactic equivalence.
Consider the two queries:
select * from Emp where sal > 80K select * from Emp where sal > 80K and job = 'Prof'
They are equivalent under the semantic rule:
Emp.job
'Prof'
Emp.sal > 80K
(i.e. Profs are the only people earning more than $80,000)
Could use this to transform query to exploit index on salary
Restriction Elimination | 105/154 |
Query:
select dept from Storage where material = 'Benzene' and qty > 400
Use rule: material
'Benzene'
qty > 500
select dept from Storage where material = 'Benzene'
Result: Unnecessary restriction on the attribute qty
Index Introduction | 106/154 |
Query: Find all the employees who make more than 42K
select name from Employees where salary > 42K
Use rule: salary > 42K
job
'manager'
select name from Employees where salary > 42K and job = 'manager'
Result: A new constraint is obtained on the
indexed attribute job
Query Execution |
Query Execution | 108/154 |
Query execution: applies evaluation plan → set of result tuples
... Query Execution | 109/154 |
Query execution
... Query Execution | 110/154 |
Example of query translation:
select s.name, s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2';
maps to
maps to
Temp1 = BtreeSelect[semester=05s2](Enr) Temp2 = HashJoin[e.student=s.id](Stu,Temp1) Result = Project[name,id,course,mark](Temp2)
... Query Execution | 111/154 |
A query execution plan:
... Query Execution | 112/154 |
Two ways of communicating results between query blocks ...
Materialization
Materialization | 113/154 |
Steps in materialization between two operators
... Materialization | 114/154 |
Example:
select s.name, s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2' and s.name = 'John';
might be executed as
Temp1 = BtreeSelect[semester=05s2](Enrolment) Temp2 = BtreeSelect[name=John](Student)-- indexes on name and semester -- produce sorted Temp1 and Temp2 Temp3 = SortMergeJoin[e.student=s.id](Temp1,Temp2)-- SMJoin especially effective, since -- Temp1 and Temp2 are already sorted Result = Project[name,id,course,mark](Temp3)
Pipelining | 115/154 |
How pipelining is organised between two operators:
Iterators (reminder) | 116/154 |
Iterators provide a "stream" of results:
iter = open(
)
tuple = next(iter)
close(iter)
... Iterators (reminder) | 117/154 |
Implementation of single-relation selection iterator:
typedef struct { File inf;// input file Cond cond;// selection condition Buffer buf;// buffer holding current page int curp;// current page during scan int curr;// index of current record in page } Iterator;
Iterator structure contains information:
cond
curp, curr
... Iterators (reminder) | 118/154 |
Implementation of single-relation selection iterator (cont):
Iterator *open(char *relName, Condition cond) { Iterator *iter = malloc(sizeof(Iterator)); iter->inf = openFile(fileName(relName),READ); iter->cond = cond; iter->curp = 0; iter->curr = -1; readBlock(iter->inf, iter->curp, iter->buf); return iter; } void close(Iterator *iter) { closeFile(iter->inf); free(iter); }
... Iterators (reminder) | 119/154 |
Implementation of single-relation selection iterator (cont):
Tuple next(Iterator *iter) { Tuple rec; do {// check if reached end of current page if (iter->curr == nRecs(iter->buf)-1) {// check if reached end of data file if (iter->curp == nBlocks(iter->inf)-1) return NULL; iter->curp++; iter->buf = readBlock(iter->inf, iter->curp); iter->curr = -1; } iter->curr++; rec = getRec(iter->buf, iter->curr); } while (!matches(rec, iter->cond)); return rec; }// curp and curr hold indexes of most recently read page/record
Pipelining Example | 120/154 |
Consider the query:
select s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2' and s.name = 'John';
which maps to the RA expression
which could represented by the RA expression tree
... Pipelining Example | 121/154 |
Modelled as communication between RA tree nodes:
... Pipelining Example | 122/154 |
This query might be executed as
System: iter0 = open(Result) while (Tup = next(iter0)) { display Tup } close(iter0) Result: iter1 = open(Join) while (T = next(iter1)) { T' = project(T); return T' } close(iter1) Sel1: iter4 = open(Btree(Enrolment,'semester=05s2')) while (A = next(iter4)) { return A } close(iter4) ...
... Pipelining Example | 123/154 |
... Join:-- nested-loop join iter2 = open(Sel1) while (R = next(iter2) { iter3 = open(Sel2) while (S = next(iter3)) { if (matches(R,S) return (RS) } close(iter3)// better to reset(iter3) } close(iter2) Sel2: iter5 = open(Btree(Student,'name=John')) while (B = next(iter5)) { return B } close(iter5)
Pipeline Execution | 124/154 |
Piplines can be executed as ...
Demand-driven
In parallel-processing systems, iterators could run concurrently.
Disk Accesses | 125/154 |
Pipelining cannot avoid all disk accesses.
Some operations use multiple passes (e.g. merge-sort, hash-join).
... Disk Accesses | 126/154 |
Sophisticated query optimisers might realise e.g.
In this case, it could materialize X's output in an S-file.
Produces a pipeline/materialization hybrid query execution.
Example:
... Disk Accesses | 127/154 |
Example: (pipeline/materialization hybrid)
select s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2' and s.name = 'John';
might be executed as
System: exec(Sel2)-- creates Temp1 iter0 = open(Result) while (Tup = next(iter0)) { display Tup } close(iter0) Result: iter1 = open(Join) while (T = next(iter1)) { T' = project(T); return T' } close(iter1) ...
... Disk Accesses | 128/154 |
... Join:-- index join iter2 = open(Sel1) while (R = next(iter2) { iter3 = open(Btree(Temp1,'id=R.student')) while (S = next(iter3)) { return (RS) } close(iter3) } close(iter2) Sel1: iter4 = open(Btree(Enrolment,'semester=05s2')) while (A = next(iter4)) { return A } close(iter4) Sel2: iter5 = open(Btree(Student,'name=John')) createBtree(Temp1,'id') while (B = next(iter5)) { insert(B,Temp1) } close(iter5)
PostgreSQL Execution | 129/154 |
Defs: src/include/executor
src/include/nodes
Code: src/backend/executor
PostgreSQL uses pipelining ...
Plan
Scan
Group
Indexscan
Sort
HashJoin
PlanState
... PostgreSQL Execution | 130/154 |
Modules in src/backend/executor
execXXX
nodeXXX
ExecInitXXX
ExecXXX
ExecEndXXX
... PostgreSQL Execution | 131/154 |
Top-level data/functions for executor ...
QueryDesc
ExecutorStart(QueryDesc *, ...)
ExecutorRun(QueryDesc *, ScanDirection, ...)
ExecutorEnd(QueryDesc *)
... PostgreSQL Execution | 132/154 |
Overview of query processing:
CreateQueryDesc ExecutorStart CreateExecutorState-- creates per-query context switch to per-query context to run ExecInitNode ExecInitNode-- recursively scans plan tree CreateExprContext-- creates per-tuple context ExecInitExpr ExecutorRun ExecutePlan-- invoke iterators from root ExecProcNode-- recursively called in per-query context ExecEvalExpr-- called in per-tuple context ResetExprContext-- to free memory ExecutorEnd ExecEndNode-- recursively releases resources FreeExecutorState-- frees per-query and child contexts FreeQueryDesc
... PostgreSQL Execution | 133/154 |
More detailed view of plan execution (but still much simplified)
ExecutePlan(execState, planStateNode, ...) { process "before each statement" triggers for (;;) { tuple = ExecProcNode(planStateNode) check tuple validity// MVCC if (got a tuple) break } process "after each statement" triggers return tuple } ExecProcNode(node) { switch (nodeType(node)) { case SeqScan: result = ExecSeqScan(node); break; case NestLoop: result = ExecNestLoop(node); break; ... } return result; }
... PostgreSQL Execution | 134/154 |
Generic iterator interface is provided by ...
ExecInitNode
ExecProcNode
ExecEndNode
Example PostgreSQL Execution | 135/154 |
Consider the query:
-- get manager's age and # employees in Shoe department select e.age, d.nemps from Departments d, Employees e where e.name = d.manager and d.name ='Shoe'
and its execution plan tree
... Example PostgreSQL Execution | 136/154 |
This produces a tree with three nodes:
NestedLoop
IndexScan
SeqScan
... Example PostgreSQL Execution | 137/154 |
Initially InitPlan() invokes ExecInitNode() on plan tree root.
ExecInitNode() sees a NestedLoop node ... so dispatches to ExecInitNestLoop() to set up iterator and then invokes ExecInitNode() on left and right sub-plans in left subPlan, ExecInitNode() sees an IndexScan node so dispatches to ExecInitIndexScan() to set up iterator in right sub-plan, ExecInitNode() sees aSeqScan node so dispatches to ExecInitSeqScan() to set up iterator
Result: a plan state tree with same structure as plan tree.
... Example PostgreSQL Execution | 138/154 |
Execution: ExecutePlan() repeatedly invokes ExecProcNode().
ExecProcNode() sees a NestedLoop node ... so dispatches to ExecNestedLoop() to get next tuple which invokes ExecProcNode() on its sub-plans in the left sub-plan, ExecProcNode() sees an IndexScan node so dispatches to ExecIndexScan() to get next tuple if no more tuples, return END for this tuple, invoke ExecProcNode() on right sub-plan ExecProcNode() sees a SeqScan node so dispatches to ExecSeqScan() to get next tuple check for match and return joined tuples if found reset right sub-plan iterator
Result: stream of result tuples returned via ExecutePlan()
Performance Tuning |
Performance Tuning | 140/154 |
Schema design:
... Performance Tuning | 141/154 |
Tuning requires us to consider the following:
... Performance Tuning | 142/154 |
Performance can be considered at two times:
Denormalisation | 143/154 |
Normalisation minimises storage redundancy.
... Denormalisation | 144/154 |
Example ... consider the following normalised schema:
create table Subject ( id serial primary key, code char(8),-- e.g. COMP9315 title varchar(60), syllabus text, ... ); create table Term ( id serial primary key, name char(4),-- e.g. 09s2 starting date, ... ); create table Course ( subject integer references Subject(id), term integer references Term(id), lic integer references Staff(id), ... );
... Denormalisation | 145/154 |
Example: Courses = Course ⋈ Subject ⋈ Term
If we often need to refer to "standard" name (e.g. COMP9315 09s2)
courseName
Course
Course
Course
-- can now replace a query like: select s.code||' '||t.name, avg(e.mark) from Course c, Subject s, Term t where c.subject = s.id and c.term = t.id and s.code='COMP9315' and t.name='09s2'-- by a query like: select c.courseName, e.grade, e.mark from Course c where c.courseName = 'COMP9315 09s2'
Indexes | 146/154 |
Indexes provide efficient content-based access to tuples.
Can build indexes on any (combination of) attributes.
Definining indexes:
CREATE INDEX name ON table ( attr1, attr2, ... )
attri can be an arbitrary expression (e.g. upper(name)
CREATE INDEX
UNIQUE
USING
... Indexes | 147/154 |
Indexes can significantly improve query costs.
Considerations in applying indexes:
-- use hashing for (unique) attributes in equality tests, e.g. select * from Employee where id = 12345-- use B-tree for attributes in range tests, e.g. select * from Employee where age > 60
Query Tuning | 148/154 |
Sometimes, a query can be re-phrased to affect performance:
select name from Employee where salary/365 > 100-- fix by re-phrasing condition to (salary > 36500) select name from Employee where name like '%ith%' select name from Employee where birthday is null-- above two are difficult to "fix" select name from Employee where dept in (select id from Dept where ...)-- fix by using Employee join Dept on (e.dept=d.id)
... Query Tuning | 149/154 |
Other factors to consider in query tuning:
select distinct
distinct
select ... Employee join Customer on (s.name = p.name)vs select ... Employee join Customer on (s.ssn = p.ssn)
or
or
or
select name from Employee where dept=1 or dept=2vs (select name from Employee where dept=1) union (select name from Employee where dept=2)
PostgreSQL Query Tuning | 150/154 |
PostgreSQL provides the explain
EXPLAIN [ANALYZE] Query
Without ANALYZE
EXPLAIN
With ANALYZE
EXPLAIN
Note that runtimes may show considerable variation due to buffering.
EXPLAIN Examples | 151/154 |
Example: Select on indexed attribute
ass2=# explain select * from Students where id=100250; QUERY PLAN ----------------------------------------------------------------- Index Scan using student_pkey on student (cost=0.00..5.94 rows=1 width=17) Index Cond: (id = 100250) ass2=# explain analyze select * from Students where id=100250; QUERY PLAN ----------------------------------------------------------------- Index Scan using student_pkey on student (cost=0.00..5.94 rows=1 width=17) (actual time=31.209..31.212 rows=1 loops=1) Index Cond: (id = 100250) Total runtime: 31.252 ms
... EXPLAIN Examples | 152/154 |
Example: Select on non-indexed attribute
ass2=# explain select * from Students where stype='local'; QUERY PLAN ---------------------------------------------------------- Seq Scan on student (cost=0.00..70.33 rows=18 width=17) Filter: ((stype)::text = 'local'::text) ass2=# explain analyze select * from Students ass2-# where stype='local'; QUERY PLAN --------------------------------------------------------------- Seq Scan on student (cost=0.00..70.33 rows=18 width=17) (actual time=0.061..4.784 rows=2512 loops=1) Filter: ((stype)::text = 'local'::text) Total runtime: 7.554 ms
... EXPLAIN Examples | 153/154 |
Example: Join on a primary key (indexed) attribute
ass2=# explain ass2-# select s.sid,p.name ass2-# from Students s, People p where s.id=p.id; QUERY PLAN ----------------------------------------------------------- Hash Join (cost=70.33..305.86 rows=3626 width=52) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on person p (cost=0.00..153.01 rows=3701 width=52) -> Hash (cost=61.26..61.26 rows=3626 width=8) -> Seq Scan on student s (cost=0.00..61.26 rows=3626 width=8)
... EXPLAIN Examples | 154/154 |
Example: Join on a non-indexed attribute
ass3=> explain select s1.code, s2.code ass2-# from Subjects s1, Subjects s2 where s1.offerer=s2.offerer; QUERY PLAN ---------------------------------------------------------------- Merge Join (cost=2744.12..18397.14 rows=1100342 width=18) Merge Cond: (s1.offerer = s2.offerer) -> Sort (cost=1372.06..1398.33 rows=10509 width=13) Sort Key: s1.offerer -> Seq Scan on subjects s1 (cost=0.00..670.09 rows=10509 width=13) -> Sort (cost=1372.06..1398.33 rows=10509 width=13) Sort Key: s2.offerer -> Seq Scan on subjects s2 (cost=0.00..670.09 rows=10509 width=13) (8 rows)
Produced: 28 Jul 2019