Query Processing


Query Processing Overview

(Translation, Optimisation, Execution)


Query Evaluation2/154

[Diagram:Pics/qproc/dbmsarch-small.png]


... Query Evaluation3/154

The most common form of interaction with a DBMS involves:

[Diagram:Pics/qproc/qryeval0-small.png]


Overall approach clearly applies to select ... but also to delete and update.


... Query Evaluation4/154

A query in SQL:

A query evaluator/processor : Typically: translate plan execute ... in a pipeline.

Some DBMSs can save query plans for later re-use
(because the planning step is potentially quite expensive).


... Query Evaluation5/154

Internals of the query evaluation "black-box":

[Diagram:Pics/qproc/qproc0-small.png]


... Query Evaluation6/154

Three phases of query evaluation:

  1. parsing/compilation
    - input: SQL query, catalog
    - using: parsing techniques, mapping rules
    - output: relational algebra (RA) expression
  2. query optimisation
    - input: RA expression, DB statistics
    - using: cost models, search strategy
    - output: query execution plan   (DB engine ops)
  3. query execution
    - input: query execution plan
    - using: database engine
    - output: tuples that satisfy query
We ignore the "display" step; simply maps result tuples into appropriate format


Intermediate Representations7/154

SQL query text is not easy to manipulate/transform.

Need a query representation formalism that ...

Relational algebra (RA) expressions: Thus, RA typically used as "target language" for SQL compilation.


... Intermediate Representations8/154

In principle, could base RA engine on { σ, π, ∪, -, ×}   (completeness).

In practice, having only these operators makes execution inefficient.

The standard set of RA operations in a DBMS includes:

Other operations typically provided in extended RA engines: RA rename operator is "hidden" in table/tuple representation.


... Intermediate Representations9/154

In practice, DBMSs provide several versions of each RA operation.

For example:

Similarly, π and have versions to match specific query types.


... Intermediate Representations10/154

We call these specialised version of RA operations RelOps.

One major task of the query processor:

Requires the query translator/optimiser to consider RelOps are realised at execution time


... Intermediate Representations11/154

Terminology variations ...

Relational algebra expression of SQL query

Execution plan as collection of RelOps


Query Translation


... Intermediate Representations13/154

Query translation:   SQL statement text RA expression

[Diagram:Pics/qproc/qproc1-small.png]


Query Translation14/154

Translation step is a mapping

Standard internal representation is relational algebra (RA).

Mapping from SQL to RA may including some optimisations.

Mapping processes:  lexer/parser,  mapping rules,  rewriting rules.

Example:

SQL: select name from Students where id=7654321;
-- is translated to
RA:  Proj[name](Sel[id=7654321]Students)


Parsing SQL15/154

Parsing task is similar to that for programming languages.

Language elements:

One difference to parsing PLs:

Therefore, SQL parser needs access to catalog for definitions.


... Parsing SQL16/154

PostgreSQL dialect of SQL ...


Catalog and Parsing17/154

Consider the following simple University schema:

Staff(id, name, position, office, phone, ...)
Students(id, name, birthday, degree, avg, ...)
Subjects(id, title, prereqs, syllabus, ...)
Enrolments(sid, subj, session, mark, ...)

DBMS catalog stores following kinds of information:


... Catalog and Parsing18/154

The schema/type information is used for

The statistical information is used for Examples:


Query Blocks19/154

A query block is an SQL query with

Query optimisers typically deal with one query block at a time ...

SQL compilers need to decompose queries into blocks

Interesting kinds of blocks:


... Query Blocks20/154

Consider the following example query:

SELECT s.name FROM Students s
WHERE  s.avg = (SELECT MAX(avg) FROM Students)

which consists of two blocks:

Block1: SELECT MAX(avg) FROM Students
Block2: SELECT s.name FROM Students s
        WHERE  s.avg = <<Block1>>>

Query processor arranges execution of each block separately and transfers result of Block1 to Block2.


Mapping SQL to Relational Algebra21/154

A naive query compiler might use the following translation scheme:

Example:

SELECT s.name, e.subj
FROM   Students s, Enrolments e
WHERE  s.id = e.sid AND e.mark > 50;

is translated to

πs.name,e.subj( σs.id=e.sid ∧ e.mark>50 ( Students × Enrolments ) )


... Mapping SQL to Relational Algebra22/154

A better translation scheme would be something like:

Example:

SELECT s.name, e.subj
FROM   Students s, Enrolments e
WHERE  s.id = e.sid AND e.mark > 50;

is translated to

πs.name,e.subj( σe.mark>50 ( Students ⋈s.id=e.sid Enrolments ) ) )


... Mapping SQL to Relational Algebra23/154

In fact, many SQL compilers ...

This is one instance of a general query translation process Aim of rewriting: convert a given RA expression (Rewriting is discussed later)


Mapping Rules24/154

Mapping from an SQL query to an RA expression requires:

May need to use multiple templates to map whole SQL statement.


... Mapping Rules25/154

Projection:

SELECT   f1, f2, ... fn   FROM   ...

⇒    Project[f1, f2, ... fn](...)

SQL projection extends RA projection with renaming and assignment:

SELECT   a+b AS x, c AS y   FROM   R ...

⇒    Project[x ← a+b, y ← c](R)


... Mapping Rules26/154

Join:   (e.g. on R(a,b,c,d) and S(c,d,e))

SELECT  ...  FROM  ... R, S ...  WHERE  ... R.a  op  S.e ... ,    or

SELECT  ...  FROM  ... R  JOIN  S  ON  (R.a  op  S.e) ...  WHERE  ...

⇒    Join[R.a op S.e](R,S)

SELECT  ...  FROM  ... R  NATURAL JOIN  S,    or

SELECT  ...  FROM  ... R  JOIN  S  USING  (c,d) ...  WHERE  ...

⇒    Proj[a,b,e](Join[R.c=S.c ∧ R.d=S.d](R,S))


... Mapping Rules27/154

Selection:

SELECT  ...  FROM  ...  R  ...  WHERE  ... R.f  op  val ...

⇒    Select[R.f op val](R)

SELECT  ...  FROM  ... R ...  WHERE  ... Cond1,R  AND  Cond2,R ...

⇒    Select[Cond1,R ∧ Cond2,R](R)
or
⇒    Select[Cond1,R](Select[Cond2,R](R))
or
⇒    Select[Cond2,R](Select[Cond1,R](R))


... Mapping Rules28/154

Aggregation operators (e.g. MAX, SUM, ...):


... Mapping Rules29/154

Sorting (ORDER BY):

Duplicate elimination (DISTINCT): Grouping (GROUP BY, HAVING):


... Mapping Rules30/154

View definitions produce:

Example: assuming Employee(id,name,birthdate,salary)

create view OldEmps as
select * from Employees
where birthdate < '01-01-1960';

yields

OldEmps  =  Select[birthdate<'01-01-1960'](Employees)


... Mapping Rules31/154

General case:

CREATE VIEW  V   AS   SQLstatement

⇒    V  =  mappingOf(SQLstatement)


Special case: views with attribute renaming:

CREATE VIEW  V(a,b,c)  AS   SELECT  h,j,k  FROM  R  WHERE  C  ...

⇒    V  =  Proj[a ← h, b ← j, c ← k](Select[C](R))


... Mapping Rules32/154

Views used in queries:

Example:

select name from OldEmps;   -- using OldEmps as defined above

⇒    Projname(mappingOf(OldEmps))

⇒    Projname(Select[birthdate<'01-01-1960'](Employees))


Mapping Example33/154

The query

List the names of all subjects with more than 100 students in them

can be expressed in SQL as

select   distinct s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id
having   count(*) > 100;


... Mapping Example34/154

In the SQL compiler, the query

select   distinct s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id
having   count(*) > 100;

might be translated to the relational algebra expression

Uniq(Project[code](
    GroupSelect[groupSize>100](
        GroupBy[id] (
            Enrolment ⋈ Course ⋈ Subjects
))))


... Mapping Example35/154

The join operations could be done in two different ways:

[Diagram:Pics/qproc/join-trees-small.png]


The query optimiser determines which has lower cost.

Note: for a join involving n tables, there are O(n!) possible trees.


Expression Rewriting Rules36/154

Since RA is a well-defined formal system

Expression transformation based on such rules can be used


Relational Algebra Laws37/154

Commutative and Associative Laws:

But it is not true in general that Example:   R(a,b),   S(b,c),   T(c,d),   (R Join[R.b>S.b] S) Join[a<d] T

Cannot rewrite as R Join[R.b>S.b] (S Join[a<d] T) because neither S.a nor T.a exists.


... Relational Algebra Laws38/154

Selection commutativity   (where c is a condition):

Selection splitting (where c and d are conditions): Selection pushing   ( σc(R op S) ):


... Relational Algebra Laws39/154

Selection pushing with join, cross-product and intersection ...

If c refers only to attributes from R:

If c refers only to attributes from S: If c refers to attributes from both R and S:


... Relational Algebra Laws40/154

Rewrite rules for projection ...

All but last projection can be ignored:

Projections can be pushed into joins:

where


Mapping Subqueries41/154

Two varieties of sub-query:    (sample schema R(a,b), S(c,d))

Standard strategy: convert to a join.


... Mapping Subqueries42/154

Example of mapping independent subquery ...

select * from R
where a in (select c from S where d>5)

is mapped as

⇒   Sel[a in Proj[c](Sel[d>5]S)](R)

⇒   Sel[a=c](R × Proj[c](Sel[d>5](S)))

⇒   R  Join[a=c]  Proj[c](Sel[d>5](S))


Query Optimisation


Query Optimisation44/154

Query optimiser:   RA expression efficient evaluation plan

[Diagram:Pics/qproc/qproc2-small.png]


... Query Optimisation45/154

Query optimisation is a critical step in query evaluation.

The query optimiser

"Optimisation" is necessary because there can be enormous differences in costs between different plans for evaluating a given RA expression.

E.g.   tmp ← A × B;   res ← σx(tmp)     vs     res ← A ⋈x B


Query Evaluation Example46/154

Example database schema (multi-campus university):

Employee(eid, ename, status, city)
Department(dname, city, address)
Subject(sid, sname, syllabus)
Lecture(subj, dept, empl, time)

Example database instance statistics:

Relation r R C b
Employee 1000 100 10 100
Department 100 200 5 20
Subject 500 95 10 100
Lecture 2000 100 10 200


... Query Evaluation Example47/154

Query: Which depts in Sydney offer database subjects?


select dname
from   Department D, Subject S, Lecture L
where  D.city = 'Sydney' and S.sname like '%Database%'
       and D.dname = L.dept and S.sid = L.subj

Additional information (needed to determine query costs):


... Query Evaluation Example48/154

Consider total I/O costs for five evaluation strategies.

Assumptions in computing costs:

These are worst-case scenarios and could be improved by


... Query Evaluation Example49/154

Strategy #1 for answering query:

  1. TMP1    ←    Subject × Lecture
  2. TMP2    ←    TMP1 × Department
  3. TMP3    ←    Select[check](TMP2)

    check = (city='Sydney' & sname='Databases' & dname=dept & sid=subj)

  4. RESULT    ←   Project[dname](TMP3)


... Query Evaluation Example50/154

Costs involved in using strategy #1:

Reln r R C b
TMP1 1000000 195 5 20000
TMP2 100000000 395 2 50000000
TMP3 3 395 2 2
RESULT 3 100 10 1

Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+20000) + (20+20*20000+5*106) + (5*106+2) + 2
= 100,440,124


... Query Evaluation Example51/154

Strategy #2 for answering query:

  1. TMP1    ←    Join[sid=subj](Subject,Lecture)
  2. TMP2    ←    Join[dept=dname](TMP1,Department)
  3. TMP3    ←    Select[city='Sydney' & sname='Databases'](TMP2)
  4. RESULT    ←    Project[dname](TMP3)


... Query Evaluation Example52/154

Costs involved in using strategy #2:

Reln r R C b
TMP1 2000 195 5 400
TMP2 2000 395 2 1000
TMP3 3 395 2 2
RESULT 3 100 10 1

Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4
= (100+100*200+400) + (20+20*400+1000) + (1000+2) + 2
= 30,524


... Query Evaluation Example53/154

Strategy #3 for answering query:

  1. TMP1    ←    Join[sid=subj](Subject,Lecture)
  2. TMP2    ←    Select[sname='Databases'](TMP1)
  3. TMP3    ←    Join[dept=dname](TMP2,Department)
  4. TMP4    ←    Select[city='Sydney'](TMP3)
  5. RESULT    ←    Project[dname](TMP4)


... Query Evaluation Example54/154

Costs involved in using strategy #3:

Reln r R C b
TMP1 2000 195 5 400
TMP2 20 195 5 4
TMP3 20 395 2 10
TMP4 3 395 2 2
RESULT 3 100 10 1

Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4 + CostStep5
= (100+100*200+400) + (400+4) + (4+4*20+10) + (10+2) + 1
= 21,011


... Query Evaluation Example55/154

Strategy #4 for answering query:

  1. TMP1    ←    Select[sname='Databases'](Subject)
  2. TMP2    ←    Select[city='Sydney'](Department)
  3. TMP3    ←    Join[sid=subj](TMP1,Lecture)
  4. TMP4    ←    Join[dept=dname](TMP3,TMP2)
  5. RESULT    ←    project[dname](TMP4)


... Query Evaluation Example56/154

Costs involved in using strategy #4:

Reln r R C b
TMP1 80 95 5 16
TMP2 5 200 5 1
TMP3 300 195 5 60
TMP4 3 395 2 2
RESULT 3 100 10 1
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4 + CostStep5
= (100+16) + (20+1) + (16+16*200+60) + (1+1*60+2) + 2
= 3478


... Query Evaluation Example57/154

Strategy #5 for answering query:

  1. TMP1    ←    Select[sname='Databases'](Subject)
  2. TMP2    ←    Select[city='Sydney'](Department)
  3. TMP3    ←    Join[dept=dname](TMP2,Lecture)
  4. TMP4    ←    Join[sid=subj](TMP3,TMP1)
  5. RESULT    ←    project[dname](TMP4)


... Query Evaluation Example58/154

Costs involved in using strategy #5:

Reln r R C b
TMP1 80 95 5 16
TMP2 5 200 5 1
TMP3 100 295 3 34
TMP4 3 395 2 2
RESULT 3 100 10 1
Total I/Os
= CostStep1 + CostStep2 + CostStep3 + CostStep4 + CostStep5
= (100+16) + (20+1) + (1+1*200+34) + (16+16*34+2) + 2
= 936


Query Optimisation Problem59/154

Given:

Determine a sequence of relational algebra operations that: The term "query optimisation" is a misnomer:


... Query Optimisation Problem60/154

Why do we not generate optimal query execution plans?

Finding an optimal query plan ...

Even for a small query (e.g. 4-5 joins, 4-5 selects) Compromise:


Cost Models and Analysis61/154

The cost of evaluating a query is determined by:

Analysis of costs involves estimating:


Approaches to Optimisation62/154

Three main classes of techniques developed:

All driven by aim of minimising (or at least reducing) "cost".

Real query optimisers use a combination of algrebraic+physical.

Semantic QO is good idea, but expensive/difficult to implement.


Optimisation Process63/154

Start with RA tree representation of query, then ...

  1. apply algebraic query transformations
    • giving standardised, simpler, more efficient RA tree
  2. generate possible access plans (physical)
    • replace each RA operation by specific access method to implement it
    • consider possible orders of join operations (left-deep trees)
  3. analyse cost of generated plans and select cheapest
Result: tree of DBMS operations to answer query efficiently.


... Optimisation Process64/154

Example of optimisation transformations:

[Diagram:Pics/qproc/qtransform-small.png]

For join, may also consider sort/merge join and hash join.


Algebraic Optimisation65/154

Make use of algebraic equivalences:

Most commonly used heuristic:

Apply Select and Project before Join

Rationale: minimises size of intermediate relations.

Can potentially be done during the SQLRA mapping phase.

Algebraic optimisation cannot assist with finding good join order.


Physical Optimisation66/154

Makes use of execution cost analysis:

Physical optimisation is also called query evaluation plan generation.


Semantic Optimisation67/154

Make use of application-specific properties:

Can be applied in algebraic or physical optimisation phase.

Basis: exploit meta-data and other semantic info about relations.

(E.g. this field is a primary key, so we know there'll only be one matching tuple)


Stages in Algebraic Optimisation68/154

Start with relational algebra (RA) expression.

  1. Standardise
  2. Simplify
  3. Ameliorate
Result: an RA expression equivalent to the input, but more efficient.


Standardisation69/154

Many query optimisers assume RA expression is in a "standard form".

E.g. select conditions may be stated in

RA expression is first transformed into one of these forms.


Simplification70/154

Main aim of simplification is to reduce redundancy.

Makes use of tranformation rules based on laws of boolean algebra.

A AND B B AND A
A OR (B AND C) (A OR B) AND (A OR C)
A AND A A
A OR NOT(A) True
A AND NOT(A) False
A OR (A AND B) A
NOT(A AND B) NOT(A) OR NOT(B)
NOT(NOT(A)) A

(Programmers don't produce redundant expressions; but much SQL is produced by programs)


Simplification Example71/154

Consider the query:

select Title from Employee
where  (not (Title = 'Programmer')
        and (Title = 'Programmer'
             or Title = 'Analyst')
        and not (Title = 'Analyst'))
       or Name = 'Joe'

Denote the atomic conditions as follows:

P = (Title = 'Programmer')
A = (Title = 'Analyst')
J = (Name = 'Joe')


... Simplification Example72/154

Selection condition can then be simplified via:

Cond = ( P  ∧  (P  ∨  A)  ∧  A)  ∨  J
= (( P  ∧  A)  ∧ (P  ∨  A))  ∨  J
= ( (P  ∨  A)  ∧  (P  ∨  A))  ∨  J
= False  ∨  J
= J

Giving the equivalent simplified query:

select Title from Employee where Name = "Joe"


Amelioration73/154

Aim of amelioration is to improve efficiency.

Transform query to equivalent form which is known to be more efficient.

Achieve this via:

Often describe RA expression as a tree and RA transformations as tree transformations.


Amelioration Process74/154

  1. break up σa ∧ b ∧ c ... into ``cascade'' of σ
    (gives more flexibility for later transformations)
  2. move σ as far down as possible
    (reduces size of intermediate results)
  3. move most restrictive σ down/left
    (reduces size of intermediate results)
  4. move π as far down as possible
    (reduces size of intermediate results)
  5. replace × then σ by equivalent
    (reduces computation/size of intermediate results)
  6. replace subexpressions by single operation
    (e.g. opposite of 1.) (reduces computation overhead)


Amelioration Example75/154

Consider school information database:

CSG(Course, StudentId, grade)
SNAP(StudentId, name, address, phone)
CDH(Course, Day, Hour)
CR(Course, Room)

And the query:

Where is Brown at 9am on Monday morning?


... Amelioration Example76/154

An obvious translation of the query into SQL

select Room from CSG,SNAP,CDH,CR
where  name='Brown' and day='Mon' and hour='9am'

and an obvious mapping of the SQL into relational algebra gives

πRoom ( σN&D&H ( CSGSNAPCDHCR))

N  is  name='Brown',   D  is  Day='Mon',   H  is  Hour='9am'


... Amelioration Example77/154

πRoom ( σN&D&H ( CSGSNAPCDHCR))

Push selection:

πRoom ( σN&D&H ( CSGSNAPCDH) ⋈ CR)

Split selection:

πRoom ( σN ( σD&H ( CSGSNAPCDH)) ⋈ CR)

Push two selections:

πRoom ( ( σN (CSGSNAP) ⋈ σD&H (CDH)) ⋈ CR)

Push selection:

πRoom ( ((CSG ⋈ σN (SNAP)) ⋈ σD&H (CDH)) ⋈ CR)

Can we show that the final version is more efficient?


... Amelioration Example78/154

Could use an argument based on size of intermediate results, as above.

But run into problems with expressions like:

CSGSNAPCDHCR

Order of joins can make a significant difference?

How to decide "best" order?

Need understanding of physical characteristics of relations
(for example, selectivity and likelihood of join matches across tables)


Physical (Cost-Based) Optimisation79/154

Need to determine:

From these, estimate overall evaluation cost.

Do this for all possible execution plans.

Choose cheapest plan.


Execution Plans80/154

Consider query execution plans for the RA expression:

σc (R  ⋈d  S  ⋈e  T)

Plan #1

tmp1   :=  HashJoin[d](R,S)
tmp2   :=  SortMergeJoin[e](tmp1,T)
result :=  BinarySearch[c](tmp2)


... Execution Plans81/154

Plan #2

tmp1   := SortMergeJoin[e](S,T)
tmp2   := HashJoin[d](R,tmp1)
result := LinearSearch[c](tmp2)

Plan #3

tmp1   := BtreeSearch[c](R)
tmp2   := HashJoin[d](tmp1,S)
result := SortMergeJoin[e](tmp2)

All plans produce same result, but have quite different costs.


Cost-based Query Optimiser82/154

Overview of cost-based query optimisation process:

For a given SQL query, there are Too many combinations to enumerate all; prune via heuristics.


... Cost-based Query Optimiser83/154

Approximate algorithm for cost-based optimisation:

translate SQL query to RAexp
for all transformations RA' of RAexp {
    for each node e of RA' (recursively) {
        select access method for e
        plan[i++] = access method for e
    }
    cost = 0
    for each op in plan[]
        { cost += Cost(op) }
    if (cost < MinCost) {
        MinCost = cost
        BestPlan = plan
}   }

As noted above, we do not consider *all* transformations.
Heuristics: push selections down, consider only left-deep join trees.


Choosing Access Methods84/154

Inputs:

Output:


... Choosing Access Methods85/154

Example:

giving

tmp[i]   := BtreeSearch[name='John'](Student)
tmp[i+1] := LinearSearch[age>21](tmp[i])

Where possible, use pipelining to avoid storing tmp[i] on disk.


... Choosing Access Methods86/154

Rules for choosing σ access methods:


... Choosing Access Methods87/154

Rules for choosing access methods:

 
(bnl = block nested loop;   inl = index nested loop;   sm = sort merge)


Example Plan Enumeration88/154

Consider the query:

select name
from   Student s, Enrol e
where  s.sid = e.sid AND e.cid = 'COMP9315';

where

Relation r b PrimKey Clustered Indexes
Student 10000 500 sid No B-tree on sid
Enrol 40000 400 cid Yes B-tree on cid
S  E 40000 800 sid,cid No -


... Example Plan Enumeration89/154

First step is to map SQL to RA expression

σ9315 ( Student   ⋈sid   Enrol )

then devise an execution plan based on this expression

giving

tmp1   := inlJoin[s.sid=e.sid](Enrol,Student)
result := LinearSearch[e.id='COMP9315'](tmp1)


... Example Plan Enumeration90/154

Estimated cost for this plan:

Cost = inl-join on (Enrol,Student) + linear scan of tmp1
= Tr(bS + rS.btreeE) + Twbtmp1 + Trbtmp1
= 500 + 10,000.4 + 800 + 800
= 42,100

Notes:


... Example Plan Enumeration91/154

Apply rule for pushing select into join:

σ9315 ( Student   ⋈sid   Enrol )
giving
Student   ⋈sid   σ9315 ( Enrol )

and devise a plan based on

giving

tmp1   := BtreeSearch[e.cid='COMP9315'](Enrol)
result := bnlJoin[s.sid=e.sid](tmp1,Student)


... Example Plan Enumeration92/154

Estimated cost for this plan:

Cost = btree-search on Enrol + bnl-join of (tmp1,Student)
= search+scan on Enrol + Tr(btmp1 + bS)
= 3Tr + 3Tr + Tr(1 + 500)
= 3 + 3 + 1 + 500   =   507

Notes:


Cost Estimation93/154

Without actually executing it, cannot always know the cost of plan precisely.

(E.g. how big is the select result that feeds into the join?)

Thus, query optimisers need to estimate costs.

Two aspects to cost estimation:


Cost Estimation Information94/154

Cost estimation uses statistical information about relations:

rS cardinality of relation S
RS avg size of tuple in relation S
V(A,S) # distinct values of attribute A
min(A,S) min value of attribute A
max(A,S) max value of attribute A


Estimating Projection Result Size95/154

Straightforward, since we know:

#bytes in result = rS × Rout

If pipelining through buffers of size B, then

#buffers-full = rS × B/Rout


Estimating Selection Result Size96/154

Selectivity factor = fraction of tuples expected to satisfy a condition.

Common assumption: attribute values uniformly distributed.

Example: Consider the query

select * from Parts where colour='Red'

and assume that there are only four possible colours.

If we have 1000 Parts, we would expect 250 of them to be Red.

In other words,

V(colour,R)=4,   rR=1000  ⇒  |σcolour=K(R)|=250

In general, A=c(R)|  ≅  rR/V(A,R)


... Estimating Selection Result Size97/154

Estimating size of result for e.g.

select * from Enrolment
where  year > 2003;

Could estimate by using:

Assume: min(year)=1996, max(year)=2005, |Enrolment|=105 Of course, we know that enrolments grow continually, so this underestimates.

Simpler heuristic used by some systems: selected tuples = r/3


... Estimating Selection Result Size98/154

Estimating size of result for e.g.

select * from Enrolment
where  course <> 'COMP9315';

Could estimate by using:

Alternative, simpler way to estimate:


... Estimating Selection Result Size99/154

Uniform distribution assumption is convenient, but often not realistic.

How to handle non-uniform attribute value distributions?

So, for the part colour example, we might have a distribution like:

White: 35%   Red: 30%   Blue: 25%   Silver: 10%

Use histogram as basis for determining # selected tuples.

Disadvantage: cost of storing/maintaining histograms.


Estimating Join Result Size100/154

Analysis is not as simple as select.

Relies on semantic knowledge about data/relations.

Consider equijoin on common attr: R ⋈A S

Case 1: |R ⋈A S| = 0

Useful if we know that dom(A,R) ∩ dom(A,S) = {}

Case 2: A is unique key in both R and S
can't be more tuples in join than in smaller of R and S.

max(|R ⋈A S|)  =  min(|R|, |S|)


... Estimating Join Result Size101/154

Case 3: A is primary key in R, foreign key in S
every S tuple has at most one matching R tuple.

max(|R ⋈A S|)  =  |S|

Example:

select name from Students s, Enrol e
where  e.cid = 'COMP9315' AND e.sid = s.sid

|Students ⋈ σ9315(Enrol)|  =  |σ9315(Enrol)|


Cost Estimation: Postscript102/154

Inaccurate cost estimation can lead to poor evaluation plans.

Above methods can (sometimes) give inaccurate estimates.

To get more accurate estimates costs:

Either way, optimisation process costs more (more than query?)

Trade-off between optimiser performance and query performance.


Semantic Query Optimisation (SQO)103/154

Makes use of semantics of data to assist query optimisation process.

The discussion of join cost estimation above gives an example of this.

Kinds of information used:

SQO uses this to simplify queries and reduce search space.

Two examples of semantic transformation operations:


Semantic Equivalence104/154

Semantic equivalence does not require syntactic equivalence.

Consider the two queries:

select * from Emp where sal > 80K

select * from Emp where sal > 80K and job = 'Prof'

They are equivalent under the semantic rule:

Emp.job = 'Prof'    Emp.sal > 80K

(i.e. Profs are the only people earning more than $80,000)

Could use this to transform query to exploit index on salary.


Restriction Elimination105/154

Query:

List all the departments that store benzene in quantities of more than 400

select dept from  Storage
where material = 'Benzene' and qty > 400

Use rule: material = 'Benzene'    qty > 500

select dept from Storage where material = 'Benzene'

Result: Unnecessary restriction on the attribute qty is eliminated.


Index Introduction106/154

Query: Find all the employees who make more than 42K

select name from Employees where salary > 42K

Use rule: salary > 42K    job = 'manager'

select name from Employees
where salary > 42K and job = 'manager'

Result: A new constraint is obtained on the indexed attribute job.


Query Execution


Query Execution108/154

Query execution:   applies evaluation plan set of result tuples

[Diagram:Pics/qproc/qproc3-small.png]


... Query Execution109/154

Query execution


... Query Execution110/154

Example of query translation:

select s.name, s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and e.semester = '05s2';

maps to

πname,id,course,mark(Stu ⋈e.student=s.idsemester=05s2Enr))

maps to

Temp1  = BtreeSelect[semester=05s2](Enr)
Temp2  = HashJoin[e.student=s.id](Stu,Temp1)
Result = Project[name,id,course,mark](Temp2)


... Query Execution111/154

A query execution plan:

Results may be passed from one operator to the next:


... Query Execution112/154

Two ways of communicating results between query blocks ...

Materialization

Pipelining


Materialization113/154

Steps in materialization between two operators

Advantage: Disadvantage:


... Materialization114/154

Example:

select s.name, s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and
       e.semester = '05s2' and s.name = 'John';

might be executed as

Temp1  = BtreeSelect[semester=05s2](Enrolment)
Temp2  = BtreeSelect[name=John](Student)
         -- indexes on name and semester
	 -- produce sorted Temp1 and Temp2
Temp3  = SortMergeJoin[e.student=s.id](Temp1,Temp2)
         -- SMJoin especially effective, since
         -- Temp1 and Temp2 are already sorted
Result = Project[name,id,course,mark](Temp3)


Pipelining115/154

How pipelining is organised between two operators:

Advantage: Disadvantage:


Iterators (reminder)116/154

Iterators provide a "stream" of results:

Other possible operations: reset to specific point, restart, ...


... Iterators (reminder)117/154

Implementation of single-relation selection iterator:

typedef struct {
    File   inf;  // input file
    Cond   cond; // selection condition
    Buffer buf;  // buffer holding current page
    int    curp; // current page during scan
    int    curr; // index of current record in page
} Iterator;

Iterator structure contains information:


... Iterators (reminder)118/154

Implementation of single-relation selection iterator (cont):

Iterator *open(char *relName, Condition cond) {
    Iterator *iter = malloc(sizeof(Iterator));
    iter->inf  = openFile(fileName(relName),READ);
    iter->cond = cond;
    iter->curp = 0;
    iter->curr = -1;
    readBlock(iter->inf, iter->curp, iter->buf);
    return iter;
}
void close(Iterator *iter) {
    closeFile(iter->inf);
    free(iter);
}


... Iterators (reminder)119/154

Implementation of single-relation selection iterator (cont):

Tuple next(Iterator *iter) {
    Tuple rec;
    do {
        // check if reached end of current page
        if (iter->curr == nRecs(iter->buf)-1) {
            // check if reached end of data file
    	    if (iter->curp == nBlocks(iter->inf)-1)
    	        return NULL;
    	    iter->curp++;
            iter->buf = readBlock(iter->inf, iter->curp);
            iter->curr = -1;
        }
        iter->curr++;
        rec = getRec(iter->buf, iter->curr);
    } while (!matches(rec, iter->cond));
    return rec;
}
// curp and curr hold indexes of most recently read page/record


Pipelining Example120/154

Consider the query:

select s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and
       e.semester = '05s2' and s.name = 'John';

which maps to the RA expression

Proj[id,course,mark](Join[student=id](Sel[05s2](Enr),Sel[John](Stu)))

which could represented by the RA expression tree

[Diagram:Pics/qproc/qtree0-small.png]


... Pipelining Example121/154

Modelled as communication between RA tree nodes:

[Diagram:Pics/qproc/qtree-small.png]


... Pipelining Example122/154

This query might be executed as

System:
    iter0 = open(Result)
    while (Tup = next(iter0)) { display Tup }
    close(iter0)
Result:
    iter1 = open(Join)
    while (T = next(iter1))
        { T' = project(T); return T' }
    close(iter1)
Sel1:
    iter4 = open(Btree(Enrolment,'semester=05s2'))
    while (A = next(iter4)) { return A }
    close(iter4)
...


... Pipelining Example123/154

...
Join: -- nested-loop join
    iter2 = open(Sel1)
    while (R = next(iter2) {
        iter3 = open(Sel2)
        while (S = next(iter3))
            { if (matches(R,S) return (RS) }
        close(iter3) // better to reset(iter3)
    }
    close(iter2)
Sel2:
    iter5 = open(Btree(Student,'name=John'))
    while (B = next(iter5)) { return B }
    close(iter5)


Pipeline Execution124/154

Piplines can be executed as ...

Demand-driven

Producer-driven In both cases, top-level driver is request for result tuples.

In parallel-processing systems, iterators could run concurrently.


Disk Accesses125/154

Pipelining cannot avoid all disk accesses.

Some operations use multiple passes (e.g. merge-sort, hash-join).

Thus ...


... Disk Accesses126/154

Sophisticated query optimisers might realise e.g.

if operation X writes its results to a file with structure S,
the subsequent operation Y will proceed much faster
than if Y reads X's output tuple-at-a-time

In this case, it could materialize X's output in an S-file.

Produces a pipeline/materialization hybrid query execution.

Example:


... Disk Accesses127/154

Example: (pipeline/materialization hybrid)

select s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and
       e.semester = '05s2' and s.name = 'John';

might be executed as

System:
    exec(Sel2)  -- creates Temp1
    iter0 = open(Result)
    while (Tup = next(iter0)) { display Tup }
    close(iter0)
Result:
    iter1 = open(Join)
    while (T = next(iter1))
        { T' = project(T); return T' }
    close(iter1)
...


... Disk Accesses128/154

...
Join: -- index join
    iter2 = open(Sel1)
    while (R = next(iter2) {
        iter3 = open(Btree(Temp1,'id=R.student'))
        while (S = next(iter3)) { return (RS) }
        close(iter3)
    }
    close(iter2)
Sel1:
    iter4 = open(Btree(Enrolment,'semester=05s2'))
    while (A = next(iter4)) { return A }
    close(iter4)
Sel2:
    iter5 = open(Btree(Student,'name=John'))
    createBtree(Temp1,'id')
    while (B = next(iter5)) { insert(B,Temp1) }
    close(iter5)


PostgreSQL Execution129/154

Defs: src/include/executor and src/include/nodes

Code: src/backend/executor

PostgreSQL uses pipelining ...


... PostgreSQL Execution130/154

Modules in src/backend/executor fall into two groups:

execXXX (e.g. execMain, execProcnode, execScan)

nodeXXX   (e.g. nodeSeqscan, nodeNestloop, nodeGroup) The "style" is OO (e.g. specialisations of Nodes), but implementation in C masks this


... PostgreSQL Execution131/154

Top-level data/functions for executor ...

QueryDesc

ExecutorStart(QueryDesc *, ...) ExecutorRun(QueryDesc *, ScanDirection, ...) ExecutorEnd(QueryDesc *)


... PostgreSQL Execution132/154

Overview of query processing:

CreateQueryDesc

ExecutorStart
    CreateExecutorState -- creates per-query context
    switch to per-query context to run ExecInitNode
    ExecInitNode -- recursively scans plan tree
        CreateExprContext -- creates per-tuple context
        ExecInitExpr

ExecutorRun
    ExecutePlan -- invoke iterators from root
        ExecProcNode -- recursively called in per-query context
            ExecEvalExpr -- called in per-tuple context
            ResetExprContext -- to free memory

ExecutorEnd
    ExecEndNode -- recursively releases resources
    FreeExecutorState -- frees per-query and child contexts

FreeQueryDesc


... PostgreSQL Execution133/154

More detailed view of plan execution (but still much simplified)

ExecutePlan(execState, planStateNode, ...) {
    process "before each statement" triggers
    for (;;) {
        tuple = ExecProcNode(planStateNode)
        check tuple validity // MVCC
        if (got a tuple) break
    }
    process "after each statement" triggers
    return tuple
}
ExecProcNode(node) {
    switch (nodeType(node)) {
    case SeqScan:
        result = ExecSeqScan(node); break;
    case NestLoop:
        result = ExecNestLoop(node); break;
    ...
    }
    return result;
}


... PostgreSQL Execution134/154

Generic iterator interface is provided by ...

ExecInitNode

ExecProcNode ExecEndNode Each calls corresponding function for specific node type
(e.g. for nested loop join ExecInitNestLoop(), ExecNestLoop(), ExecEndNestLoop())


Example PostgreSQL Execution135/154

Consider the query:

-- get manager's age and # employees in Shoe department
select e.age, d.nemps
from   Departments d, Employees e
where  e.name = d.manager and d.name ='Shoe'

and its execution plan tree

[Diagram:Pics/qproc/qtree1-small.png]


... Example PostgreSQL Execution136/154

This produces a tree with three nodes:

We ignore the top-level node here (it handles the projection via attrList)


... Example PostgreSQL Execution137/154

Initially InitPlan() invokes ExecInitNode() on plan tree root.


ExecInitNode() sees a NestedLoop node ... 
    so dispatches to ExecInitNestLoop() to set up iterator
    and then invokes ExecInitNode() on left and right sub-plans
        in left subPlan, ExecInitNode() sees an IndexScan node
            so dispatches to ExecInitIndexScan() to set up iterator
        in right sub-plan, ExecInitNode() sees aSeqScan node
            so dispatches to ExecInitSeqScan() to set up iterator

Result: a plan state tree with same structure as plan tree.


... Example PostgreSQL Execution138/154

Execution: ExecutePlan() repeatedly invokes ExecProcNode().


ExecProcNode() sees a NestedLoop node ...
    so dispatches to ExecNestedLoop() to get next tuple
    which invokes ExecProcNode() on its sub-plans
        in the left sub-plan, ExecProcNode() sees an IndexScan node
            so dispatches to ExecIndexScan() to get next tuple
            if no more tuples, return END
            for this tuple, invoke ExecProcNode() on right sub-plan
                ExecProcNode() sees a SeqScan node
                    so dispatches to ExecSeqScan() to get next tuple
                check for match and return joined tuples if found
            reset right sub-plan iterator

Result: stream of result tuples returned via ExecutePlan()


Performance Tuning


Performance Tuning140/154

Schema design:

Performance tuning: Good performance may involve any/all of:


... Performance Tuning141/154

Tuning requires us to consider the following:


... Performance Tuning142/154

Performance can be considered at two times:


Denormalisation143/154

Normalisation minimises storage redundancy.

Problem: queries that need to put data back together. Solution: store some data redundantly


... Denormalisation144/154

Example ... consider the following normalised schema:

create table Subject (
   id        serial primary key,
   code      char(8), -- e.g. COMP9315
   title     varchar(60),
   syllabus  text, ... );
create table Term (
   id        serial primary key,
   name      char(4), -- e.g. 09s2
   starting  date, ... );
create table Course (
   subject   integer references Subject(id),
   term      integer references Term(id),
   lic       integer references Staff(id), ... );


... Denormalisation145/154

Example: Courses = Course Subject Term

If we often need to refer to "standard" name (e.g. COMP9315 09s2)


-- can now replace a query like:
select s.code||' '||t.name, avg(e.mark)
from   Course c, Subject s, Term t
where  c.subject = s.id and c.term = t.id
       and s.code='COMP9315' and t.name='09s2'
-- by a query like:
select c.courseName, e.grade, e.mark
from   Course c
where  c.courseName = 'COMP9315 09s2'


Indexes146/154

Indexes provide efficient content-based access to tuples.

Can build indexes on any (combination of) attributes.

Definining indexes:

CREATE INDEX name ON table ( attr1, attr2, ... )

attri can be an arbitrary expression (e.g. upper(name)).

CREATE INDEX also allows us to specify


... Indexes147/154

Indexes can significantly improve query costs.

Considerations in applying indexes:


Query Tuning148/154

Sometimes, a query can be re-phrased to affect performance:

Examples which may prevent optimiser from using indexes:

select name from Employee where salary/365 > 100
       -- fix by re-phrasing condition to (salary > 36500)
select name from Employee where name like '%ith%'
select name from Employee where birthday is null
       -- above two are difficult to "fix"
select name from Employee
where  dept in (select id from Dept where ...)
       -- fix by using Employee join Dept on (e.dept=d.id)


... Query Tuning149/154

Other factors to consider in query tuning:


PostgreSQL Query Tuning150/154

PostgreSQL provides the explain statement to

Usage:

EXPLAIN [ANALYZE] Query

Without ANALYZE, EXPLAIN shows plan with estimated costs.

With ANALYZE, EXPLAIN executes query and prints real costs.

Note that runtimes may show considerable variation due to buffering.


EXPLAIN Examples151/154

Example: Select on indexed attribute


ass2=# explain select * from Students where id=100250;
                            QUERY PLAN
-----------------------------------------------------------------
 Index Scan using student_pkey on student
                  (cost=0.00..5.94 rows=1 width=17)
   Index Cond: (id = 100250)

ass2=# explain analyze select * from Students where id=100250;
                            QUERY PLAN
-----------------------------------------------------------------
 Index Scan using student_pkey on student
                  (cost=0.00..5.94 rows=1 width=17)
                  (actual time=31.209..31.212 rows=1 loops=1)
   Index Cond: (id = 100250)
 Total runtime: 31.252 ms


... EXPLAIN Examples152/154

Example: Select on non-indexed attribute


ass2=# explain select * from Students where stype='local';
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on student  (cost=0.00..70.33 rows=18 width=17)
   Filter: ((stype)::text = 'local'::text)

ass2=# explain analyze select * from Students
ass2-#                          where stype='local';
                         QUERY PLAN
---------------------------------------------------------------
 Seq Scan on student  (cost=0.00..70.33 rows=18 width=17)
             (actual time=0.061..4.784 rows=2512 loops=1)
   Filter: ((stype)::text = 'local'::text)
 Total runtime: 7.554 ms


... EXPLAIN Examples153/154

Example: Join on a primary key (indexed) attribute


ass2=# explain
ass2-# select s.sid,p.name
ass2-# from Students s, People p where s.id=p.id;
                        QUERY PLAN
-----------------------------------------------------------
 Hash Join  (cost=70.33..305.86 rows=3626 width=52)
   Hash Cond: ("outer".id = "inner".id)
   -> Seq Scan on person p
               (cost=0.00..153.01 rows=3701 width=52)
   -> Hash  (cost=61.26..61.26 rows=3626 width=8)
       -> Seq Scan on student s
                   (cost=0.00..61.26 rows=3626 width=8)


... EXPLAIN Examples154/154

Example: Join on a non-indexed attribute


ass3=> explain select s1.code, s2.code
ass2-# from Subjects s1, Subjects s2 where s1.offerer=s2.offerer;
                        QUERY PLAN
----------------------------------------------------------------
 Merge Join  (cost=2744.12..18397.14 rows=1100342 width=18)
   Merge Cond: (s1.offerer = s2.offerer)
   ->  Sort  (cost=1372.06..1398.33 rows=10509 width=13)
         Sort Key: s1.offerer
         ->  Seq Scan on subjects s1
                      (cost=0.00..670.09 rows=10509 width=13)
   ->  Sort  (cost=1372.06..1398.33 rows=10509 width=13)
         Sort Key: s2.offerer
         ->  Seq Scan on subjects s2
                      (cost=0.00..670.09 rows=10509 width=13)
(8 rows)


Produced: 28 Jul 2019