G: Query Translation, Optimisation, Execution


Query Evaluation
Terminology Variations
Query Translation
Query Translation
Parsing SQL
SQL → RA
Expression Rewriting Rules
Relational Algebra Laws
Query Optimisation
Approaches to Optimisation
Query Optimization
Cost Models and Analysis
Choosing Access Methods (RelOps)
Cost Estimation
Estimating Projection Result Size
Estimating Selection Result Size
Selection Size Estimation
Estimating Join Result Size
Cost Estimation: Postscript
Overview of PostgreSQL QOpt Process
QOpt Data Structures
Query Optimisation Process
Top-down Trace of QOpt
Join-tree Generation
Query Execution
Materialization
Pipelining
Iterators (reminder)
Pipelining Example
Disk Accesses
PostgreSQL Query Execution
PostgreSQL Executor
Example PostgreSQL Execution
Performance Tuning
PostgreSQL Query Tuning
EXPLAIN Examples
Using EXPLAIN


❖ Query Evaluation


❖ Query Evaluation (cont)

A query in SQL:

states what kind of answers are required (declarative)
does not say how they should be computed (procedural)

A query evaluator/processor:

takes declarative description of query (in SQL)
parses query to internal representation (relational algebra)
determines plan for answering query (expressed as DBMS ops)
executes plan via DBMS engine (to produce result tuples)

Some DBMSs can save query plans for later re-use.


❖ Query Evaluation (cont)

Internals of the query evaluation "black-box":


❖ Query Evaluation (cont)

DBMSs provide several "flavours" of each RA operation.

For example:

several "versions" of selection (σ) are available

each version is effective for a particular kind of selection, e.g.

select * from R where id = 100  -- hashing
select * from S                 -- Btree index
where age > 18 and age < 35
select * from T                 -- MALH file
where a = 1 and b = 'a' and c = 1.4

Similarly, π and ⋈ have versions to match specific query types.


❖ Query Evaluation (cont)

We call these specialised versions of RA operations RelOps.

One major task of the query processor:

given a RA expression to be evaluated
find a combination of RelOps to do this efficiently

Requires the query translator/optimiser to consider

information about relations (e.g. sizes, primary keys, ...)
information about operations (e.g. selection reduces size)

RelOps are realised at execution time

as a collection of inter-communicating nodes
communicating either via pipelines or temporary relations


❖ Terminology Variations

Relational algebra expression of SQL query

intermediate query representation
logical query plan

Execution plan as collection of RelOps

query evaluation plan
query execution plan
physical query plan

Representation of RA operators and expressions

σ = Select = Sel, π = Project = Proj
R ⋈ S = R Join S = Join(R,S), ∧ = &, ∨ = |


❖ Query Translation

Query translation: SQL statement text → RA expression


❖ Query Translation

Translation step: SQL text → RA expression

Example:

SQL: select name from Students where id=7654321;
-- is translated to
RA:  Proj[name](Sel[id=7654321]Students)

Processes: lexer/parser, mapping rules, rewriting rules.

Mapping from SQL to RA may include some optimisations, e.g.

select * from Students where id = 54321 and age > 50;
-- is translated to
Sel[age>50](Sel[id=54321]Students)
-- rather than the single selection below, because of the index on id
Sel[id=54321&age>50](Students)


❖ Parsing SQL

Parsing task is similar to that for programming languages.

Language elements:

keywords: create, select, from, where, ...
identifiers: Students, name, id, CourseCode, ...
operators: +, -, =, <, >, AND, OR, NOT, IN, ...
constants: 'abc', 123, 3.1, '01-jan-1970', ...

PostgreSQL parser ...

implemented via lex/yacc (src/backend/parser)
maps all unquoted identifiers to lower-case (A-Z → a-z)
needs to handle user-extendable operator set
makes extensive use of catalog (src/backend/catalog)


❖ SQL → RA

Remaining steps in processing an SQL statement

parse, map to relational algebra (RA) expression
transform to more efficient RA expression
instantiate RA operators to DBMS operations
execute DBMS operations (aka query plan)

Cost-based optimisation:

generate possible query plans (via rewriting/heuristics)
estimate cost of each plan (using costs of operations)
choose the lowest-cost plan (... and choose quickly)


❖ Expression Rewriting Rules

Since RA is a well-defined formal system

there exist many algebraic laws on RA expressions
which can be used as a basis for expression rewriting
in order to produce equivalent (more-efficient?) expressions

Expression transformation based on such rules can be used

to simplify/improve SQL→RA mapping results
to generate new plan variations to check in query optimisation


❖ Relational Algebra Laws

Commutative and Associative Laws:

R ⋈ S ↔ S ⋈ R, (R ⋈ S) ⋈ T ↔ R ⋈ (S ⋈ T) (natural join)
R ∪ S ↔ S ∪ R, (R ∪ S) ∪ T ↔ R ∪ (S ∪ T)
R ⋈_Cond S ↔ S ⋈_Cond R (theta join)
σ_c ( σ_d (R)) ↔ σ_d ( σ_c (R))

Selection splitting (where c and d are conditions):

σ_c∧d(R) ↔ σ_c ( σ_d (R))
σ_c∨d(R) ↔ σ_c(R) ∪ σ_d(R)


❖ Relational Algebra Laws (cont)

Selection pushing ( σ_c(R ∪ S) and σ_c(R ∩ S) ):

σ_c(R ∪ S) ↔ σ_cR ∪ σ_cS, σ_c(R ∩ S) ↔ σ_cR ∩ σ_cS

Selection pushing with join ...

σ_c (R ⋈ S) ↔ σ_c(R) ⋈ S (if c refers only to attributes from R )
σ_c (R ⋈ S) ↔ R ⋈ σ_c(S) (if c refers only to attributes from S )

If condition contains attributes from both R and S:

σ_c′∧c″ (R ⋈ S) ↔ σ_c′(R) ⋈ σ_c″(S)
c′ contains only R attributes, c″ contains only S attributes
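
The selection-pushing laws above can be checked on small in-memory relations. The following is a minimal Python sketch, assuming relations are lists of dicts and that the join is a natural join on shared attribute names. The helpers `select`, `natural_join` and `as_set`, and the sample data, are illustrative only and do not come from any DBMS.

```python
def select(cond, rel):
    """sigma_cond(rel): keep the tuples that satisfy cond."""
    return [t for t in rel if cond(t)]

def natural_join(r, s):
    """R join S on every attribute name the two tuples share."""
    out = []
    for tr in r:
        for ts in s:
            common = set(tr) & set(ts)
            if all(tr[a] == ts[a] for a in common):
                out.append({**tr, **ts})
    return out

def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}

# Hypothetical sample data
R = [{"id": 1, "age": 20}, {"id": 2, "age": 55}, {"id": 3, "age": 60}]
S = [{"id": 2, "mark": 80}, {"id": 3, "mark": 40}, {"id": 3, "mark": 90}]

c_r = lambda t: t["age"] > 50      # c' : refers only to attributes of R
c_s = lambda t: t["mark"] >= 50    # c'': refers only to attributes of S

# sigma_{c' and c''}(R join S)   versus   sigma_{c'}(R) join sigma_{c''}(S)
late = select(lambda t: c_r(t) and c_s(t), natural_join(R, S))
early = natural_join(select(c_r, R), select(c_s, S))
```

Both forms give the same tuples, but `early` joins smaller inputs, which is why optimisers prefer to push selections down.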


❖ Relational Algebra Laws (cont)

Rewrite rules for projection ...

All but last projection can be ignored:

π_L1 ( π_L2 ( ... π_Ln (R))) → π_L1 (R)

Projections can be pushed into joins:

π_L (R ⋈_c S) ↔ π_L ( π_M(R) ⋈_c π_N(S) )

where

M and N must contain all attributes needed for c
M and N must contain all attributes used in L (L ⊆ M∪N)


❖ Relational Algebra Laws (cont)

Subqueries ⇒ convert to a join

Example: (on schema Courses(id,code,...), Enrolments(cid,sid,...), Students(id,name,...))

select c.code, count(*)
from   Courses c
where  c.id in (select cid from Enrolments)
group  by c.code

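becomes (strictly a semi-join: the plain join below yields one row per matching enrolment, so the counts match the original only if each course has at most one enrolment)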

select c.code, count(*)
from   Courses c join Enrolments e on c.id = e.cid
group  by c.code


❖ Relational Algebra Laws (cont)

But not all subqueries can be converted to join, e.g.

select e.sid as student_id, e.cid as course_id
from   Enrolments e
where  e.sid = (select max(id) from Students)

has to be evaluated as

Val = max[id](Students)

Res = π_(sid,cid)(σ_sid=Val(Enrolments))


❖ Query Optimisation

Query optimiser: RA expression → efficient evaluation plan


❖ Query Optimisation (cont)

Query optimisation is a critical step in query evaluation.

The query optimiser

takes relational algebra expression from SQL compiler
produces sequence of RelOps to evaluate the expression
query execution plan should provide efficient evaluation

"Optimisation" is a misnomer since query optimisers

aim to find a good plan ... but maybe not optimal

Observed Query Time = Planning time + Evaluation time


❖ Query Optimisation (cont)

Why do we not generate optimal query execution plans?

Finding an optimal query plan ...

requires exhaustive search of a space of possible plans
for each possible plan, need to estimate cost (not cheap)

Even for relatively small query, search space is very large.

Compromise:

do limited search of query plan space (guided by heuristics)
quickly choose a reasonably efficient execution plan


❖ Approaches to Optimisation

Three main classes of techniques developed:

algebraic (equivalences, rewriting, heuristics)
physical (execution costs, search-based)
semantic (application properties, heuristics)

All driven by aim of minimising (or at least reducing) "cost".

Real query optimisers use a combination of algebraic+physical.

Semantic QO is good idea, but expensive/difficult to implement.


❖ Query Optimization

Input: relational algebra expression tree

Output: tree of instantiated relation operations

[Diagram:Pics/qproc/query-transform.png]


❖ Query Optimization (cont)

Where query optimisation fits in query evaluation:


❖ Query Optimization (cont)

Approximate algorithm for cost-based optimisation:

translate SQL query to RAexp
MinCost = ∞;  BestPlan = {}
for enough transformations RA' of RAexp {
   while (more choices for RelOps in RA') {
      Plan = {}
      for each node N of RA' (recursively) {
         ROp = select RelOp method for N
         Plan = Plan ∪ ROp
      }
      cost = 0
      for each node P of Plan (bottom-up) 
         { cost += Cost(P) // using child info }
      if (cost < MinCost)
         { MinCost = cost;  BestPlan = Plan }
   }
}
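
The inner part of this loop (try each combination of RelOps, cost it, keep the cheapest) can be sketched in Python. This is a toy illustration under strong assumptions: the RA expression is fixed, each node has a hypothetical table of candidate RelOps with made-up costs, and costs are treated as independent, whereas a real optimiser derives each cost from the estimated sizes of the child results.

```python
from itertools import product

# Hypothetical candidates: RA node -> {RelOp name: estimated cost in page reads}
choices = {
    "Sel[name='John'](Student)": {"linearSearch": 1000, "btreeSearch": 4},
    "Join[student=id]": {"nestedLoop": 5000, "hashJoin": 300, "sortMerge": 450},
    "Proj[id,mark]": {"scanProject": 10},
}

def best_plan(choices):
    """Try every combination of RelOps and keep the cheapest one."""
    min_cost, best = float("inf"), None
    nodes = list(choices)
    for methods in product(*(choices[n] for n in nodes)):
        plan = dict(zip(nodes, methods))        # one RelOp per RA node
        cost = sum(choices[n][m] for n, m in plan.items())
        if cost < min_cost:
            min_cost, best = cost, plan
    return best, min_cost

plan, cost = best_plan(choices)
```

Exhaustive enumeration like this is only feasible for tiny plan spaces, which is the motivation for the heuristic search discussed earlier.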


❖ Cost Models and Analysis

The cost of evaluating a query is determined by:

size of relations (database relations and temporary relations)
access mechanisms (indexing, hashing, sorting, join algorithms)
size/number of main memory buffers (and replacement strategy)

Analysis of costs involves estimating:

size of intermediate results
number of secondary storage accesses


❖ Choosing Access Methods (RelOps)

Performed for each node in RA expression tree ...

Inputs:

a single RA operation (σ, π, ⋈)
information about file organisation, data distribution, ...
list of operations available in the database engine

Output:

specific DBMS operation to implement this RA operation


❖ Choosing Access Methods (RelOps) (cont)

Example:

RA operation: Sel[name='John' ∧ age>21](Student)
Student relation has B-tree index on name
database engine (obviously) has B-tree search method

giving

tmp[i]   := BtreeSearch[name='John'](Student)
tmp[i+1] := LinearSearch[age>21](tmp[i])

Where possible, use pipelining to avoid storing tmp[i] on disk.


❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing σ access methods:

σ_A=c(R) and R has index on A ⇒ indexSearch[A=c](R)
σ_A=c(R) and R is hashed on A ⇒ hashSearch[A=c](R)
σ_A=c(R) and R is sorted on A ⇒ binarySearch[A=c](R)
σ_{A ≥ c}(R) and R has clustered index on A
⇒ indexSearch[A=c](R) then scan
σ_{A ≥ c}(R) and R is hashed on A
⇒ linearSearch[A>=c](R)
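
These rules amount to a lookup on the operator and the file organisation. A minimal Python sketch follows; the function name, the organisation labels and the fallback to a linear scan for cases the rules do not cover are all assumptions made for illustration.

```python
def choose_select_method(op, organisation):
    """Pick a RelOp for sigma_{A op c}(R).

    op is "=" or ">="; organisation says how R is stored with respect
    to A: "index", "clustered-index", "hashed", "sorted" or "heap".
    """
    if op == "=":
        by_org = {
            "index": "indexSearch[A=c](R)",
            "clustered-index": "indexSearch[A=c](R)",
            "hashed": "hashSearch[A=c](R)",
            "sorted": "binarySearch[A=c](R)",
        }
        # Assumption: fall back to a full scan when no rule applies.
        return by_org.get(organisation, "linearSearch[A=c](R)")
    if op == ">=":
        if organisation == "clustered-index":
            return "indexSearch[A=c](R) then scan"
        # Rule for hashed files; also used here as the fallback (assumption).
        return "linearSearch[A>=c](R)"
    raise ValueError(f"unsupported operator: {op}")
```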


❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing ⋈ access methods:

R ⋈ S and R fits in memory buffers ⇒ bnlJoin(R,S)
R ⋈ S and S fits in memory buffers ⇒ bnlJoin(S,R)
R ⋈ S and R,S sorted on join attr ⇒ smJoin(R,S)
R ⋈ S and R has index on join attr ⇒ inlJoin(S,R)
R ⋈ S and no indexes, no sorting ⇒ hashJoin(R,S)

(bnl = block nested loop; inl = index nested loop; sm = sort merge)
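
The join rules can be sketched the same way. The function below is hypothetical; in particular, testing the rules in the order they are listed is an assumption, since the slide does not say which rule wins when several apply.

```python
def choose_join_method(r_fits, s_fits, both_sorted, r_has_index):
    """Pick a join RelOp for R join S from simple boolean facts."""
    if r_fits:                 # R fits in the memory buffers
        return "bnlJoin(R,S)"
    if s_fits:                 # S fits in the memory buffers
        return "bnlJoin(S,R)"
    if both_sorted:            # R and S are sorted on the join attribute
        return "smJoin(R,S)"
    if r_has_index:            # R has an index on the join attribute
        return "inlJoin(S,R)"
    return "hashJoin(R,S)"     # no indexes, no sorting
```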


❖ Cost Estimation

Without executing a plan, cannot always know its precise cost.

Thus, query optimisers estimate costs via:

cost of performing operation (dealt with in earlier lectures)
size of result (which affects cost of performing next operation)

Result size estimated by statistical measures on relations, e.g.

r_S	cardinality of relation S
R_S	avg size of tuple in relation S
V(A,S)	# distinct values of attribute A
min(A,S)	min value of attribute A
max(A,S)	max value of attribute A


❖ Estimating Projection Result Size

Straightforward, since we know:

number of tuples in output
r_out = | π_a,b,..(T) | = | T | = r_T (in SQL, because of bag semantics)
size of tuples in output
R_out = sizeof(a) + sizeof(b) + ... + tuple-overhead

Assume page size B, b_out = ceil(r_T / c_out), where c_out = floor(B/R_out)

If using select distinct ...

| π_a,b,..(T) | depends on proportion of duplicates produced
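
The page-count formula for a projection without `distinct` is simple arithmetic. A small Python sketch follows, where the function name and the example numbers (attribute sizes, tuple overhead, page size) are made up for illustration.

```python
import math

def projection_pages(r_T, attr_sizes, tuple_overhead, B):
    """Return (R_out, c_out, b_out) for a projection of T.

    r_T            number of tuples in T (all survive under bag semantics)
    attr_sizes     sizes in bytes of the projected attributes
    tuple_overhead per-tuple overhead in bytes
    B              page size in bytes
    """
    R_out = sum(attr_sizes) + tuple_overhead   # output tuple size
    c_out = B // R_out                         # tuples per page, floor(B/R_out)
    b_out = math.ceil(r_T / c_out)             # output pages
    return R_out, c_out, b_out

# 10000 tuples, attributes of 4 and 20 bytes, 8 bytes overhead, 4096-byte pages
R_out, c_out, b_out = projection_pages(10000, [4, 20], 8, 4096)
```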


❖ Estimating Selection Result Size

Selectivity = fraction of tuples expected to satisfy a condition.

Common assumption: attribute values uniformly distributed.

Example: Consider the query

select * from Parts where colour='Red'

If V(colour,Parts)=4, r=1000 ⇒ |σ_colour=red(Parts)|=250

In general, | σ_A=c(R) | ≅ r_R / V(A,R)

Heuristic used by PostgreSQL: | σ_A=c(R) | ≅ r/10 (unless primary key)
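
The uniform-distribution estimate for an equality selection can be written as a one-line function. A minimal Python sketch; the function name and the `unique` flag are illustrative.

```python
def estimate_eq(r, V, unique=False):
    """Estimate |sigma_{A=c}(R)| for r tuples and V distinct values of A."""
    if unique:
        return min(r, 1)   # a unique attribute matches at most one tuple
    return r / V           # uniform distribution: equal share per value

parts_red = estimate_eq(1000, 4)   # the Parts/colour example above
```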


❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where year > 2015;

Could estimate by using:

uniform distribution assumption, r, min/max years

Assume: min(year)=2010, max(year)=2019, |Enrolment|=10⁵

10⁵ from 2010-2019 means approx 10000 enrolments/year
this suggests 40000 enrolments since 2016

Heuristic used by some systems: | σ_A>c(R) | ≅ r/3
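
The enrolment calculation above generalises to any `A > c` condition on an integer-valued attribute. A minimal Python sketch under the uniform-distribution assumption; the function name is illustrative.

```python
def estimate_gt(r, lo, hi, c):
    """Estimate |sigma_{A>c}(R)| when A takes integer values lo..hi uniformly."""
    if c >= hi:
        return 0                  # nothing is larger than the maximum
    if c < lo:
        return r                  # every tuple qualifies
    n_values = hi - lo + 1        # e.g. 2010..2019 is 10 distinct years
    return r * (hi - c) / n_values

since_2016 = estimate_gt(10**5, 2010, 2019, 2015)
```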


❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where course <> 'COMP9315';

Could estimate by using:

uniform distribution assumption, r, domain size

e.g. V(course,Enrolment) = 2000 ⇒ | σ_A<>c(E) | = r * 1999/2000

Heuristic used by some systems: | σ_A<>c(R) | ≅ r


❖ Selection Size Estimation

How to handle non-uniform attribute value distributions?

collect statistics about the values stored in the attribute/relation
store these as e.g. a histogram in the meta-data for the relation

So, for part colour example, might have distribution like:

White: 35% Red: 30% Blue: 25% Silver: 10%

Use histogram as basis for determining # selected tuples.

Disadvantage: cost of storing/maintaining histograms.
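
With a histogram, the estimate uses the stored frequency of the value rather than an equal share. A minimal Python sketch using the colour distribution above; the dictionary representation and function name are assumptions, not how any particular DBMS stores its statistics.

```python
# Fraction of Parts tuples having each colour (from the distribution above)
colour_hist = {"White": 0.35, "Red": 0.30, "Blue": 0.25, "Silver": 0.10}

def estimate_from_histogram(r, hist, value):
    """Estimate |sigma_{A=value}(R)| from per-value frequencies."""
    return round(r * hist.get(value, 0.0))   # unseen values estimate to 0

red = estimate_from_histogram(1000, colour_hist, "Red")
```

For 1000 parts this gives 300 red parts, compared with 250 under the uniform assumption.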


❖ Selection Size Estimation (cont)

Summary: analysis relies on operation and data distribution:

E.g. select * from R where a = k;

Case 1: uniq(R.a) ⇒ 0 or 1 result

Case 2: r_R tuples && size(dom(R.a)) = n ⇒ r_R / n results

E.g. select * from R where a < k;

Case 1: k ≤ min(R.a) ⇒ 0 results

Case 2: k > max(R.a) ⇒ ≅ r_R results

Case 3: size(dom(R.a)) = n ⇒ ? min(R.a) ... k ... max(R.a) ?


❖ Estimating Join Result Size

Analysis relies on semantic knowledge about data/relations.

Consider equijoin on common attr: R ⋈_a S

Case 1: values(R.a) ∩ values(S.a) = {} ⇒ size(R ⋈_a S) = 0

Case 2: uniq(R.a) and uniq(S.a) ⇒ size(R ⋈_a S) ≤ min(|R|, |S|)

Case 3: pkey(R.a) and fkey(S.a) ⇒ size(R ⋈_a S) ≤ |S|
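
The three cases can be checked against exact join sizes on small data. A minimal Python sketch; the helper `equijoin_size` and the sample relations are hypothetical.

```python
from collections import Counter

def equijoin_size(R, S, a):
    """Exact size of R join_a S, counted without building the result."""
    s_counts = Counter(t[a] for t in S)        # occurrences of each S.a value
    return sum(s_counts[t[a]] for t in R)      # matches for each R tuple

# Hypothetical data: R.a is a primary key, S.a a foreign key referencing it
R = [{"a": 1}, {"a": 2}, {"a": 3}]
S = [{"a": 1}, {"a": 1}, {"a": 2}, {"a": 2}, {"a": 2}]
T = [{"a": 7}, {"a": 8}]                        # shares no values with R
U = [{"a": 2}, {"a": 3}, {"a": 4}, {"a": 5}]    # unique values

case1 = equijoin_size(R, T, "a")   # disjoint value sets
case2 = equijoin_size(R, U, "a")   # both sides unique
case3 = equijoin_size(R, S, "a")   # primary key joined with foreign key
```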


❖ Cost Estimation: Postscript

Inaccurate cost estimation can lead to poor evaluation plans.

Above methods can (sometimes) give inaccurate estimates.

To get more accurate cost estimates:

more time ... complex computation of selectivity
more space ... storage for histograms of data values

Either way, optimisation process costs more (more than query?)

Trade-off between optimiser performance and query performance.


❖ Overview of PostgreSQL QOpt Process

Input: tree of Query nodes returned by parser

Output: tree of Plan nodes used by query executor

wrapped in a PlannedStmt node containing state info

Intermediate data structures are trees of Path nodes

a path tree represents one evaluation order for a query

All Node types are defined in include/nodes/*.h


❖ Overview of PostgreSQL QOpt Process (cont)


❖ QOpt Data Structures

Generic Path node structure:


typedef struct Path
{
   NodeTag     type;          // scan/join/...
   NodeTag     pathtype;      // specific method
   RelOptInfo *parent;        // output relation
   PathTarget *pathtarget;    // list of Vars/Exprs, width 
   // estimated execution costs for path ...
   Cost        startup_cost;  // setup cost
   Cost        total_cost;    // total cost
   List       *pathkeys;      // sort order
} Path;

PathKey = (opfamily:Oid, strategy:SortDir, nulls_first:bool)


❖ QOpt Data Structures (cont)

Specialised Path nodes (simplified):


typedef struct IndexPath
{
   Path    path;
   IndexOptInfo *indexinfo; // index for scan 
   List   *indexclauses; // index select conditions 
   ...
   ScanDirection  indexscandir; // used by planner 
   Selectivity  indexselectivity; // estimated #results 
} IndexPath;

typedef struct JoinPath
{
   Path      path;
   JoinType  jointype;     // inner/outer/semi/anti 
   Path     *outerpath;    // outer part of the join 
   Path     *innerpath;    // inner part of the join 
   List     *restrictinfo; // join condition(s) 
} JoinPath;


❖ Query Optimisation Process

Query optimisation proceeds in two stages (after parsing)...

Rewriting:

uses PostgreSQL's rule system to rearrange RA expressions
query tree is expanded to include e.g. view definitions

Planning and optimisation:

using cost-based analysis of generated paths
via one of two different path generators
chooses least-cost path from all those considered

Then produces a Plan tree from the selected path.


❖ Top-down Trace of QOpt

Top-level of query execution: backend/tcop/postgres.c


exec_simple_query(const char *query_string)
{
  // lots of setting up ... including starting xact
  parsetree_list = pg_parse_query(query_string);
  foreach(parsetree, parsetree_list) {
    // Query optimisation
    querytree_list = pg_analyze_and_rewrite(parsetree,...);
    plantree_list = pg_plan_queries(querytree_list,...);
    // Query execution
    portal = CreatePortal(...plantree_list...);
    PortalRun(portal,...);
  }
  // lots of cleaning up ... including close xact
}

Assumes that we are dealing with multiple queries (i.e. SQL statements)


❖ Top-down Trace of QOpt (cont)

query_planner() produces plan for a select/join tree

make list of tables used in query
split where qualifiers ("quals") into
- restrictions (e.g. r.a=1) ... for selections
- joins (e.g. s.id=r.s) ... for joins
search for quals to enable merge/hash joins

invoke make_one_rel() to find best path/plan

Code in: backend/optimizer/plan/planmain.c


❖ Top-down Trace of QOpt (cont)

make_one_rel() generates possible plans, selects best

generate scan and index paths for base tables
- using restrictions list generated above
generate access paths for the entire join tree
- recursive process, controlled by make_rel_from_joinlist()
returns a single "relation", representing result set

Code in: backend/optimizer/path/allpaths.c


❖ Join-tree Generation

make_rel_from_joinlist() arranges path generation

switches between two possible path tree generators
path tree generators finally return best cost path

Standard path tree generator (standard_join_search()):

"exhaustively" generates join trees (like System R)
starts with 2-way joins, finds best combination
then adds extra table to give 3-table join, etc.
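
The bottom-up idea (best 2-way joins first, then extend by one table at a time) can be sketched as a small dynamic program. This is not PostgreSQL's `standard_join_search()` code: it is a toy Python version that assumes left-deep join orders only, a single global join selectivity, and a cost equal to the sum of intermediate result sizes.

```python
from itertools import combinations

def best_join_order(sizes, selectivity):
    """Return (cost, result size, join order) for joining all tables.

    sizes: {table: #tuples}; selectivity: fraction of the cross
    product kept by each join (one toy value for every join).
    """
    tables = sorted(sizes)
    # best[set of tables] = (cost, result size, join order)
    best = {frozenset([t]): (0, sizes[t], (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for subset in combinations(tables, k):
            key = frozenset(subset)
            for t in subset:                     # t is the table joined last
                cost, size, order = best[key - {t}]
                out = size * sizes[t] * selectivity
                cand = (cost + out, out, order + (t,))
                if key not in best or cand[0] < best[key][0]:
                    best[key] = cand
    return best[frozenset(tables)]

cost, size, order = best_join_order({"A": 1000, "B": 10, "C": 100}, 0.01)
```

In this example the two small tables are joined first and the large table `A` is added last, because that keeps the intermediate results small.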

Code in: backend/optimizer/path/{allpaths.c,joinrels.c}


❖ Join-tree Generation (cont)

Genetic query optimiser (geqo):

uses genetic algorithm (GA) to generate path trees
handles joins involving > geqo_threshold relations
goals of this approach:
- find near-optimal solution
- examine far less than entire search space

Code in: backend/optimizer/geqo/*.c


❖ Join-tree Generation (cont)

Basic idea of genetic algorithm:

Optimize(join)
{
   p = initialState(join)  // pool of (random) join orders
   for (t = 0; t < #generations; t++) {
      p' = recombination(p) // get parts of good join orders 
      p'' = mutation(p')    // generate new variations 
      p = selection(p'',p)  // choose best join orders 
   }
   return best(p)          // cheapest join order found
}

#generations determined by size of initial pool of join orders
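
The recombination, mutation and selection steps can be made concrete with a small Python sketch. This is not the geqo implementation: the operators, pool size, number of generations and the cost function `toy_cost` are all assumptions chosen to keep the example short, and the random generator is seeded so that runs are repeatable.

```python
import random

sizes = {"A": 1000, "B": 10, "C": 100, "D": 50}   # hypothetical table sizes

def toy_cost(order):
    """Hypothetical cost: sum of intermediate sizes, each join keeping 1%."""
    total, size = 0.0, sizes[order[0]]
    for t in order[1:]:
        size = size * sizes[t] * 0.01
        total += size
    return total

def ga_join_order(tables, cost, pool_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    # initial state: pool of random join orders
    pool = [rng.sample(tables, len(tables)) for _ in range(pool_size)]
    for _ in range(generations):
        children = []
        for _ in range(pool_size):
            a, b = rng.sample(pool, 2)
            cut = rng.randrange(1, len(tables))
            # recombination: prefix of a, then the other tables in b's order
            child = a[:cut] + [t for t in b if t not in a[:cut]]
            # mutation: swap two positions
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
            children.append(child)
        # selection: keep the cheapest orders from old pool and children
        pool = sorted(pool + children, key=cost)[:pool_size]
    return pool[0]

order = ga_join_order(list(sizes), toy_cost)
```

Because selection keeps the best orders from both the old pool and the new children, the cheapest order in the pool never gets worse from one generation to the next.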


❖ Query Execution

Query execution: applies evaluation plan → result tuples


❖ Query Execution (cont)

Example of query translation:

select s.name, s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and e.semester = '05s2';

maps to

π_{name,id,course,mark}(Stu ⋈_{e.student=s.id} (σ_{semester=05s2}Enr))

maps to

Temp1  = BtreeSelect[semester=05s2](Enr)
Temp2  = HashJoin[e.student=s.id](Stu,Temp1)
Result = Project[name,id,course,mark](Temp2)


❖ Query Execution (cont)

A query execution plan:

consists of a collection of RelOps
executing together to produce a set of result tuples

Results may be passed from one operator to the next:

materialization ... writing results to disk and reading them back
pipelining ... generating and passing via memory buffers


❖ Materialization

Steps in materialization between two operators

first operator reads input(s) and writes results to disk
next operator treats tuples on disk as its input
in essence, the Temp tables are produced as real tables

Advantage:

intermediate results can be placed in a file structure
(which can be chosen to speed up execution of subsequent operators)

Disadvantage:

requires disk space/writes for intermediate results
requires disk access to read intermediate results


❖ Pipelining

How pipelining is organised between two operators:

operators execute "concurrently" as producer/consumer pairs
structured as interacting iterators (open; while(next); close)

Advantage:

no requirement for disk access (results passed via memory buffers)

Disadvantage:

higher-level operators access inputs via linear scan, or
requires sufficient memory buffers to hold all outputs


❖ Iterators (reminder)

Iterators provide a "stream" of results:

iter = startScan(params)
- set up data structures for iterator (create state, open files, ...)
- params are specific to operator (e.g. reln, condition, #buffers, ...)
tuple = nextTuple(iter)
- get the next tuple in the iteration; return null if no more
endScan(iter)
- clean up data structures for iterator

Other possible operations: reset to specific point, restart, ...
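
The three-call interface can be sketched in Python for a selection over an in-memory relation. The function names mirror `startScan`, `nextTuple` and `endScan` above; holding the iterator state in a dict is an assumption made to keep the sketch short.

```python
def start_scan(relation, condition):
    """Set up iterator state: the input, the filter, the current position."""
    return {"rel": relation, "cond": condition, "pos": 0, "open": True}

def next_tuple(it):
    """Return the next tuple satisfying the condition, or None if no more."""
    while it["open"] and it["pos"] < len(it["rel"]):
        t = it["rel"][it["pos"]]
        it["pos"] += 1
        if it["cond"](t):
            return t
    return None

def end_scan(it):
    """Clean up: mark the iterator closed and drop its input."""
    it["open"] = False
    it["rel"] = []

rows = [{"id": 1, "age": 17}, {"id": 2, "age": 30}, {"id": 3, "age": 40}]
it = start_scan(rows, lambda t: t["age"] > 18)
results = []
while (t := next_tuple(it)) is not None:
    results.append(t)
end_scan(it)
```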


❖ Pipelining Example

Consider the query:

select s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and
       e.semester = '05s2' and s.name = 'John';

which maps to the RA expression

Proj[id,course,mark](Join[student=id](Sel[semester=05s2](Enr), Sel[name=John](Stu)))
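
This expression can be evaluated as a pipeline in which each operator pulls tuples from its children on demand. A minimal Python sketch using generators follows; the operator functions and sample data are hypothetical, and the join buffers its inner input in memory so that it can be rescanned for every outer tuple.

```python
def sel(cond, child):
    """Selection: pass on only the tuples satisfying cond."""
    for t in child:
        if cond(t):
            yield t

def join(cond, outer, inner):
    """Nested-loop join; the inner input is buffered for rescanning."""
    inner_rows = list(inner)
    for o in outer:
        for i in inner_rows:
            if cond(o, i):
                yield {**o, **i}

def proj(attrs, child):
    """Projection: keep only the named attributes."""
    for t in child:
        yield {a: t[a] for a in attrs}

# Hypothetical sample data
Stu = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]
Enr = [
    {"student": 1, "course": "COMP9315", "mark": 85, "semester": "05s2"},
    {"student": 1, "course": "COMP9311", "mark": 70, "semester": "05s1"},
    {"student": 2, "course": "COMP9315", "mark": 90, "semester": "05s2"},
]

plan = proj(["id", "course", "mark"],
            join(lambda e, s: e["student"] == s["id"],
                 sel(lambda e: e["semester"] == "05s2", Enr),
                 sel(lambda s: s["name"] == "John", Stu)))
result = list(plan)   # tuples are only produced when the root is consumed
```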


❖ Pipelining Example (cont)

Evaluated via communication between RA tree nodes:

Note: likely that projection is combined with join in PostgreSQL


❖ Disk Accesses

Pipelining cannot avoid all disk accesses.

Some operations use multiple passes (e.g. merge-sort, hash-join).

data is written by one pass, read by subsequent passes

Thus ...

within an operation, disk reads/writes are possible
between operations, no disk reads/writes are needed


❖ PostgreSQL Query Execution

Defs: src/include/executor and src/include/nodes

Code: src/backend/executor

PostgreSQL uses pipelining ...

query plan is a tree of Plan nodes
each type of node implements one kind of RA operation
(node implements specific access method via iterator interface)
node types e.g. Scan, Group, Indexscan, Sort, HashJoin
execution is managed via a tree of PlanState nodes
(mirrors the structure of the tree of Plan nodes; holds execution state)


❖ PostgreSQL Executor

Modules in src/backend/executor fall into two groups:

execXXX (e.g. execMain, execProcnode, execScan)

implement generic control of plan evaluation (execution)
provide overall plan execution and dispatch to node iterators

nodeXXX (e.g. nodeSeqscan, nodeNestloop, nodeGroup)

implement iterators for specific types of RA operators
typically contains ExecInitXXX, ExecXXX, ExecEndXXX


❖ Example PostgreSQL Execution

Consider the query:

-- get manager's age and # employees in Shoe department
select e.age, d.nemps
from   Departments d, Employees e
where  e.name = d.manager and d.name ='Shoe'

and its execution plan tree


❖ Example PostgreSQL Execution (cont)

The execution plan tree

contains three nodes:

NestedLoop with join condition (Outer.manager = Inner.name)
IndexScan on Departments with selection (name = 'Shoe')
SeqScan on Employees


❖ Example PostgreSQL Execution (cont)

Initially InitPlan() invokes ExecInitNode() on plan tree root.

ExecInitNode() sees a NestedLoop node ...
   so dispatches to ExecInitNestLoop() to set up iterator
   then invokes ExecInitNode() on left and right sub-plans
       in left sub-plan, ExecInitNode() sees an IndexScan node
        so dispatches to ExecInitIndexScan() to set up iterator
       in right sub-plan, ExecInitNode() sees a SeqScan node
        so dispatches to ExecInitSeqScan() to set up iterator

Result: a plan state tree with same structure as plan tree.


❖ Example PostgreSQL Execution (cont)

Then ExecutePlan() repeatedly invokes ExecProcNode().

ExecProcNode() sees a NestedLoop node ...
   so dispatches to ExecNestLoop() to get next tuple
   which invokes ExecProcNode() on its sub-plans
       in left sub-plan, ExecProcNode() sees an IndexScan node
            so dispatches to ExecIndexScan() to get next tuple
            if no more tuples, return END
            for this tuple, invoke ExecProcNode() on right sub-plan
                ExecProcNode() sees a SeqScan node
                    so dispatches to ExecSeqScan() to get next tuple
                    check for match and return joined tuples if found
                    continue scan until end
                reset right sub-plan iterator

Result: stream of result tuples returned via ExecutePlan()


❖ Performance Tuning

How to make a database-backed system perform "better"?

Improving performance may involve any/all of:

making applications using the DB run faster
lowering response time of queries/transactions
improving overall transaction throughput

Remembering that, to some extent ...

the query optimiser removes choices from DB developers
by making its own decision on the optimal execution plan


❖ Performance Tuning (cont)

Tuning requires us to consider the following:

which queries and transactions will be used?
(e.g. check balance for payment, display recent transaction history)
how frequently does each query/transaction occur?
(e.g. 80% withdrawals; 1% deposits; 19% balance check)
are there time constraints on queries/transactions?
(e.g. EFTPOS payments must be approved within 7 seconds)
are there uniqueness constraints on any attributes?
(define indexes on attributes to speed up insertion uniqueness check)
how frequently do updates occur?
(indexes slow down updates, because must update table and index)


❖ Performance Tuning (cont)

Performance can be considered at two times:

during schema design
- typically towards the end of schema design process
- requires schema transformations such as denormalisation
outside schema design
- typically after application has been deployed/used
- requires adding/modifying data structures such as indexes

Difficult to predict what query optimiser will do, so ...

implement queries using methods which should be efficient
observe execution behaviour and modify query accordingly


❖ PostgreSQL Query Tuning

PostgreSQL provides the explain statement to

give a representation of the query execution plan
with information that may help to tune query performance

Usage:

EXPLAIN [ANALYZE] Query

Without ANALYZE, EXPLAIN shows plan with estimated costs.

With ANALYZE, EXPLAIN executes query and prints real costs.

Note that runtimes may show considerable variation due to buffering.


❖ EXPLAIN Examples

Database


people(id, family, given, title, name, ..., birthday)
courses(id, subject, semester, homepage) 
course_enrolments(student, course, mark, grade, ...) 
subjects(id, code, name, longname, uoc, offeredby, ...)
...

where


       table_name          | n_records 
---------------------------+-----------
 people                    |     55767
 courses                   |     73220
 course_enrolments         |    525688
 subjects                  |     18525
...


❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute


uni=# explain
uni-# select * from Students where stype='local';
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
   Filter: ((stype)::text = 'local'::text)

where

Seq Scan = operation (plan node)
cost=StartUpCost..TotalCost
rows=NumberOfResultTuples
width=SizeOfTuple (# bytes)
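
When comparing many plans it can help to pull these four numbers out of the text output. A minimal Python sketch follows; the regular expression and the function `parse_estimate` are illustrative helpers written for the text format shown above, not part of PostgreSQL.

```python
import re

# Matches e.g. "cost=0.00..562.01 rows=23543 width=9"
COST_RE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+) rows=(\d+) width=(\d+)")

def parse_estimate(line):
    """Extract the planner's estimates from one EXPLAIN plan-node line."""
    m = COST_RE.search(line)
    if m is None:
        return None                       # line carries no cost annotation
    startup, total, rows, width = m.groups()
    return {
        "startup_cost": float(startup),   # StartUpCost
        "total_cost": float(total),       # TotalCost
        "rows": int(rows),                # NumberOfResultTuples
        "width": int(width),              # SizeOfTuple in bytes
    }

est = parse_estimate("(cost=0.00..562.01 rows=23543 width=9)")
```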


❖ EXPLAIN Examples (cont)

More notes on explain output:

each major entry corresponds to a plan node
- e.g. Seq Scan, Index Scan, Hash Join, Merge Join, ...
some nodes include additional qualifying information
- e.g. Filter, Index Cond, Hash Cond, Buckets, ...
cost values in explain are estimates (notional units)
explain analyze also includes actual time costs (ms)
costs of parent nodes include costs of all children
estimates of #results based on sample of data


❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute with actual costs


uni=# explain analyze
uni-# select * from Students where stype='local';
                       QUERY PLAN
----------------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
             (actual time=0.011..4.704 rows=23551 loops=1)
   Filter: ((stype)::text = 'local'::text)
   Rows Removed by Filter: 7810
 Planning time: 0.054 ms
 Execution time: 5.875 ms


❖ EXPLAIN Examples (cont)

Example: Select on indexed, unique attribute (no matching tuple)


uni=# explain analyze
uni-# select * from Students where id=100250;
                       QUERY PLAN
-------------------------------------------------------
 Index Scan using student_pkey on student
            (cost=0.00..8.27 rows=1 width=9)
            (actual time=0.049..0.049 rows=0 loops=1)
   Index Cond: (id = 100250)
 Planning Time: 0.088 ms
 Execution Time: 0.057 ms


❖ EXPLAIN Examples (cont)

Example: Select on indexed, unique attribute (one matching tuple)


uni=# explain analyze
uni-# select * from Students where id=1216988;
                       QUERY PLAN
-------------------------------------------------------
 Index Scan using students_pkey on students
                  (cost=0.29..8.30 rows=1 width=9)
                  (actual time=0.011..0.012 rows=1 loops=1)
   Index Cond: (id = 1216988)
 Planning time: 0.066 ms
 Execution time: 0.062 ms


❖ EXPLAIN Examples (cont)

Example: Join on a primary key (indexed) attribute (2016)


uni=# explain analyze
uni-# select s.id,p.name
uni-# from Students s, People p where s.id=p.id;
                      QUERY PLAN
----------------------------------------------------------
Hash Join (cost=988.58..3112.76 rows=31048 width=19)
          (actual time=11.504..39.478 rows=31048 loops=1)
  Hash Cond: (p.id = s.id)
  -> Seq Scan on people p
         (cost=0.00..989.97 rows=36497 width=19)
         (actual time=0.016..8.312 rows=36497 loops=1)
  -> Hash (cost=478.48..478.48 rows=31048 width=4)
          (actual time=10.532..10.532 rows=31048 loops=1)
          Buckets: 4096  Batches: 2  Memory Usage: 548kB
      ->  Seq Scan on students s 
              (cost=0.00..478.48 rows=31048 width=4)
              (actual time=0.005..4.630 rows=31048 loops=1)
Total runtime: 41.0 ms


❖ EXPLAIN Examples (cont)

Example: Join on a primary key (indexed) attribute (2018)


uni=# explain analyze
uni-# select s.id,p.name
uni-# from Students s, People p where s.id=p.id;
                      QUERY PLAN
----------------------------------------------------------
Merge Join  (cost=0.58..2829.25 rows=31361 width=18)
            (actual time=0.044..25.883 rows=31361 loops=1)
  Merge Cond: (s.id = p.id)
  ->  Index Only Scan using students_pkey on students s
            (cost=0.29..995.70 rows=31361 width=4)
            (actual time=0.033..6.195 rows=31361 loops=1)
        Heap Fetches: 31361
  ->  Index Scan using people_pkey on people p
            (cost=0.29..2434.49 rows=55767 width=18)
            (actual time=0.006..6.662 rows=31361 loops=1)
Planning time: 0.259 ms
Execution time: 27.327 ms


❖ Using EXPLAIN

For more information on reading plans from EXPLAIN

PostgreSQL documentation section 14.1

Can get EXPLAIN output in different formats:

FORMAT { TEXT | XML | JSON | YAML }
details in the PostgreSQL documentation EXPLAIN entry

General PostgreSQL performance tuning

PostgreSQL documentation chapter 14