COMP9315 Week 08 Thursday Lecture

>>

Things To Note
Query Optimization
Cost Models and Analysis
Choosing Access Methods (RelOps)
Cost Estimation
Estimating Projection Result Size
Estimating Selection Result Size
Exercise: Selection Size Estimation
Selection Size Estimation
Exercise: Selection Size Estimation (ii)
Selection Size Estimation
Estimating Join Result Size
Exercise: Join Size Estimation
Cost Estimation: Postscript
PostgreSQL Query Optimiser
Overview of QOpt Process
QOpt Data Structures
Query Optimisation Process
Top-down Trace of QOpt
Join-tree Generation
Query Execution
Query Execution
Materialization
Pipelining
Iterators (reminder)
Pipelining Example
Disk Accesses
PostgreSQL Query Execution
PostgreSQL Query Execution
PostgreSQL Executor
Example PostgreSQL Execution
Query Performance
Performance Tuning
PostgreSQL Query Tuning
EXPLAIN Examples

∧ >>

❖ Things To Note

Assignment 1
- automarking done ... handmarking in progress ... slowly ...
Assignment 2
- due at start week 10 (11:59pm Monday 15 April)
Quiz 4
- due before Friday (April 5) at midnight
Exam
- Thu 9 May ... in CSE labs, closed environment, invigilated
- two 3-hour sessions ... morning + afternoon ... no overlap
- will collect morning/afternoon preferences in week 10

<< ∧ >>

❖ Query Optimization

Input: relational algebra expression tree

Output: tree of instantiated relation operations

[Diagram:Pics/qproc/query-transform.png]

<< ∧ >>

❖ Query Optimization (cont)

Where query optimisation fits in query evaluation:

<< ∧ >>

❖ Query Optimization (cont)

Approximate algorithm for cost-based optimisation:

translate SQL query to RAexp
for enough transformations RA' of RAexp {
   while (more choices for RelOps in RA') {
      Plan = {}
      for each node N of RA' (recursively) {
         ROp = select RelOp method for N
         Plan = Plan ∪ ROp
      }
      cost = 0
      for each node P of Plan (bottom-up) 
         { cost += Cost(P) // using child info }
      if (cost < MinCost)
         { MinCost = cost;  BestPlan = Plan }
   }
}

<< ∧ >>

❖ Cost Models and Analysis

The cost of evaluating a query is determined by:

size of relations (database relations and temporary relations)
access mechanisms (indexing, hashing, sorting, join algorithms)
size/number of main memory buffers (and replacement strategy)

Analysis of costs involves estimating:

size of intermediate results
number of secondary storage accesses

<< ∧ >>

❖ Choosing Access Methods (RelOps)

Performed for each node in RA expression tree ...

Inputs:

a single RA operation (σ, π, ⋈)
information about file organisation, data distribution, ...
list of operations available in the database engine

Output:

specific DBMS operation to implement this RA operation

<< ∧ >>

❖ Choosing Access Methods (RelOps) (cont)

Example:

RA operation: Sel_{[name='John' ∧ age>21]}(Student)
Student relation has B-tree index on name
database engine (obviously) has B-tree search method

giving

tmp[i]   := BtreeSearch[name='John'](Student)
tmp[i+1] := LinearSearch[age>21](tmp[i])

Where possible, use pipelining to avoid storing tmp[i] on disk.

<< ∧ >>

❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing σ access methods:

σ_A=c(R) and R has index on A ⇒ indexSearch[A=c](R)
σ_A=c(R) and R is hashed on A ⇒ hashSearch[A=c](R)
σ_A=c(R) and R is sorted on A ⇒ binarySearch[A=c](R)
σ_{A ≥ c}(R) and R has clustered index on A
⇒ indexSearch[A=c](R) then scan
σ_{A ≥ c}(R) and R is hashed on A
⇒ linearSearch[A>=c](R)

<< ∧ >>

❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing ⋈ access methods:

R ⋈ S and R fits in memory buffers ⇒ bnlJoin(R,S)
R ⋈ S and S fits in memory buffers ⇒ bnlJoin(S,R)
R ⋈ S and R,S sorted on join attr ⇒ smJoin(R,S)
R ⋈ S and R has index on join attr ⇒ inlJoin(S,R)
R ⋈ S and no indexes, no sorting ⇒ hashJoin(R,S)

(bnl = block nested loop; inl = index nested loop; sm = sort merge)

<< ∧ >>

❖ Cost Estimation

Without executing a plan, cannot always know its precise cost.

Thus, query optimisers estimate costs via:

cost of performing operation (dealt with in earlier lectures)
size of result (which affects cost of performing next operation)

Result size estimated by statistical measures on relations, e.g.

r_S	cardinality of relation S
R_S	avg size of tuple in relation S
V(A,S)	# distinct values of attribute A
min(A,S)	min value of attribute A
max(A,S)	max value of attribute A

<< ∧ >>

❖ Estimating Projection Result Size

Straightforward, since we know:

number of tuples in output
r_out = | π_a,b,..(T) | = | T | = r_T (in SQL, because of bag semantics)
size of tuples in output
R_out = sizeof(a) + sizeof(b) + ... + tuple-overhead

Assume page size B, b_out = ceil(r_T / c_out), where c_out = floor(B/R_out)

If using select distinct ...

| π_a,b,..(T) | depends on proportion of duplicates produced

<< ∧ >>

❖ Estimating Selection Result Size

Selectivity = fraction of tuples expected to satisfy a condition.

Common assumption: attribute values uniformly distributed.

Example: Consider the query

select * from Parts where colour='Red'

If V(colour,Parts)=4, r=1000 ⇒ |σ_colour=red(Parts)|=250

In general, | σ_A=c(R) | ≅ r_R / V(A,R)

Heuristic used by PostgreSQL: | σ_A=c(R) | ≅ r/10 (unless primary key)

<< ∧ >>

❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where year > 2015;

Could estimate by using:

uniform distribution assumption, r, min/max years

Assume: min(year)=2010, max(year)=2019, |Enrolment|=10⁵

10⁵ from 2010-2019 means approx 10000 enrolments/year
this suggests 40000 enrolments since 2016

Heuristic used by some systems: | σ_A>c(R) | ≅ r/3

<< ∧ >>

❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where course <> 'COMP9315';

Could estimate by using:

uniform distribution assumption, r, domain size

e.g. | V(course,Enrolment) | = 2000, | σ_A<>c(E) | = r * 1999/2000

Heuristic used by some systems: | σ_A<>c(R) | ≅ r

<< ∧ >>

❖ Exercise: Selection Size Estimation

Assuming that

all attributes have uniform distribution of data values
attributes are independent of each other

Give formulae for the number of expected results for

select * from R where not A=k
select * from R where A=k and B=j
select * from R where A in (k,l,m,n)

where j, k, l, m, n are constants.

Assume: V(A,R) = 10 and V(B,R)=100 and r=1000

<< ∧ >>

❖ Selection Size Estimation

How to handle non-uniform attribute value distributions?

collect statistics about the values stored in the attribute/relation
store these as e.g. a histogram in the meta-data for the relation

So, for part colour example, might have distribution like:

White: 35% Red: 30% Blue: 25% Silver: 10%

Use histogram as basis for determining # selected tuples.

Disadvantage: cost of storing/maintaining histograms.

<< ∧ >>

❖ Exercise: Selection Size Estimation (ii)

Given a relation R with attribute X whose statistics are:

r_R = 500
V(X,R) = {a,b,c,d,e}
histogram: a,b,c,d,e,NULL = 40:20:10:10:10:10

Estimate the size of the result of the following queries:

select * from R where X is not null
select * from R where X >= 'c'
select * from R where X between 'b' and 'd'

<< ∧ >>

❖ Selection Size Estimation

Summary: analysis relies on operation and data distribution:

E.g. select * from R where a = k;

Case 1: uniq(R.a) ⇒ 0 or 1 result

Case 2: r_R tuples && size(dom(R.a)) = n ⇒ r_R / n results

E.g. select * from R where a < k;

Case 1: k ≤ min(R.a) ⇒ 0 results

Case 2: k > max(R.a) ⇒ ≅ r_R results

Case 3: size(dom(R.a)) = n ⇒ ? min(R.a) ... k ... max(R.a) ?

<< ∧ >>

❖ Estimating Join Result Size

Analysis relies on semantic knowledge about data/relations.

Consider equijoin on common attr: R ⋈_a S

Case 1: values(R.a) ∩ values(S.a) = {} ⇒ size(R ⋈_a S) = 0

Case 2: uniq(R.a) and uniq(S.a) ⇒ size(R ⋈_a S) ≤ min(|R|, |S|)

Case 3: pkey(R.a) and fkey(S.a) ⇒ size(R ⋈_a S) ≤ |S|

<< ∧ >>

❖ Exercise: Join Size Estimation

How many tuples are in the output from:

select * from R, S where R.s = S.id
where S.id is a primary key and R.s is a foreign key referencing S.id
select * from R, S where R.s <> S.id
where S.id is a primary key and R.s is a foreign key referencing S.id
select * from R, S where R.x = S.y
where R.x and S.y have no connection except that dom(R.x)=dom(S.y)

Under what conditions will the first query have maximum size?

<< ∧ >>

❖ Cost Estimation: Postscript

Inaccurate cost estimation can lead to poor evaluation plans.

Above methods can (sometimes) give inaccurate estimates.

To get more accurate cost estimates:

more time ... complex computation of selectivity
more space ... storage for histograms of data values

Either way, optimisation process costs more (more than query?)

Trade-off between optimiser performance and query performance.

<< ∧ >>

❖ PostgreSQL Query Optimiser

<< ∧ >>

❖ Overview of QOpt Process

Input: tree of Query nodes returned by parser

Output: tree of Plan nodes used by query executor

wrapped in a PlannedStmt node containing state info

Intermediate data structures are trees of Path nodes

a path tree represents one evaluation order for a query

All Node types are defined in include/nodes/*.h

<< ∧ >>

❖ Overview of QOpt Process (cont)

<< ∧ >>

❖ QOpt Data Structures

Generic Path node structure:


typedef struct Path
{
   NodeTag     type;          // scan/join/...
   NodeTag     pathtype;      // specific method
   RelOptInfo *parent;        // output relation
   PathTarget *pathtarget;    // list of Vars/Exprs, width 
   // estimated execution costs for path ...
   Cost        startup_cost;  // setup cost
   Cost        total_cost;    // total cost
   List       *pathkeys;      // sort order
} Path;

PathKey = (opfamily:Oid, strategy:SortDir, nulls_first:bool)

<< ∧ >>

❖ QOpt Data Structures (cont)

Specialised Path nodes (simplified):


typedef struct IndexPath
{
   Path    path;
   IndexOptInfo *indexinfo; // index for scan 
   List   *indexclauses; // index select conditions 
   ...
   ScanDirection  indexscandir; // used by planner 
   Selectivity  indexselectivity; // estimated #results 
} IndexPath;

typedef struct JoinPath
{
   Path      path;
   JoinType  jointype;     // inner/outer/semi/anti 
   Path     *outerpath;    // outer part of the join 
   Path     *innerpath;    // inner part of the join 
   List     *restrictinfo; // join condition(s) 
} JoinPath;

<< ∧ >>

❖ Query Optimisation Process

Query optimisation proceeds in two stages (after parsing)...

Rewriting:

uses PostgreSQL's rule system to rearrange RA expressions
query tree is expanded to include e.g. view definitions

Planning and optimisation:

using cost-based analysis of generated paths
via one of two different path generators
chooses least-cost path from all those considered

Then produces a Plan tree from the selected path.

<< ∧ >>

❖ Top-down Trace of QOpt

Top-level of query execution: backend/tcop/postgres.c


exec_simple_query(const char *query_string)
{
  // lots of setting up ... including starting xact
  parsetree_list = pg_parse_query(query_string);
  foreach(parsetree, parsetree_list) {
    // Query optimisation
    querytree_list = pg_analyze_and_rewrite(parsetree,...);
    plantree_list = pg_plan_queries(querytree_list,...);
    // Query execution
    portal = CreatePortal(...plantree_list...);
    PortalRun(portal,...);
  }
  // lots of cleaning up ... including close xact
}

Assumes that we are dealing with multiple queries (i.e. SQL statements)

<< ∧ >>

❖ Top-down Trace of QOpt (cont)

query_planner() produces plan for a select/join tree

make list of tables used in query
split where qualifiers ("quals") into
- restrictions (e.g. r.a=1) ... for selections
- joins (e.g. s.id=r.s) ... for joins
search for quals to enable merge/hash joins

invoke make_one_rel() to find best path/plan

Code in: backend/optimizer/plan/planmain.c

<< ∧ >>

❖ Top-down Trace of QOpt (cont)

make_one_rel() generates possible plans, selects best

generate scan and index paths for base tables
- using restrictions list generated above
generate access paths for the entire join tree
- recursive process, controlled by make_rel_from_joinlist()
returns a single "relation", representing result set

Code in: backend/optimizer/path/allpaths.c

<< ∧ >>

❖ Join-tree Generation

make_rel_from_joinlist() arranges path generation

switches between two possible path tree generators
path tree generators finally return best cost path

Standard path tree generator (standard_join_search()):

"exhaustively" generates join trees (like System R)
starts with 2-way joins, finds best combination
then adds extra table to give 3-table join, etc.

Code in: backend/optimizer/path/{allpaths.c,joinrels.c}

<< ∧ >>

❖ Join-tree Generation (cont)

Genetic query optimiser (geqo):

uses genetic algorithm (GA) to generate path trees
handles joins involving > geqo_threshold relations
goals of this approach:
- find near-optimal solution
- examine far less than entire search space

Code in: backend/optimizer/geqo/*.c

<< ∧ >>

❖ Join-tree Generation (cont)

Basic idea of genetic algorithm:

Optimize(join)
{
   t = 0
   p = initialState(join)  // pool of (random) join orders
   for (t = 0; t < #generations; t++) {
      p' = recombination(p) // get parts of good join orders 
      p'' = mutation(p')    // generate new variations 
      p = selection(p'',p)  // choose best join orders 
   }
}

#generations determined by size of initial pool of join orders

<< ∧ >>

❖ Query Execution

<< ∧ >>

❖ Query Execution

Query execution: applies evaluation plan → result tuples

<< ∧ >>

❖ Query Execution (cont)

Example of query translation:

select s.name, s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and e.semester = '05s2';

maps to

π_{name,id,course,mark}(Stu ⋈_{e.student=s.id} (σ_{semester=05s2}Enr))

maps to

Temp1  = BtreeSelect[semester=05s2](Enr)
Temp2  = HashJoin[e.student=s.id](Stu,Temp1)
Result = Project[name,id,course,mark](Temp2)

<< ∧ >>

❖ Query Execution (cont)

A query execution plan:

consists of a collection of RelOps
executing together to produce a set of result tuples

Results may be passed from one operator to the next:

materialization ... writing results to disk and reading them back
pipelining ... generating and passing via memory buffers

<< ∧ >>

❖ Materialization

Steps in materialization between two operators

first operator reads input(s) and writes results to disk
next operator treats tuples on disk as its input
in essence, the Temp tables are produced as real tables

Advantage:

intermediate results can be placed in a file structure
(which can be chosen to speed up execution of subsequent operators)

Disadvantage:

requires disk space/writes for intermediate results
requires disk access to read intermediate results

<< ∧ >>

❖ Pipelining

How pipelining is organised between two operators:

operators execute "concurrently" as producer/consumer pairs
structured as interacting iterators (open; while(next); close)

Advantage:

no requirement for disk access (results passed via memory buffers)

Disadvantage:

higher-level operators access inputs via linear scan, or
requires sufficient memory buffers to hold all outputs

<< ∧ >>

❖ Iterators (reminder)

Iterators provide a "stream" of results:

iter = startScan(params)
- set up data structures for iterator (create state, open files, ...)
- params are specific to operator (e.g. reln, condition, #buffers, ...)
tuple = nextTuple(iter)
- get the next tuple in the iteration; return null if no more
endScan(iter)
- clean up data structures for iterator

Other possible operations: reset to specific point, restart, ...

<< ∧ >>

❖ Pipelining Example

Consider the query:

select s.id, e.course, e.mark
from   Student s, Enrolment e
where  e.student = s.id and
       e.semester = '05s2' and s.name = 'John';

which maps to the RA expression

Proj_{[id,course,mark]}(Join_[student=id](Sel_[05s2](Enr),Sel_[John](Stu)))

<< ∧ >>

❖ Pipelining Example (cont)

Evaluated via communication between RA tree nodes:

Note: likely that projection is combined with join in PostgreSQL

<< ∧ >>

❖ Disk Accesses

Pipelining cannot avoid all disk accesses.

Some operations use multiple passes (e.g. merge-sort, hash-join).

data is written by one pass, read by subsequent passes

Thus ...

within an operation, disk reads/writes are possible
between operations, no disk reads/writes are needed

<< ∧ >>

❖ PostgreSQL Query Execution

<< ∧ >>

❖ PostgreSQL Query Execution

Defs: src/include/executor and src/include/nodes

Code: src/backend/executor

PostgreSQL uses pipelining ...

query plan is a tree of Plan nodes
each type of node implements one kind of RA operation
(node implements specific access method via iterator interface)
node types e.g. Scan, Group, Indexscan, Sort, HashJoin
execution is managed via a tree of PlanState nodes
(mirrors the structure of the tree of Plan nodes; holds execution state)

<< ∧ >>

❖ PostgreSQL Executor

Modules in src/backend/executor fall into two groups:

execXXX (e.g. execMain, execProcnode, execScan)

implement generic control of plan evaluation (execution)
provide overall plan execution and dispatch to node iterators

nodeXXX (e.g. nodeSeqscan, nodeNestloop, nodeGroup)

implement iterators for specific types of RA operators
typically contains ExecInitXXX, ExecXXX, ExecEndXXX

<< ∧ >>

❖ Example PostgreSQL Execution

Consider the query:

-- get manager's age and # employees in Shoe department
select e.age, d.nemps
from   Departments d, Employees e
where  e.name = d.manager and d.name ='Shoe'

and its execution plan tree

<< ∧ >>

❖ Example PostgreSQL Execution (cont)

The execution plan tree

contains three nodes:

NestedLoop with join condition (Outer.manager = Inner.name)
IndexScan on Departments with selection (name = 'Shoe')
SeqScan on Employees

<< ∧ >>

❖ Example PostgreSQL Execution (cont)

Initially InitPlan() invokes ExecInitNode() on plan tree root.

ExecInitNode() sees a NestedLoop node ...
   so dispatches to ExecInitNestLoop() to set up iterator
   then invokes ExecInitNode() on left and right sub-plans
       in left subPlan, ExecInitNode() sees an IndexScan node
        so dispatches to ExecInitIndexScan() to set up iterator
       in right sub-plan, ExecInitNode() sees a SeqScan node
        so dispatches to ExecInitSeqScan() to set up iterator

Result: a plan state tree with same structure as plan tree.

<< ∧ >>

❖ Example PostgreSQL Execution (cont)

Then ExecutePlan() repeatedly invokes ExecProcNode().

ExecProcNode() sees a NestedLoop node ...
   so dispatches to ExecNestedLoop() to get next tuple
   which invokes ExecProcNode() on its sub-plans
       in left sub-plan, ExecProcNode() sees an IndexScan node
            so dispatches to ExecIndexScan() to get next tuple
            if no more tuples, return END
            for this tuple, invoke ExecProcNode() on right sub-plan
                ExecProcNode() sees a SeqScan node
                    so dispatches to ExecSeqScan() to get next tuple
                    check for match and return joined tuples if found
                    continue scan until end
                reset right sub-plan iterator

Result: stream of result tuples returned via ExecutePlan()

<< ∧ >>

❖ Query Performance

<< ∧ >>

❖ Performance Tuning

How to make a database-backed system perform "better"?

Improving performance may involve any/all of:

making applications using the DB run faster
lowering response time of queries/transactions
improving overall transaction throughput

Remembering that, to some extent ...

the query optimiser removes choices from DB developers
by making its own decision on the optimal execution plan

<< ∧ >>

❖ Performance Tuning (cont)

Tuning requires us to consider the following:

which queries and transactions will be used?
(e.g. check balance for payment, display recent transaction history)
how frequently does each query/transaction occur?
(e.g. 80% withdrawals; 1% deposits; 19% balance check)
are there time constraints on queries/transactions?
(e.g. EFTPOS payments must be approved within 7 seconds)
are there uniqueness constraints on any attributes?
(define indexes on attributes to speed up insertion uniqueness check)
how frequently do updates occur?
(indexes slow down updates, because must update table and index)

<< ∧ >>

❖ Performance Tuning (cont)

Performance can be considered at two times:

during schema design
- typically towards the end of schema design process
- requires schema transformations such as denormalisation
outside schema design
- typically after application has been deployed/used
- requires adding/modifying data structures such as indexes

Difficult to predict what query optimiser will do, so ...

implement queries using methods which should be efficient
observe execution behaviour and modify query accordingly

<< ∧ >>

❖ PostgreSQL Query Tuning

PostgreSQL provides the explain statement to

give a representation of the query execution plan
with information that may help to tune query performance

Usage:

EXPLAIN [ANALYZE] Query

Without ANALYZE, EXPLAIN shows plan with estimated costs.

With ANALYZE, EXPLAIN executes query and prints real costs.

Note that runtimes may show considerable variation due to buffering.

<< ∧ >>

❖ EXPLAIN Examples

Database


people(id, family, given, title, name, ..., birthday)
courses(id, subject, semester, homepage) 
course_enrolments(student, course, mark, grade, ...) 
subjects(id, code, name, longname, uoc, offeredby, ...)
...

where


       table_name          | n_records 
---------------------------+-----------
 people                    |     55767
 courses                   |     73220
 course_enrolments         |    525688
 subjects                  |     18525
...

<< ∧ >>

❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute


uni=# explain
uni=# select * from Students where stype='local';
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
   Filter: ((stype)::text = 'local'::text)

where

Seq Scan = operation (plan node)
cost=StartUpCost..TotalCost
rows=NumberOfResultTuples
width=SizeOfTuple (# bytes)

<< ∧ >>

❖ EXPLAIN Examples (cont)

More notes on explain output:

each major entry corresponds to a plan node
- e.g. Seq Scan, Index Scan, Hash Join, Merge Join, ...
some nodes include additional qualifying information
- e.g. Filter, Index Cond, Hash Cond, Buckets, ...
cost values in explain are estimates (notional units)
explain analyze also includes actual time costs (ms)
costs of parent nodes include costs of all children
estimates of #results based on sample of data

<< ∧ >>

❖ EXPLAIN Examples (cont)

Example: Select on non-indexed attribute with actual costs


uni=# explain analyze
uni=# select * from Students where stype='local';
                       QUERY PLAN
----------------------------------------------------------
 Seq Scan on students
             (cost=0.00..562.01 rows=23543 width=9)
             (actual time=0.011..4.704 rows=23551 loops=1)
   Filter: ((stype)::text = 'local'::text)
   Rows Removed by Filter: 7810
 Planning time: 0.054 ms
 Execution time: 5.875 ms

<< ∧ >>

❖ EXPLAIN Examples (cont)

Example: Select on indexed, unique attribute


uni=# explain analyze
uni-# select * from Students where id=100250;
                       QUERY PLAN
-------------------------------------------------------
 Index Scan using student_pkey on student
            (cost=0.00..8.27 rows=1 width=9)
            (actual time=0.049..0.049 rows=0 loops=1)
   Index Cond: (id = 100250)
 Planning Time: 0.088 ms
 Execution Time: 0.057 ms

<< ∧ >>

❖ EXPLAIN Examples (cont)

Example: Select on indexed, unique attribute


uni=# explain analyze
uni-# select * from Students where id=1216988;
                       QUERY PLAN
-------------------------------------------------------
 Index Scan using students_pkey on students
                  (cost=0.29..8.30 rows=1 width=9)
                  (actual time=0.011..0.012 rows=1 loops=1)
   Index Cond: (id = 1216988)
 Planning time: 0.066 ms
 Execution time: 0.062 ms

<< ∧ >>

❖ EXPLAIN Examples (cont)

Example: Join on a primary key (indexed) attribute (2016)


uni=# explain analyze
uni-# select s.id,p.name
uni-# from Students s, People p where s.id=p.id;
                      QUERY PLAN
----------------------------------------------------------
Hash Join (cost=988.58..3112.76 rows=31048 width=19)
          (actual time=11.504..39.478 rows=31048 loops=1)
  Hash Cond: (p.id = s.id)
  -> Seq Scan on people p
         (cost=0.00..989.97 rows=36497 width=19)
         (actual time=0.016..8.312 rows=36497 loops=1)
  -> Hash (cost=478.48..478.48 rows=31048 width=4)
          (actual time=10.532..10.532 rows=31048 loops=1)
          Buckets: 4096  Batches: 2  Memory Usage: 548kB
      ->  Seq Scan on students s 
              (cost=0.00..478.48 rows=31048 width=4)
              (actual time=0.005..4.630 rows=31048 loops=1)
Total runtime: 41.0 ms

<< ∧

❖ EXPLAIN Examples (cont)

Example: Join on a primary key (indexed) attribute (2018)


uni=# explain analyze
uni-# select s.id,p.name
uni-# from Students s, People p where s.id=p.id;
                      QUERY PLAN
----------------------------------------------------------
Merge Join  (cost=0.58..2829.25 rows=31361 width=18)
            (actual time=0.044..25.883 rows=31361 loops=1)
  Merge Cond: (s.id = p.id)
  ->  Index Only Scan using students_pkey on students s
            (cost=0.29..995.70 rows=31361 width=4)
            (actual time=0.033..6.195 rows=31361 loops=1)
        Heap Fetches: 31361
  ->  Index Scan using people_pkey on people p
            (cost=0.29..2434.49 rows=55767 width=18)
            (actual time=0.006..6.662 rows=31361 loops=1)
Planning time: 0.259 ms
Execution time: 27.327 ms