G: Query Translation, Optimisation, Execution

COMP9315 24T1 ♢ Lectures Part G ♢ [0/76]
❖ Query Evaluation


[Diagram:Pics/qproc/dbmsarch.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [1/76]
❖ Query Evaluation (cont)

A query in SQL:

A query evaluator/processor :
Some DBMSs can save query plans for later re-use.
COMP9315 24T1 ♢ Lectures Part G ♢ [2/76]
❖ Query Evaluation (cont)

Internals of the query evaluation "black-box":

[Diagram:Pics/qproc/qproc0.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [3/76]
❖ Query Evaluation (cont)

DBMSs provide several "flavours" of each RA operation.

For example:

Similarly, π  and  have versions to match specific query types.

COMP9315 24T1 ♢ Lectures Part G ♢ [4/76]
❖ Query Evaluation (cont)

We call these specialised version of RA operations RelOps.

One major task of the query processor:

Requires the query translator/optimiser to consider RelOps are realised at execution time
COMP9315 24T1 ♢ Lectures Part G ♢ [5/76]
❖ Terminology Variations

Relational algebra expression of SQL query

Execution plan as collection of RelOps Representation of RA operators and expressions
COMP9315 24T1 ♢ Lectures Part G ♢ [6/76]
❖ Query Translation

Query translation:   SQL statement text RA expression

[Diagram:Pics/qproc/qproc1.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [7/76]
❖ Query Translation

Translation step:   SQL text → RA expression

Example:

SQL: select name from Students where id=7654321;
-- is translated to
RA:  Proj[name](Sel[id=7654321]Students)

Processes:  lexer/parser,  mapping rules,  rewriting rules.

Mapping from SQL to RA may include some optimisations, e.g.

select * from Students where id = 54321 and age > 50;
-- is translated to
Sel[age>50](Sel[id=54321]Students)
-- rather than ... because of index on id
Sel[id=54321&age>50](Students)

COMP9315 24T1 ♢ Lectures Part G ♢ [8/76]
❖ Parsing SQL

Parsing task is similar to that for programming languages.

Language elements:

PostgreSQL parser ...

COMP9315 24T1 ♢ Lectures Part G ♢ [9/76]
❖ SQL → RA

Remaining steps in processing an SQL statement

Cost-based optimisation:
COMP9315 24T1 ♢ Lectures Part G ♢ [10/76]
❖ Expression Rewriting Rules

Since RA is a well-defined formal system

Expression transformation based on such rules can be used
COMP9315 24T1 ♢ Lectures Part G ♢ [11/76]
❖ Relational Algebra Laws

Commutative and Associative Laws:

Selection splitting (where c and d are conditions):
COMP9315 24T1 ♢ Lectures Part G ♢ [12/76]
❖ Relational Algebra Laws (cont)

Selection pushing   ( σc(R ∪ S) and σc(R ∪ S) ):

Selection pushing with join ... If condition contains attributes from both R and S:
COMP9315 24T1 ♢ Lectures Part G ♢ [13/76]
❖ Relational Algebra Laws (cont)

Rewrite rules for projection ...

All but last projection can be ignored:

Projections can be pushed into joins:

where
COMP9315 24T1 ♢ Lectures Part G ♢ [14/76]
❖ Relational Algebra Laws (cont)

Subqueries ⇒ convert to a join

Example:   (on schema Courses(id,code,...), Enrolments(cid,sid,...), Students(id,name,...)

select c.code, count(*)
from   Courses c
where  c.id in (select cid from Enrolments)
group  by c.code

becomes

select c.code, count(*)
from   Courses c join Enrolments e on c.id = e.cid
group  by c.code

COMP9315 24T1 ♢ Lectures Part G ♢ [15/76]
❖ Relational Algebra Laws (cont)

But not all subqueries can be converted to join, e.g.

select e.sid as student_id, e.cid as course_id
from   Enrolments e
where  e.sid = (select max(id) from Students)

has to be evaluated as

Val = max[id]Students

Res = π(sid,cid)(σsid=ValEnrolments)

COMP9315 24T1 ♢ Lectures Part G ♢ [16/76]
❖ Query Optimisation

Query optimiser:   RA expression efficient evaluation plan

[Diagram:Pics/qproc/qproc2.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [17/76]
❖ Query Optimisation (cont)

Query optimisation is a critical step in query evaluation.

The query optimiser

"Optimisation" is a misnomer since query optimisers
Observed Query Time = Planning time + Evaluation time
COMP9315 24T1 ♢ Lectures Part G ♢ [18/76]
❖ Query Optimisation (cont)

Why do we not generate optimal query execution plans?

Finding an optimal query plan ...

Even for relatively small query, search space is very large.

Compromise:

COMP9315 24T1 ♢ Lectures Part G ♢ [19/76]
❖ Approaches to Optimisation

Three main classes of techniques developed:

All driven by aim of minimising (or at least reducing) "cost".

Real query optimisers use a combination of algrebraic+physical.

Semantic QO is good idea, but expensive/difficult to implement.

COMP9315 24T1 ♢ Lectures Part G ♢ [20/76]
❖ Query Optimization

Input: relational algebra expression tree

Output: tree of instantiated relation operations

[Diagram:Pics/qproc/query-transform.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [21/76]
❖ Query Optimization (cont)

Where query optimisation fits in query evaluation:

[Diagram:Pics/qproc/qproc2.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [22/76]
❖ Query Optimization (cont)

Approximate algorithm for cost-based optimisation:

translate SQL query to RAexp
for enough transformations RA' of RAexp {
   while (more choices for RelOps in RA') {
      Plan = {}
      for each node N of RA' (recursively) {
         ROp = select RelOp method for N
         Plan = Plan ∪ ROp
      }
      cost = 0
      for each node P of Plan (bottom-up) 
         { cost += Cost(P) // using child info }
      if (cost < MinCost)
         { MinCost = cost;  BestPlan = Plan }
   }
}

COMP9315 24T1 ♢ Lectures Part G ♢ [23/76]
❖ Cost Models and Analysis

The cost of evaluating a query is determined by:

Analysis of costs involves estimating:
COMP9315 24T1 ♢ Lectures Part G ♢ [24/76]
❖ Choosing Access Methods (RelOps)

Performed for each node in RA expression tree ...

Inputs:

Output:
COMP9315 24T1 ♢ Lectures Part G ♢ [25/76]
❖ Choosing Access Methods (RelOps) (cont)

Example:

giving

tmp[i]   := BtreeSearch[name='John'](Student)
tmp[i+1] := LinearSearch[age>21](tmp[i])

Where possible, use pipelining to avoid storing tmp[i] on disk.

COMP9315 24T1 ♢ Lectures Part G ♢ [26/76]
❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing σ access methods:

COMP9315 24T1 ♢ Lectures Part G ♢ [27/76]
❖ Choosing Access Methods (RelOps) (cont)

Rules for choosing access methods:

 
(bnl = block nested loop;   inl = index nested loop;   sm = sort merge)
COMP9315 24T1 ♢ Lectures Part G ♢ [28/76]
❖ Cost Estimation

Without executing a plan, cannot always know its precise cost.

Thus, query optimisers estimate costs via:

Result size estimated by statistical measures on relations, e.g.
rS cardinality of relation S
RS avg size of tuple in relation S
V(A,S) # distinct values of attribute A
min(A,S) min value of attribute A
max(A,S) max value of attribute A
COMP9315 24T1 ♢ Lectures Part G ♢ [29/76]
❖ Estimating Projection Result Size

Straightforward, since we know:

Assume page size B,   bout = ceil(rT / cout),   where cout = floor(B/Rout)

If using select distinct ...

COMP9315 24T1 ♢ Lectures Part G ♢ [30/76]
❖ Estimating Selection Result Size

Selectivity = fraction of tuples expected to satisfy a condition.

Common assumption: attribute values uniformly distributed.

Example: Consider the query

select * from Parts where colour='Red'

If   V(colour,Parts)=4,   r=1000  ⇒  |σcolour=red(Parts)|=250

In general, | σA=c(R) |  ≅  rR / V(A,R)


Heuristic used by PostgreSQL: | σA=c(R) |  ≅  r/10    (unless primary key)

COMP9315 24T1 ♢ Lectures Part G ♢ [31/76]
❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where year > 2015;

Could estimate by using:

Assume: min(year)=2010, max(year)=2019, |Enrolment|=105
Heuristic used by some systems:   | σA>c(R) | ≅ r/3
COMP9315 24T1 ♢ Lectures Part G ♢ [32/76]
❖ Estimating Selection Result Size (cont)

Estimating size of result for e.g.

select * from Enrolment where course <> 'COMP9315';

Could estimate by using:

e.g. | V(course,Enrolment) | = 2000,   | σA<>c(E) | = r * 1999/2000


Heuristic used by some systems:   | σA<>c(R) | ≅ r

COMP9315 24T1 ♢ Lectures Part G ♢ [33/76]
❖ Selection Size Estimation

How to handle non-uniform attribute value distributions?

So, for part colour example, might have distribution like:

White: 35%   Red: 30%   Blue: 25%   Silver: 10%

Use histogram as basis for determining # selected tuples.

Disadvantage: cost of storing/maintaining histograms.

COMP9315 24T1 ♢ Lectures Part G ♢ [34/76]
❖ Selection Size Estimation (cont)

Summary: analysis relies on operation and data distribution:

E.g. select * from R where a = k;

Case 1:   uniq(R.a)   ⇒   0 or 1 result

Case 2:   rR  tuples && size(dom(R.a)) = n   ⇒   rR / n results

E.g. select * from R where a < k;

Case 1:   k ≤ min(R.a)   ⇒   0 results

Case 2:   k > max(R.a)   ⇒   ≅ rR results

Case 3:   size(dom(R.a)) = n   ⇒   ? min(R.a) ... k ... max(R.a) ?

COMP9315 24T1 ♢ Lectures Part G ♢ [35/76]
❖ Estimating Join Result Size

Analysis relies on semantic knowledge about data/relations.

Consider equijoin on common attr: R ⋈a S

Case 1:   values(R.a) ∩ values(S.a) = {}   ⇒   size(R ⋈a S) = 0

Case 2:   uniq(R.a) and uniq(S.a)   ⇒   size(R ⋈a S)  ≤  min(|R|, |S|)

Case 3:   pkey(R.a) and fkey(S.a)   ⇒   size(R ⋈a S)  ≤  |S|

COMP9315 24T1 ♢ Lectures Part G ♢ [36/76]
❖ Cost Estimation: Postscript

Inaccurate cost estimation can lead to poor evaluation plans.

Above methods can (sometimes) give inaccurate estimates.

To get more accurate cost estimates:

Either way, optimisation process costs more (more than query?)

Trade-off between optimiser performance and query performance.

COMP9315 24T1 ♢ Lectures Part G ♢ [37/76]
❖ Overview of PostgreSQL QOpt Process

Input: tree of Query nodes returned by parser

Output: tree of Plan nodes used by query executor

Intermediate data structures are trees of Path nodes All Node types are defined in include/nodes/*.h
COMP9315 24T1 ♢ Lectures Part G ♢ [38/76]
❖ Overview of PostgreSQL QOpt Process (cont)

 

[Diagram:Pics/qproc/qopt-trees1.png]

COMP9315 24T1 ♢ Lectures Part G ♢ [39/76]
❖ QOpt Data Structures

Generic Path node structure:


typedef struct Path
{
   NodeTag     type;          // scan/join/...
   NodeTag     pathtype;      // specific method
   RelOptInfo *parent;        // output relation
   PathTarget *pathtarget;    // list of Vars/Exprs, width 
   // estimated execution costs for path ...
   Cost        startup_cost;  // setup cost
   Cost        total_cost;    // total cost
   List       *pathkeys;      // sort order
} Path;


PathKey = (opfamily:Oid, strategy:SortDir, nulls_first:bool)

COMP9315 24T1 ♢ Lectures Part G ♢ [40/76]
❖ QOpt Data Structures (cont)

Specialised Path nodes (simplified):


typedef struct IndexPath
{
   Path    path;
   IndexOptInfo *indexinfo; // index for scan 
   List   *indexclauses; // index select conditions 
   ...
   ScanDirection  indexscandir; // used by planner 
   Selectivity  indexselectivity; // estimated #results 
} IndexPath;

typedef struct JoinPath
{
   Path      path;
   JoinType  jointype;     // inner/outer/semi/anti 
   Path     *outerpath;    // outer part of the join 
   Path     *innerpath;    // inner part of the join 
   List     *restrictinfo; // join condition(s) 
} JoinPath;

COMP9315 24T1 ♢ Lectures Part G ♢ [41/76]
❖ Query Optimisation Process

Query optimisation proceeds in two stages (after parsing)...

Rewriting:

Planning and optimisation: Then produces a Plan tree from the selected path.
COMP9315 24T1 ♢ Lectures Part G ♢ [42/76]
❖ Top-down Trace of QOpt

Top-level of query execution: backend/tcop/postgres.c


exec_simple_query(const char *query_string)
{
  // lots of setting up ... including starting xact
  parsetree_list = pg_parse_query(query_string);
  foreach(parsetree, parsetree_list) {
    // Query optimisation
    querytree_list = pg_analyze_and_rewrite(parsetree,...);
    plantree_list = pg_plan_queries(querytree_list,...);
    // Query execution
    portal = CreatePortal(...plantree_list...);
    PortalRun(portal,...);
  }
  // lots of cleaning up ... including close xact
}

Assumes that we are dealing with multiple queries (i.e. SQL statements)

COMP9315 24T1 ♢ Lectures Part G ♢ [43/76]
❖ Top-down Trace of QOpt (cont)

query_planner() produces plan for a select/join tree

  • invoke make_one_rel() to find best path/plan

    Code in: backend/optimizer/plan/planmain.c

  • COMP9315 24T1 ♢ Lectures Part G ♢ [44/76]
    ❖ Top-down Trace of QOpt (cont)

    make_one_rel() generates possible plans, selects best

    Code in: backend/optimizer/path/allpaths.c
    COMP9315 24T1 ♢ Lectures Part G ♢ [45/76]
    ❖ Join-tree Generation

    make_rel_from_joinlist() arranges path generation

    Standard path tree generator (standard_join_search()): Code in:   backend/optimizer/path/{allpaths.c,joinrels.c}
    COMP9315 24T1 ♢ Lectures Part G ♢ [46/76]
    ❖ Join-tree Generation (cont)

    Genetic query optimiser (geqo):

    Code in:   backend/optimizer/geqo/*.c
    COMP9315 24T1 ♢ Lectures Part G ♢ [47/76]
    ❖ Join-tree Generation (cont)

    Basic idea of genetic algorithm:

    Optimize(join)
    {
       t = 0
       p = initialState(join)  // pool of (random) join orders
       for (t = 0; t < #generations; t++) {
          p' = recombination(p) // get parts of good join orders 
          p'' = mutation(p')    // generate new variations 
          p = selection(p'',p)  // choose best join orders 
       }
    }
    


    #generations determined by size of initial pool of join orders

    COMP9315 24T1 ♢ Lectures Part G ♢ [48/76]
    ❖ Query Execution

    Query execution:   applies evaluation plan result tuples

    [Diagram:Pics/qproc/qproc3.png]

    COMP9315 24T1 ♢ Lectures Part G ♢ [49/76]
    ❖ Query Execution (cont)

    Example of query translation:

    select s.name, s.id, e.course, e.mark
    from   Student s, Enrolment e
    where  e.student = s.id and e.semester = '05s2';
    

    maps to

    πname,id,course,mark(Stu ⋈e.student=s.id (σsemester=05s2Enr))

    maps to

    Temp1  = BtreeSelect[semester=05s2](Enr)
    Temp2  = HashJoin[e.student=s.id](Stu,Temp1)
    Result = Project[name,id,course,mark](Temp2)
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [50/76]
    ❖ Query Execution (cont)

    A query execution plan:

    Results may be passed from one operator to the next:
    COMP9315 24T1 ♢ Lectures Part G ♢ [51/76]
    ❖ Materialization

    Steps in materialization between two operators

    Advantage: Disadvantage:
    COMP9315 24T1 ♢ Lectures Part G ♢ [52/76]
    ❖ Pipelining

    How pipelining is organised between two operators:

    Advantage: Disadvantage:
    COMP9315 24T1 ♢ Lectures Part G ♢ [53/76]
    ❖ Iterators (reminder)

    Iterators provide a "stream" of results:

    Other possible operations: reset to specific point, restart, ...
    COMP9315 24T1 ♢ Lectures Part G ♢ [54/76]
    ❖ Pipelining Example

    Consider the query:

    select s.id, e.course, e.mark
    from   Student s, Enrolment e
    where  e.student = s.id and
           e.semester = '05s2' and s.name = 'John';
    

    which maps to the RA expression

    Proj[id,course,mark](Join[student=id](Sel[05s2](Enr),Sel[John](Stu)))
    COMP9315 24T1 ♢ Lectures Part G ♢ [55/76]
    ❖ Pipelining Example (cont)

    Evaluated via communication between RA tree nodes:

    [Diagram:Pics/qproc/qtree.png]



    Note: likely that projection is combined with join in PostgreSQL

    COMP9315 24T1 ♢ Lectures Part G ♢ [56/76]
    ❖ Disk Accesses

    Pipelining cannot avoid all disk accesses.

    Some operations use multiple passes (e.g. merge-sort, hash-join).

    Thus ...
    COMP9315 24T1 ♢ Lectures Part G ♢ [57/76]
    ❖ PostgreSQL Query Execution

    Defs: src/include/executor and src/include/nodes

    Code: src/backend/executor

    PostgreSQL uses pipelining ...

    COMP9315 24T1 ♢ Lectures Part G ♢ [58/76]
    ❖ PostgreSQL Executor

    Modules in src/backend/executor fall into two groups:

    execXXX (e.g. execMain, execProcnode, execScan)

    nodeXXX   (e.g. nodeSeqscan, nodeNestloop, nodeGroup)
    COMP9315 24T1 ♢ Lectures Part G ♢ [59/76]
    ❖ Example PostgreSQL Execution

    Consider the query:

    -- get manager's age and # employees in Shoe department
    select e.age, d.nemps
    from   Departments d, Employees e
    where  e.name = d.manager and d.name ='Shoe'
    

    and its execution plan tree

    [Diagram:Pics/qproc/qtree1.png]

    COMP9315 24T1 ♢ Lectures Part G ♢ [60/76]
    ❖ Example PostgreSQL Execution (cont)

    The execution plan tree

    [Diagram:Pics/qproc/qtree1.png]

    contains three nodes:

    COMP9315 24T1 ♢ Lectures Part G ♢ [61/76]
    ❖ Example PostgreSQL Execution (cont)

    Initially InitPlan() invokes ExecInitNode() on plan tree root.

    ExecInitNode() sees a NestedLoop node ...
       so dispatches to ExecInitNestLoop() to set up iterator
       then invokes ExecInitNode() on left and right sub-plans
           in left subPlan, ExecInitNode() sees an IndexScan node
            so dispatches to ExecInitIndexScan() to set up iterator
           in right sub-plan, ExecInitNode() sees a SeqScan node
            so dispatches to ExecInitSeqScan() to set up iterator

    Result: a plan state tree with same structure as plan tree.

    COMP9315 24T1 ♢ Lectures Part G ♢ [62/76]
    ❖ Example PostgreSQL Execution (cont)

    Then ExecutePlan() repeatedly invokes ExecProcNode().

    ExecProcNode() sees a NestedLoop node ...
       so dispatches to ExecNestedLoop() to get next tuple
       which invokes ExecProcNode() on its sub-plans
           in left sub-plan, ExecProcNode() sees an IndexScan node
                so dispatches to ExecIndexScan() to get next tuple
                if no more tuples, return END
                for this tuple, invoke ExecProcNode() on right sub-plan
                    ExecProcNode() sees a SeqScan node
                        so dispatches to ExecSeqScan() to get next tuple
                        check for match and return joined tuples if found
                        continue scan until end
                    reset right sub-plan iterator

    Result: stream of result tuples returned via ExecutePlan()

    COMP9315 24T1 ♢ Lectures Part G ♢ [63/76]
    ❖ Performance Tuning

    How to make a database-backed system perform "better"?

    Improving performance may involve any/all of:

    Remembering that, to some extent ...
    COMP9315 24T1 ♢ Lectures Part G ♢ [64/76]
    ❖ Performance Tuning (cont)

    Tuning requires us to consider the following:

    COMP9315 24T1 ♢ Lectures Part G ♢ [65/76]
    ❖ Performance Tuning (cont)

    Performance can be considered at two times:

    Difficult to predict what query optimiser will do, so ...
    COMP9315 24T1 ♢ Lectures Part G ♢ [66/76]
    ❖ PostgreSQL Query Tuning

    PostgreSQL provides the explain statement to

    Usage:

    EXPLAIN [ANALYZE] Query
    

    Without ANALYZE, EXPLAIN shows plan with estimated costs.

    With ANALYZE, EXPLAIN executes query and prints real costs.

    Note that runtimes may show considerable variation due to buffering.

    COMP9315 24T1 ♢ Lectures Part G ♢ [67/76]
    ❖ EXPLAIN Examples

    Database

    
    people(id, family, given, title, name, ..., birthday)
    courses(id, subject, semester, homepage) 
    course_enrolments(student, course, mark, grade, ...) 
    subjects(id, code, name, longname, uoc, offeredby, ...)
    ...
    

    where

    
           table_name          | n_records 
    ---------------------------+-----------
     people                    |     55767
     courses                   |     73220
     course_enrolments         |    525688
     subjects                  |     18525
    ...
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [68/76]
    ❖ EXPLAIN Examples (cont)

    Example: Select on non-indexed attribute

    
    uni=# explain
    uni=# select * from Students where stype='local';
                         QUERY PLAN
    ----------------------------------------------------
     Seq Scan on students
                 (cost=0.00..562.01 rows=23543 width=9)
       Filter: ((stype)::text = 'local'::text)
    

    where

    COMP9315 24T1 ♢ Lectures Part G ♢ [69/76]
    ❖ EXPLAIN Examples (cont)

    More notes on explain output:

    COMP9315 24T1 ♢ Lectures Part G ♢ [70/76]
    ❖ EXPLAIN Examples (cont)

    Example: Select on non-indexed attribute with actual costs

    
    uni=# explain analyze
    uni=# select * from Students where stype='local';
                           QUERY PLAN
    ----------------------------------------------------------
     Seq Scan on students
                 (cost=0.00..562.01 rows=23543 width=9)
                 (actual time=0.011..4.704 rows=23551 loops=1)
       Filter: ((stype)::text = 'local'::text)
       Rows Removed by Filter: 7810
     Planning time: 0.054 ms
     Execution time: 5.875 ms
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [71/76]
    ❖ EXPLAIN Examples (cont)

    Example: Select on indexed, unique attribute

    
    uni=# explain analyze
    uni-# select * from Students where id=100250;
                           QUERY PLAN
    -------------------------------------------------------
     Index Scan using student_pkey on student
                (cost=0.00..8.27 rows=1 width=9)
                (actual time=0.049..0.049 rows=0 loops=1)
       Index Cond: (id = 100250)
     Planning Time: 0.088 ms
     Execution Time: 0.057 ms
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [72/76]
    ❖ EXPLAIN Examples (cont)

    Example: Select on indexed, unique attribute

    
    uni=# explain analyze
    uni-# select * from Students where id=1216988;
                           QUERY PLAN
    -------------------------------------------------------
     Index Scan using students_pkey on students
                      (cost=0.29..8.30 rows=1 width=9)
                      (actual time=0.011..0.012 rows=1 loops=1)
       Index Cond: (id = 1216988)
     Planning time: 0.066 ms
     Execution time: 0.062 ms
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [73/76]
    ❖ EXPLAIN Examples (cont)

    Example: Join on a primary key (indexed) attribute  (2016)

    
    uni=# explain analyze
    uni-# select s.id,p.name
    uni-# from Students s, People p where s.id=p.id;
                          QUERY PLAN
    ----------------------------------------------------------
    Hash Join (cost=988.58..3112.76 rows=31048 width=19)
              (actual time=11.504..39.478 rows=31048 loops=1)
      Hash Cond: (p.id = s.id)
      -> Seq Scan on people p
             (cost=0.00..989.97 rows=36497 width=19)
             (actual time=0.016..8.312 rows=36497 loops=1)
      -> Hash (cost=478.48..478.48 rows=31048 width=4)
              (actual time=10.532..10.532 rows=31048 loops=1)
              Buckets: 4096  Batches: 2  Memory Usage: 548kB
          ->  Seq Scan on students s 
                  (cost=0.00..478.48 rows=31048 width=4)
                  (actual time=0.005..4.630 rows=31048 loops=1)
    Total runtime: 41.0 ms
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [74/76]
    ❖ EXPLAIN Examples (cont)

    Example: Join on a primary key (indexed) attribute  (2018)

    
    uni=# explain analyze
    uni-# select s.id,p.name
    uni-# from Students s, People p where s.id=p.id;
                          QUERY PLAN
    ----------------------------------------------------------
    Merge Join  (cost=0.58..2829.25 rows=31361 width=18)
                (actual time=0.044..25.883 rows=31361 loops=1)
      Merge Cond: (s.id = p.id)
      ->  Index Only Scan using students_pkey on students s
                (cost=0.29..995.70 rows=31361 width=4)
                (actual time=0.033..6.195 rows=31361 loops=1)
            Heap Fetches: 31361
      ->  Index Scan using people_pkey on people p
                (cost=0.29..2434.49 rows=55767 width=18)
                (actual time=0.006..6.662 rows=31361 loops=1)
    Planning time: 0.259 ms
    Execution time: 27.327 ms
    

    COMP9315 24T1 ♢ Lectures Part G ♢ [75/76]
    ❖ Using EXPLAIN


    For more information on reading plans from EXPLAIN

    Can get EXPLAIN output in different formats: General PostgreSQL performance tuning
    COMP9315 24T1 ♢ Lectures Part G ♢ [76/76]


    Produced: 30 Apr 2024