❖ Query Evaluation (cont) |
A query in SQL:
❖ Query Evaluation (cont) |
DBMSs provide several "flavours" of each RA operation.
For example:
select * from R where id = 100 -- hashing select * from S -- Btree index where age > 18 and age < 35 select * from T -- MALH file where a = 1 and b = 'a' and c = 1.4
Similarly, π and ⋈ have versions to match specific query types.
❖ Query Evaluation (cont) |
We call these specialised version of RA operations RelOps.
One major task of the query processor:
❖ Terminology Variations |
Relational algebra expression of SQL query
❖ Query Translation |
Translation step: SQL text → RA expression
Example:
SQL: select name from Students where id=7654321;
-- is translated to
RA: Proj[name](Sel[id=7654321]Students)
Processes: lexer/parser, mapping rules, rewriting rules.
Mapping from SQL to RA may include some optimisations, e.g.
select * from Students where id = 54321 and age > 50; -- is translated to Sel[age>50](Sel[id=54321]Students) -- rather than ... because of index on id Sel[id=54321&age>50](Students)
❖ Parsing SQL |
Parsing task is similar to that for programming languages.
Language elements:
create
select
from
where
Students
name
id
CourseCode
+
-
=
<
>
AND
OR
NOT
IN
'abc'
123
3.1
'01-jan-1970'
PostgreSQL parser ...
src/backend/parser
src/backend/catalog
❖ SQL → RA |
Remaining steps in processing an SQL statement
❖ Expression Rewriting Rules |
Since RA is a well-defined formal system
❖ Relational Algebra Laws |
Commutative and Associative Laws:
❖ Relational Algebra Laws (cont) |
Selection pushing ( σc(R ∪ S) and σc(R ∪ S) ):
❖ Relational Algebra Laws (cont) |
Rewrite rules for projection ...
All but last projection can be ignored:
Projections can be pushed into joins:
❖ Relational Algebra Laws (cont) |
Subqueries ⇒ convert to a join
Example:
select c.code, count(*) from Courses c where c.id in (select cid from Enrolments) group by c.code
becomes
select c.code, count(*) from Courses c join Enrolments e on c.id = e.cid group by c.code
❖ Relational Algebra Laws (cont) |
But not all subqueries can be converted to join, e.g.
select e.sid as student_id, e.cid as course_id from Enrolments e where e.sid = (select max(id) from Students)
has to be evaluated as
Val = max[id]Students
Res = π(sid,cid)(σsid=ValEnrolments)
❖ Query Optimisation (cont) |
Query optimisation is a critical step in query evaluation.
The query optimiser
❖ Query Optimisation (cont) |
Why do we not generate optimal query execution plans?
Finding an optimal query plan ...
Compromise:
❖ Approaches to Optimisation |
Three main classes of techniques developed:
Real query optimisers use a combination of algrebraic+physical.
Semantic QO is good idea, but expensive/difficult to implement.
❖ Query Optimization |
Input: relational algebra expression tree
Output: tree of instantiated relation operations
❖ Query Optimization (cont) |
Approximate algorithm for cost-based optimisation:
translate SQL query to RAexp
for enough transformations RA' of RAexp {
while (more choices for RelOps in RA') {
Plan = {}
for each node N of RA' (recursively) {
ROp = select RelOp method for N
Plan = Plan ∪ ROp
}
cost = 0
for each node P of Plan (bottom-up)
{ cost += Cost(P) // using child info }
if (cost < MinCost)
{ MinCost = cost; BestPlan = Plan }
}
}
❖ Cost Models and Analysis |
The cost of evaluating a query is determined by:
❖ Choosing Access Methods (RelOps) |
Performed for each node in RA expression tree ...
Inputs:
❖ Choosing Access Methods (RelOps) (cont) |
Example:
Student
name
tmp[i] := BtreeSearch[name='John'](Student) tmp[i+1] := LinearSearch[age>21](tmp[i])
Where possible, use pipelining to avoid storing tmp[i]
❖ Choosing Access Methods (RelOps) (cont) |
Rules for choosing σ access methods:
R
A
indexSearch[A=c](R)
R
A
hashSearch[A=c](R)
R
A
binarySearch[A=c](R)
R
A
indexSearch[A=c](R)
R
A
linearSearch[A>=c](R)
❖ Choosing Access Methods (RelOps) (cont) |
Rules for choosing ⋈ access methods:
R
bnlJoin(R,S)
S
bnlJoin(S,R)
R
S
smJoin(R,S)
R
inlJoin(S,R)
hashJoin(R,S)
bnl
inl
sm
❖ Cost Estimation |
Without executing a plan, cannot always know its precise cost.
Thus, query optimisers estimate costs via:
rS | cardinality of relation S |
RS | avg size of tuple in relation S |
V(A,S) | # distinct values of attribute A |
min(A,S) | min value of attribute A |
max(A,S) | max value of attribute A |
❖ Estimating Projection Result Size |
Straightforward, since we know:
rout = | πa,b,..(T) | = | T | = rT (in SQL, because of bag semantics)
Rout = sizeof(a) + sizeof(b) + ... + tuple-overhead
Assume page size B, bout = ceil(rT / cout), where cout = floor(B/Rout)
If using select distinct
❖ Estimating Selection Result Size |
Selectivity = fraction of tuples expected to satisfy a condition.
Common assumption: attribute values uniformly distributed.
Example: Consider the query
select * from Parts where colour='Red'
If V(colour,Parts)=4, r=1000 ⇒ |σcolour=red(Parts)|=250
In general, | σA=c(R) | ≅ rR / V(A,R)
Heuristic used by PostgreSQL: | σA=c(R) | ≅ r/10 (unless primary key)
❖ Estimating Selection Result Size (cont) |
Estimating size of result for e.g.
select * from Enrolment where year > 2015;
Could estimate by using:
❖ Estimating Selection Result Size (cont) |
Estimating size of result for e.g.
select * from Enrolment where course <> 'COMP9315';
Could estimate by using:
Heuristic used by some systems: | σA<>c(R) | ≅ r
❖ Selection Size Estimation |
How to handle non-uniform attribute value distributions?
White
Red
Blue
Silver
Use histogram as basis for determining # selected tuples.
Disadvantage: cost of storing/maintaining histograms.
❖ Selection Size Estimation (cont) |
Summary: analysis relies on operation and data distribution:
E.g. select * from R where a = k;
Case 1: uniq(R.a) ⇒ 0 or 1 result
Case 2: rR tuples && size(dom(R.a)) = n ⇒ rR / n results
E.g. select * from R where a < k;
Case 1: k ≤ min(R.a) ⇒ 0 results
Case 2: k > max(R.a) ⇒ ≅ rR results
Case 3: size(dom(R.a)) = n ⇒ ? min(R.a) ... k ... max(R.a) ?
❖ Estimating Join Result Size |
Analysis relies on semantic knowledge about data/relations.
Consider equijoin on common attr: R ⋈a S
Case 1: values(R.a) ∩ values(S.a) = {} ⇒ size(R ⋈a S) = 0
Case 2: uniq(R.a) and uniq(S.a) ⇒ size(R ⋈a S) ≤ min(|R|, |S|)
Case 3: pkey(R.a) and fkey(S.a) ⇒ size(R ⋈a S) ≤ |S|
❖ Cost Estimation: Postscript |
Inaccurate cost estimation can lead to poor evaluation plans.
Above methods can (sometimes) give inaccurate estimates.
To get more accurate cost estimates:
Trade-off between optimiser performance and query performance.
❖ Overview of PostgreSQL QOpt Process |
Input: tree of Query
Output: tree of Plan
PlannedStmt
Path
Node
include/nodes/*.h
❖ QOpt Data Structures |
Generic Path
typedef struct Path { NodeTag type; // scan/join/... NodeTag pathtype; // specific method RelOptInfo *parent; // output relation PathTarget *pathtarget; // list of Vars/Exprs, width // estimated execution costs for path ... Cost startup_cost; // setup cost Cost total_cost; // total cost List *pathkeys; // sort order } Path;
PathKey
❖ QOpt Data Structures (cont) |
Specialised Path
typedef struct IndexPath { Path path; IndexOptInfo *indexinfo; // index for scan List *indexclauses; // index select conditions ... ScanDirection indexscandir; // used by planner Selectivity indexselectivity; // estimated #results } IndexPath; typedef struct JoinPath { Path path; JoinType jointype; // inner/outer/semi/anti Path *outerpath; // outer part of the join Path *innerpath; // inner part of the join List *restrictinfo; // join condition(s) } JoinPath;
❖ Query Optimisation Process |
Query optimisation proceeds in two stages (after parsing)...
Rewriting:
Plan
❖ Top-down Trace of QOpt |
Top-level of query execution: backend/tcop/postgres.c
exec_simple_query(const char *query_string) { // lots of setting up ... including starting xact parsetree_list = pg_parse_query(query_string); foreach(parsetree, parsetree_list) { // Query optimisation querytree_list = pg_analyze_and_rewrite(parsetree,...); plantree_list = pg_plan_queries(querytree_list,...); // Query execution portal = CreatePortal(...plantree_list...); PortalRun(portal,...); } // lots of cleaning up ... including close xact }
Assumes that we are dealing with multiple queries (i.e. SQL statements)
❖ Top-down Trace of QOpt (cont) |
query_planner()
where
r.a=1
s.id=r.s
make_one_rel()
Code in: backend/optimizer/plan/planmain.c
❖ Top-down Trace of QOpt (cont) |
make_one_rel()
make_rel_from_joinlist()
backend/optimizer/path/allpaths.c
❖ Join-tree Generation |
make_rel_from_joinlist()
standard_join_search()
backend/optimizer/path/{allpaths.c,joinrels.c}
❖ Join-tree Generation (cont) |
Genetic query optimiser (geqo
geqo_threshold
backend/optimizer/geqo/*.c
❖ Join-tree Generation (cont) |
Basic idea of genetic algorithm:
Optimize(join) { t = 0 p = initialState(join) // pool of (random) join orders for (t = 0; t < #generations; t++) { p' = recombination(p) // get parts of good join orders p'' = mutation(p') // generate new variations p = selection(p'',p) // choose best join orders } }
#generations determined by size of initial pool of join orders
❖ Query Execution (cont) |
Example of query translation:
select s.name, s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2';
maps to
maps to
Temp1 = BtreeSelect[semester=05s2](Enr) Temp2 = HashJoin[e.student=s.id](Stu,Temp1) Result = Project[name,id,course,mark](Temp2)
❖ Query Execution (cont) |
A query execution plan:
❖ Materialization |
Steps in materialization between two operators
Temp
❖ Pipelining |
How pipelining is organised between two operators:
❖ Iterators (reminder) |
Iterators provide a "stream" of results:
iter = startScan(
)
tuple = nextTuple(iter)
endScan(iter)
❖ Pipelining Example |
Consider the query:
select s.id, e.course, e.mark from Student s, Enrolment e where e.student = s.id and e.semester = '05s2' and s.name = 'John';
which maps to the RA expression
❖ Pipelining Example (cont) |
Evaluated via communication between RA tree nodes:
Note: likely that projection is combined with join in PostgreSQL
❖ Disk Accesses |
Pipelining cannot avoid all disk accesses.
Some operations use multiple passes (e.g. merge-sort, hash-join).
❖ PostgreSQL Query Execution |
Defs: src/include/executor
src/include/nodes
Code: src/backend/executor
PostgreSQL uses pipelining ...
Plan
Scan
Group
Indexscan
Sort
HashJoin
PlanState
❖ PostgreSQL Executor |
Modules in src/backend/executor
execXXX
nodeXXX
ExecInitXXX
ExecXXX
ExecEndXXX
❖ Example PostgreSQL Execution |
Consider the query:
-- get manager's age and # employees in Shoe department
select e.age, d.nemps
from Departments d, Employees e
where e.name = d.manager and d.name ='Shoe'
and its execution plan tree
❖ Example PostgreSQL Execution (cont) |
The execution plan tree
contains three nodes:
NestedLoop
IndexScan
SeqScan
❖ Example PostgreSQL Execution (cont) |
Initially InitPlan()
ExecInitNode()
ExecInitNode()
NestedLoop
so dispatches to ExecInitNestLoop()
then invokes ExecInitNode()
in left subPlan, ExecInitNode()
IndexScan
so dispatches to ExecInitIndexScan()
in right sub-plan, ExecInitNode()
SeqScan
so dispatches to ExecInitSeqScan()
Result: a plan state tree with same structure as plan tree.
❖ Example PostgreSQL Execution (cont) |
Then ExecutePlan()
ExecProcNode()
ExecProcNode()
NestedLoop
so dispatches to ExecNestedLoop()
which invokes ExecProcNode()
in left sub-plan, ExecProcNode()
IndexScan
so dispatches to ExecIndexScan()
if no more tuples, return END
for this tuple, invoke ExecProcNode()
ExecProcNode()
SeqScan
so dispatches to ExecSeqScan()
check for match and return joined tuples if found
continue scan until end
reset right sub-plan iterator
Result: stream of result tuples returned via ExecutePlan()
❖ Performance Tuning |
How to make a database-backed system perform "better"?
Improving performance may involve any/all of:
❖ Performance Tuning (cont) |
Tuning requires us to consider the following:
❖ Performance Tuning (cont) |
Performance can be considered at two times:
❖ PostgreSQL Query Tuning |
PostgreSQL provides the explain
EXPLAIN [ANALYZE] Query
Without ANALYZE
EXPLAIN
With ANALYZE
EXPLAIN
Note that runtimes may show considerable variation due to buffering.
❖ EXPLAIN Examples |
Database
people(id, family, given, title, name, ..., birthday) courses(id, subject, semester, homepage) course_enrolments(student, course, mark, grade, ...) subjects(id, code, name, longname, uoc, offeredby, ...) ...
where
table_name | n_records ---------------------------+----------- people | 55767 courses | 73220 course_enrolments | 525688 subjects | 18525 ...
❖ EXPLAIN Examples (cont) |
Example: Select on non-indexed attribute
uni=# explain uni=# select * from Students where stype='local'; QUERY PLAN ---------------------------------------------------- Seq Scan on students (cost=0.00..562.01 rows=23543 width=9) Filter: ((stype)::text = 'local'::text)
where
Seq Scan
cost=
..
rows=
width=
❖ EXPLAIN Examples (cont) |
More notes on explain
Seq Scan
Index Scan
Hash Join
Merge Join
Filter
Index Cond
Hash Cond
Buckets
cost
explain
explain analyze
❖ EXPLAIN Examples (cont) |
Example: Select on non-indexed attribute with actual costs
uni=# explain analyze uni=# select * from Students where stype='local'; QUERY PLAN ---------------------------------------------------------- Seq Scan on students (cost=0.00..562.01 rows=23543 width=9) (actual time=0.011..4.704 rows=23551 loops=1) Filter: ((stype)::text = 'local'::text) Rows Removed by Filter: 7810 Planning time: 0.054 ms Execution time: 5.875 ms
❖ EXPLAIN Examples (cont) |
Example: Select on indexed, unique attribute
uni=# explain analyze uni-# select * from Students where id=100250; QUERY PLAN ------------------------------------------------------- Index Scan using student_pkey on student (cost=0.00..8.27 rows=1 width=9) (actual time=0.049..0.049 rows=0 loops=1) Index Cond: (id = 100250) Planning Time: 0.088 ms Execution Time: 0.057 ms
❖ EXPLAIN Examples (cont) |
Example: Select on indexed, unique attribute
uni=# explain analyze uni-# select * from Students where id=1216988; QUERY PLAN ------------------------------------------------------- Index Scan using students_pkey on students (cost=0.29..8.30 rows=1 width=9) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (id = 1216988) Planning time: 0.066 ms Execution time: 0.062 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a primary key (indexed) attribute (2016)
uni=# explain analyze uni-# select s.id,p.name uni-# from Students s, People p where s.id=p.id; QUERY PLAN ---------------------------------------------------------- Hash Join (cost=988.58..3112.76 rows=31048 width=19) (actual time=11.504..39.478 rows=31048 loops=1) Hash Cond: (p.id = s.id) -> Seq Scan on people p (cost=0.00..989.97 rows=36497 width=19) (actual time=0.016..8.312 rows=36497 loops=1) -> Hash (cost=478.48..478.48 rows=31048 width=4) (actual time=10.532..10.532 rows=31048 loops=1) Buckets: 4096 Batches: 2 Memory Usage: 548kB -> Seq Scan on students s (cost=0.00..478.48 rows=31048 width=4) (actual time=0.005..4.630 rows=31048 loops=1) Total runtime: 41.0 ms
❖ EXPLAIN Examples (cont) |
Example: Join on a primary key (indexed) attribute (2018)
uni=# explain analyze uni-# select s.id,p.name uni-# from Students s, People p where s.id=p.id; QUERY PLAN ---------------------------------------------------------- Merge Join (cost=0.58..2829.25 rows=31361 width=18) (actual time=0.044..25.883 rows=31361 loops=1) Merge Cond: (s.id = p.id) -> Index Only Scan using students_pkey on students s (cost=0.29..995.70 rows=31361 width=4) (actual time=0.033..6.195 rows=31361 loops=1) Heap Fetches: 31361 -> Index Scan using people_pkey on people p (cost=0.29..2434.49 rows=55767 width=18) (actual time=0.006..6.662 rows=31361 loops=1) Planning time: 0.259 ms Execution time: 27.327 ms
❖ Using EXPLAIN |
For more information on reading plans from EXPLAIN
EXPLAIN
FORMAT { TEXT | XML | JSON | YAML }
EXPLAIN
Produced: 30 Apr 2024