❖ Week 09 Monday |
❖ Assignment 2 Database |
A few "issues" ...
❖ Assignment 2, Q5 |
Q5 is challenging, much data to keep organised
Inputs:
❖ Exercise: 5893146 | 3778 | COMPA1 |
Manually carry out a progression check on:
ACCT1501:HD, COMP1511:SY, MATH1141:SY, COMP1521:UF, ECON1101:HD MATH1231:CR, COMP2521:HD, MATH1081:DN, COMP3311:HD, ECON1102:CR, MGMT1001:DN, COMP1521:HD, PSYC1001:CR, COMP3331:PS, COMP1531:HD, COMP3121:CR, COMP3411:CR, ARTS1270:AW, COMP2511:CR, COMP3421:CR, COMP4920:DN, COMP3231:CR, COMP6080:DN, DDES1110:CR, COMP3900:HD, COMP9313:DN Program Requirements Total UOC, 144 Comp Sci Majors, 1, 1 COMPA1,COMPD1,COMPE1,COMPI1,COMPJ1,COMPN1,COMPS1,COMPY1 Foundational Computing, all COMP1511,COMP1521,COMP1531,COMP2511,COMP2521 Comp Sci Maths, all MATH1081,{MATH1131;MATH1141},{MATH1231;MATH1241 Comp Sci Advanced Core, all {COMP3121;COMP3821},COMP3900,COMP4920 General Education, 12, GEN##### Stream Requirements Total UOC, min:66 COMPA1 Computing Electives, 30 ENGG2600,ENG3600,ENGG4600,COMP3###,COMP4###,COMP6###,COMP9### COMPA1 Free Electives, 36, FREE####
❖ Exercise: 3893433 | 8543 | COMPCS |
Manually carry out a progression check on:
COMP9020:HD, COMP9021:CR, COMP9024:CR, COMP9311:PS, COMP9032:CR, COMP9331:DN, COMP9201:PS, COMP9315:PS, COMP9444:DN, Program Requirements Total UOC, 96, x COMPCS Disciplinary Electives, 24, 24, x BINF6###,BINF9###,COMP4###,COMP6###,COMP9##,GSOE92### Stream Requirements Total UOC, 96, x MIT Streams, 1, 1, x COMPAS,COMPBS,COMPCS,COMPDS,COMPES,COMPIS,COMPSS Project Management, all, x, GSOE9820 PG Core Courses, all, x COMP9021,COMP9024,COMP9311,COMP9331 MIT Project Courses, all, x, {COMP9900;COMP9991} ADK Courses, 36, 36, x COMP4121,COMP4161,COMP4418,COMP6714,COMP9153,COMP9242,COMP9243, COMP9315,COMP9318,COMP9319,COMP9323,COMP9334,COMP9336,COMP9417, COMP9418, COMP9434,COMP9444,COMP9517,COMP9992,COMP9993
❖ Exercise: 5892582 | 3707 | SENGAH |
Manually carry out a progression check on:
COMP1511:PS, ENGG1000:PS, MATH1131:PS, COMP1521:CR, MATH1081:PS, MATH1231:PS, COMP2521:FL, SENG2011:PS, COMP1531:CR, COMP2521:CR, COMP2041:CR, DESN2000:CR, MATH2859:FL, COMP2511:FL, COMP3311:PS, COMP3331:CR, COMP2511:DN, SENG2021:PS, COMP3141:NF, Program Requirements Total UOC, 192, x BE(Hons) Streams, 1, 1, AEROAH,BINFAH,CEICAH,CEICDH,COMPBH,CVENAH,CVENBH,ELECAH,ELECCH, GMATDH,MANFBH,MINEAH,MTRNAH,PETRAH,SENGAH,SOLAAH,SOLABH,TELEAH Industrial Training, all, x, ENGG4999 General Education, 12, 12, x, GEN##### Stream Requirements Total UOC, 168, x Foundational Computing, all, x COMP1511,COMP1521,COMP1531,COMP2511,COMP2521 SENGAH Maths, all, x MATH1081,{MATH1131;MATH1141},{MATH1231;MATH1241},MATH2400,MATH2859 SENGAH Workshops/Design, all, x {DESN1000;ENGG1000},DESN2000, SENG2011,SENG2021,SENG3011 SENGAH Advanced Core, all, x COMP2041,COMP3141,COMP3311,COMP3331,SENG4920, COMP4951,COMP4952,COMP4953 SENGAH Discipline Electives, 36, x ENGG2600,ENGG3060,ENGG3600,ENGG4600,COMP3###,COMP4###, COMP6###,COMP9###,INFS3###,INFS4###,MATH3###,MATH4###, MATH6###,ELEC3###,ELEC4###,TELE3###,TELE4### SENGAH Free Electives, 6, 6, x,FREE####
❖ Mapping SQL to RA |
Mapping SQL to relational algebra, e.g.
-- schema: R(a,b) S(c,d)
select a as x select a as x
from R join S on (b=c) OR from R, S
where d=100 where b=c and d=100
In general:
SELECT
FROM
WHERE
AS
FROM
WHERE
❖ Mapping SQL to RA (cont) |
Example mapping using above:
-- schema: R(a,b) S(c,d) select a as x from R join S on (b=c) where d=100 -- could be mapped to Tmp1(a,b,c,d) = R Join[b=c] S Tmp2(a,b,c,d) = Sel[d=100](Tmp1) Tmp3(a) = Proj[a](Tmp2) Res(x) = Rename[Res(x)](Tmp3)
❖ Exercise: A Better SQL to RA Mapping |
On the previous slide, we translated this SQL query as follows:
select a as x
from R join S on (b=c)
where d=100
-- mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a) = Proj[a](Tmp2)
Res(x) = Rename[Res(x)](Tmp3)
Suggest a better approach (where "better" = "lower cost")
Assumptions: |R| = 2000, |S| = 1000, |R⋈S| = 1000, |Res| = 100
❖ Another Mapping Example |
The query: Courses with more than 100 students in them?
Can be expressed in SQL as
select s.id, s.code from Course c, Subject s, Enrolment e where c.id = e.course and c.subject = s.id group by s.id, s.code having count(*) > 100;
and might be compiled to
Result = Project[id,code]( GroupSelect[size>100] ( GroupBy[id,code] ( Subject Join[s.id=c.subject] (Course Join[c.id=e.course] Enrolment) ) ) )
❖ Exercise: Relational Operation Plan |
So far, we have been expressing RA evaluation as follows:
Tmp(a,b,c,...) = Op[x](R) or R Op[x] S
Each statement involves a single relational algebra operation.
Render the RA from the previous slide in this form.
Result = Project[id,code]( GroupSelect[size>100] ( GroupBy[id,code] ( Subject Join[s.id=c.subject] (Course Join[c.id=e.course] Enrolment) ) ) )
❖ Implementations of RA Ops |
Sorting (quicksort, etc. are not applicable)
❖ Query Cost Estimation |
The cost of evaluating a query is determined by
❖ Query Cost Estimation (cont) |
An execution plan is a sequence of concrete relational operations.
Consider execution plans for: σc (R ⋈d S ⋈e T)
tmp1 = hash_join[d](R,S) tmp2 = sort_merge_join[e](tmp1,T) result = binary_search[c](tmp2)
or
tmp1 = btree_search[c](R) tmp2 = hash_join[d](tmp1,S) result = sort_merge_join[e](tmp2)
Both produce same result, but have different costs.
❖ Query Optimisation |
A DBMS query optimiser works as follows:
Input: relational algebra expression Output: execution plan (sequence of RA ops) bestCost = INF; bestPlan = none while (more possible plans) { plan = produce a new evaluation plan cost = estimated_cost(plan) if (cost < bestCost) { bestCost = cost; bestPlan = plan } } return bestPlan
Typically, there are very, very many possible plans
❖ DB Application Performance |
In order to make DB applications efficient, it is useful to know:
(which queries, updates, inserts and how frequently is each one performed)
(data structures and algorithms for select, project, join, sort, ...)
(in terms of the amount of data transferred between memory and disk)
Achieve by using indexes and avoiding certain SQL query structures
❖ DB Application Performance (cont) |
Application programmer choices that affect query cost:
❖ Indexes |
Indexes provide efficient content-based access to tuples.
Can build indexes on any (combination of) attributes.
Definining indexes:
CREATE INDEX name ON table ( attr1, attr2, ... )
attri can be an arbitrary expression (e.g. upper(name)
CREATE INDEX
UNIQUE
USING
❖ Indexes (cont) |
Indexes can significantly improve query costs.
Considerations in applying indexes:
-- use hashing for (unique) attributes in equality tests, e.g. select * from Employee where id = 12345 -- use B-tree for attributes in range tests, e.g. select * from Employee where age > 60
❖ Exercise: Indexes and Query Types |
Consider a table Students(id,name,age,...)
If we had the following mix of queries
Query | Freq |
select * from Students where id = zID |
70% |
select * from Students where name = Name |
10% |
select * from Students where name like '%str%' |
15% |
select * from Students where age > Age |
5% |
❖ Expressing SQL Queries |
Example: query to find sales people earning more than $50K
select name from Employee where salary > 50000 and empid in (select empid from Worksin where dept = 'Sales')
A query evaluator would have to use the strategy
SalesEmps = (select empid from WorksIn where dept='Sales') foreach e in Employee { if (e.empid in SalesEmps && e.salary > 50000) add e to result set }
Needs to examine all employees, even if not in Sales
This is not a good expression of the query.
❖ Expressing SQL Queries (cont) |
A different expression of the same query:
select name from Employee join WorksIn using (empid) where Employee.salary > 5000 and WorksIn.dept = 'Sales'
Query evaluator might use the strategy
SalesEmps = (select * from WorksIn where dept='Sales') foreach e in (Employee join SalesEmps) { if (e.salary > 50000) add e to result set }
Only examines Sales employees, and uses a simpler test
This is a good expression of the query.
❖ Expressing SQL Queries (cont) |
A very poor expression of the query (correlated subquery):
select name from Employee e where salary > 50000 and 'Sales' in (select dept from Worksin where empid=e.id)
A query evaluator would be forced to use the strategy:
foreach e in Employee { Depts = (select dept from WorksIn where empid=e.empid) if ('Sales' in Depts && e.salary > 50000) add e to result set }
Needs to run a query for every employee ...
❖ Exercise: Comparing Query Costs |
Measure the time costs of the following queries:
select count(*) from Subjects where substring(title,1,1) = 'D'; select count(*) from Subjects where title like 'D%'; select count(*) from Subjects where title ilike 'd%'; select count(*) from Subjects where title ~ '^D.*'; select count(*) from Subjects where title ~* '^d.*'; select count(*) from Subjects where title = 'Database Systems'; select count(*) from Subjects where id = 56571; select max(id) from Subjects; select max(title) from Subjects; select count(*) from Subjects s join Courses c on s.id = c.subject join Course_enrolments e on c.id = e.course where code = 'COMP1511';
❖ Query Tuning |
Sometimes, a query can be re-phrased to affect performance:
select name from Employee where salary/365 > 100 -- fix by re-phrasing condition to (salary > 36500) select name from Employee where name like '%ith%' select name from Employee where birthday is null -- above two are difficult to "fix" select name from Employee where dept in (select id from Dept where ...) -- fix by using Employee join Dept on (e.dept=d.id)
❖ Query Tuning (cont) |
Other tricks in query tuning (effectiveness is DBMS-dependent)
select distinct
distinct
select ... Employee join Customer on (s.name = p.name)
vs
select ... Employee join Customer on (s.ssn = p.ssn)
or
or
or
select name from Employee where Dept=1 or Dept=2
vs
(select name from Employee where Dept=1)
union
(select name from Employee where Dept=2)
❖ Exercise: More Query Costs |
Check whether the optimisations mentioned above actually work
-- names of course convenors select full_name from People where id in (select convenor from Courses) -- or -- select p.full_name from People p where exists (select * from Courses where convenor=p.id) -- or -- select p.full_name from People p join Courses c on (p.id=c.convenor) -- people from Colombia(48) or Mongolia(140) select full_name from People where origin = 48 or origin = 140 -- or -- (select full_name from People where origin = 48) union (select full_name from People where origin = 140) -- or -- select full_name from People where origin in (48,140)
Produced: 6 Nov 2023