COMP3311 Week 9 Monday Lecture

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [0/28]
❖ Week 09 Monday

In today's lecture ...

Things to do ...

Things to note ...

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [1/28]
❖ Assignment 2 Database

A few "issues" ...

Work with the database as given (except for updates to stream UOCs)
COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [2/28]
❖ Assignment 2, Q5

Q5 is challenging, much data to keep organised

Inputs:

The Aim:
COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [3/28]
❖ Exercise: 5893146 | 3778 | COMPA1

Manually carry out a progression check on:

ACCT1501:HD, COMP1511:SY, MATH1141:SY, COMP1521:UF, ECON1101:HD
MATH1231:CR, COMP2521:HD, MATH1081:DN, COMP3311:HD, ECON1102:CR,
MGMT1001:DN, COMP1521:HD, PSYC1001:CR, COMP3331:PS, COMP1531:HD,
COMP3121:CR, COMP3411:CR, ARTS1270:AW, COMP2511:CR, COMP3421:CR,
COMP4920:DN, COMP3231:CR, COMP6080:DN, DDES1110:CR, COMP3900:HD,
COMP9313:DN

Program Requirements
 Total UOC, 144
 Comp Sci Majors, 1, 1
   COMPA1,COMPD1,COMPE1,COMPI1,COMPJ1,COMPN1,COMPS1,COMPY1
 Foundational Computing, all
   COMP1511,COMP1521,COMP1531,COMP2511,COMP2521
 Comp Sci Maths, all
   MATH1081,{MATH1131;MATH1141},{MATH1231;MATH1241
 Comp Sci Advanced Core, all
   {COMP3121;COMP3821},COMP3900,COMP4920
 General Education, 12, GEN#####

Stream Requirements
 Total UOC, min:66
 COMPA1 Computing Electives, 30
   ENGG2600,ENG3600,ENGG4600,COMP3###,COMP4###,COMP6###,COMP9###
 COMPA1 Free Electives, 36, FREE####

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [4/28]
❖ Exercise: 3893433 | 8543 | COMPCS

Manually carry out a progression check on:

COMP9020:HD, COMP9021:CR, COMP9024:CR, COMP9311:PS, COMP9032:CR,
COMP9331:DN, COMP9201:PS, COMP9315:PS, COMP9444:DN,

Program Requirements
 Total UOC, 96, x
 COMPCS Disciplinary Electives, 24, 24, x
  BINF6###,BINF9###,COMP4###,COMP6###,COMP9##,GSOE92###

Stream Requirements
 Total UOC, 96, x
 MIT Streams, 1, 1, x
   COMPAS,COMPBS,COMPCS,COMPDS,COMPES,COMPIS,COMPSS
 Project Management, all, x, GSOE9820
 PG Core Courses, all, x
   COMP9021,COMP9024,COMP9311,COMP9331
 MIT Project Courses, all, x, {COMP9900;COMP9991}
 ADK Courses, 36, 36, x
   COMP4121,COMP4161,COMP4418,COMP6714,COMP9153,COMP9242,COMP9243,
   COMP9315,COMP9318,COMP9319,COMP9323,COMP9334,COMP9336,COMP9417,
   COMP9418, COMP9434,COMP9444,COMP9517,COMP9992,COMP9993

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [5/28]
❖ Exercise: 5892582 | 3707 | SENGAH

Manually carry out a progression check on:

COMP1511:PS, ENGG1000:PS, MATH1131:PS, COMP1521:CR, MATH1081:PS,
MATH1231:PS, COMP2521:FL, SENG2011:PS, COMP1531:CR, COMP2521:CR,
COMP2041:CR, DESN2000:CR, MATH2859:FL, COMP2511:FL, COMP3311:PS,
COMP3331:CR, COMP2511:DN, SENG2021:PS, COMP3141:NF,

Program Requirements
Total UOC, 192, x
BE(Hons) Streams, 1, 1,
 AEROAH,BINFAH,CEICAH,CEICDH,COMPBH,CVENAH,CVENBH,ELECAH,ELECCH,
 GMATDH,MANFBH,MINEAH,MTRNAH,PETRAH,SENGAH,SOLAAH,SOLABH,TELEAH
Industrial Training, all, x, ENGG4999
General Education, 12, 12, x, GEN#####

Stream Requirements
Total UOC, 168, x
Foundational Computing, all, x
 COMP1511,COMP1521,COMP1531,COMP2511,COMP2521
SENGAH Maths, all, x
 MATH1081,{MATH1131;MATH1141},{MATH1231;MATH1241},MATH2400,MATH2859
SENGAH Workshops/Design, all, x
 {DESN1000;ENGG1000},DESN2000, SENG2011,SENG2021,SENG3011
SENGAH Advanced Core, all, x
 COMP2041,COMP3141,COMP3311,COMP3331,SENG4920,
 COMP4951,COMP4952,COMP4953
SENGAH Discipline Electives, 36, x 
 ENGG2600,ENGG3060,ENGG3600,ENGG4600,COMP3###,COMP4###,
 COMP6###,COMP9###,INFS3###,INFS4###,MATH3###,MATH4###,
 MATH6###,ELEC3###,ELEC4###,TELE3###,TELE4###
SENGAH Free Electives, 6, 6, x,FREE####

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [6/28]
❖ Query Evaluation

The path of a query through its evaluation:


[Diagram:Pics/dbms/qryeval.png]

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [7/28]
❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

-- schema: R(a,b) S(c,d)
select a as x                   select a as x
from   R join S on (b=c)   OR   from   R, S
where  d=100                    where  b=c and d=100

In general:

Note: join conditions may appear either in FROM or WHERE
COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [8/28]
❖ Mapping SQL to RA (cont)


Example mapping using above:

-- schema: R(a,b) S(c,d)
select a as x
from   R join S on (b=c)
where  d=100

-- could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [9/28]
❖ Exercise: A Better SQL to RA Mapping

On the previous slide, we translated this SQL query as follows:

select a as x
from   R join S on (b=c)
where  d=100

-- mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a)       = Proj[a](Tmp2)
Res(x)        = Rename[Res(x)](Tmp3)

Suggest a better approach  (where "better" = "lower cost")

Assumptions:  |R| = 2000,  |S| = 1000,  |R⋈S| = 1000,  |Res| = 100

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [10/28]
❖ Another Mapping Example

The query: Courses with more than 100 students in them?

Can be expressed in SQL as

select   s.id, s.code
from     Course c, Subject s, Enrolment e
where    c.id = e.course and c.subject = s.id
group by s.id, s.code
having   count(*) > 100;

and might be compiled to

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [11/28]
❖ Exercise: Relational Operation Plan

So far, we have been expressing RA evaluation as follows:

Tmp(a,b,c,...) =  Op[x](R)  or  R Op[x] S

Each statement involves a single relational algebra operation.

Render the RA from the previous slide in this form.

Result =
Project[id,code](
   GroupSelect[size>100] (
      GroupBy[id,code] (
         Subject Join[s.id=c.subject]
         (Course Join[c.id=e.course] Enrolment)
)  )  )

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [12/28]
❖ Implementations of RA Ops

Sorting   (quicksort, etc. are not applicable)

Selection   (different techniques developed for different query types) Join   (fast joins are critical to success of relational DBMSs)
COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [13/28]
❖ Query Cost Estimation

The cost of evaluating a query is determined by

Analysis of costs involves estimating:
Accessing data from disk is the dominant cost in query evaluation
COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [14/28]
❖ Query Cost Estimation (cont)

An execution plan is a sequence of concrete relational operations.

Consider execution plans for:    σc (R  ⋈d  S  ⋈e  T)

tmp1   =  hash_join[d](R,S)
tmp2   =  sort_merge_join[e](tmp1,T)
result =  binary_search[c](tmp2)

or

tmp1   =  btree_search[c](R)
tmp2   =  hash_join[d](tmp1,S)
result =  sort_merge_join[e](tmp2)

Both produce same result, but have different costs.

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [15/28]
❖ Query Optimisation

A DBMS query optimiser works as follows:

Input: relational algebra expression
Output: execution plan (sequence of RA ops)

bestCost = INF; bestPlan = none
while (more possible plans) {
   plan = produce a new evaluation plan
   cost = estimated_cost(plan)
   if (cost < bestCost)
      { bestCost = cost; bestPlan = plan }
}
return bestPlan

Typically, there are very, very many possible plans

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [16/28]
❖ DB Application Performance

In order to make DB applications efficient, it is useful to know:

and then, "encourage" DBMS to use the most efficient methods

Achieve by using indexes and avoiding certain SQL query structures

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [17/28]
❖ DB Application Performance (cont)

Application programmer choices that affect query cost:

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [18/28]
❖ Indexes

Indexes provide efficient content-based access to tuples.

Can build indexes on any (combination of) attributes.

Definining indexes:

CREATE INDEX name ON table ( attr1, attr2, ... )

attri  can be an arbitrary expression (e.g. upper(name)).

CREATE INDEX  also allows us to specify

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [19/28]
❖ Indexes (cont)

Indexes can significantly improve query costs.

Considerations in applying indexes:

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [20/28]
❖ Exercise: Indexes and Query Types

Consider a table Students(id,name,age,...)

If we had the following mix of queries

QueryFreq
select * from Students where id = zID 70%
select * from Students where name = Name 10%
select * from Students where name like '%str%' 15%
select * from Students where age > Age 5%

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [21/28]
❖ Expressing SQL Queries

Example: query to find sales people earning more than $50K

select name from Employee
where  salary > 50000 and
       empid in (select empid from Worksin
                 where  dept = 'Sales')

A query evaluator would have to use the strategy

SalesEmps = (select empid from WorksIn where dept='Sales')
foreach e in Employee {
    if (e.empid in SalesEmps && e.salary > 50000)
        add e to result set
}

Needs to examine all employees, even if not in Sales

This is not a good expression of the query.

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [22/28]
❖ Expressing SQL Queries (cont)

A different expression of the same query:

select name
from   Employee join WorksIn using (empid)
where  Employee.salary > 5000 and
       WorksIn.dept = 'Sales'

Query evaluator might use the strategy

SalesEmps = (select * from WorksIn where dept='Sales')
foreach e in (Employee join SalesEmps) {
    if (e.salary > 50000)
        add e to result set
}

Only examines Sales employees, and uses a simpler test

This is a good expression of the query.

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [23/28]
❖ Expressing SQL Queries (cont)

A very poor expression of the query (correlated subquery):

select name from Employee e
where  salary > 50000 and
       'Sales' in (select dept from Worksin where empid=e.id)

A query evaluator would be forced to use the strategy:

foreach e in Employee {
    Depts = (select dept from WorksIn where empid=e.empid)
    if ('Sales' in Depts && e.salary > 50000)
        add e to result set
}

Needs to run a query for every employee ...

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [24/28]
❖ Exercise: Comparing Query Costs

Measure the time costs of the following queries:

select count(*) from Subjects where substring(title,1,1) = 'D';
select count(*) from Subjects where title like 'D%';
select count(*) from Subjects where title ilike 'd%';
select count(*) from Subjects where title ~ '^D.*';
select count(*) from Subjects where title ~* '^d.*';

select count(*) from Subjects where title = 'Database Systems';
select count(*) from Subjects where id = 56571;

select max(id) from Subjects;
select max(title) from Subjects;

select count(*)
from   Subjects s join Courses c on s.id = c.subject
       join Course_enrolments e on c.id = e.course
where  code = 'COMP1511';

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [25/28]
❖ Query Tuning

Sometimes, a query can be re-phrased to affect performance:

Examples which may prevent optimiser from using indexes:

select name from Employee where salary/365 > 100
       -- fix by re-phrasing condition to (salary > 36500)
select name from Employee where name like '%ith%'
select name from Employee where birthday is null
       -- above two are difficult to "fix"
select name from Employee
where  dept in (select id from Dept where ...)
       -- fix by using Employee join Dept on (e.dept=d.id)

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [26/28]
❖ Query Tuning (cont)

Other tricks in query tuning (effectiveness is DBMS-dependent)

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [27/28]
❖ Exercise: More Query Costs

Check whether the optimisations mentioned above actually work

-- names of course convenors
select full_name from People
where  id in (select convenor from Courses)
-- or --
select p.full_name from People p
where exists (select * from Courses where convenor=p.id)
-- or --
select p.full_name
from People p join Courses c on (p.id=c.convenor)

-- people from Colombia(48) or Mongolia(140)
select full_name from People where origin = 48 or origin = 140
-- or --
(select full_name from People where origin = 48)
union
(select full_name from People where origin = 140)
-- or --
select full_name from People where origin in (48,140)

COMP3311 23T3 ♢ Week 9 Monday Lecture ♢ [28/28]


Produced: 6 Nov 2023