-
Relational algebra operators can be composed.
What precisely does this mean? And why is it important?
[show answer]
Composition means that the result of one operator can be used as an argument
for some other operator.
Since most relational algebra operators take relations as inputs and produce
relations as results, they can be composed.
It is important because it allows us to build arbitrarily complex relational
algebra expressions to answer arbitrarily complex queries (i.e. we can build
answers to complex queries from answers to simple queries).
Note that we often find it clearer to express complex queries as a
sequence of simple queries which are assigned
to named
intermediate relations, rather than as deeply nested expressions.
-
The natural join ( R Join S ) joins two tables on
their common attributes.
Consider a theta-join ( R Join[C] S ) where
the join condition C is a conjunction of R.A = S.A
for each attribute A appearing in the schemas of both R
and S (i.e. it joins on the common attributes).
What is the difference between the above natural join and theta join?
[show answer]
The relation that results from the natural join has only one attribute
from each pair of matching attributes. The theta-join has attributes for
both, and their columns are identical.
Both joins result in a relation with the same number of tuples.
-
The definition of relational algebra in lectures was based on sets
of tuples. In practice, commercial relational database management systems
deal with bags (or multisets) of tuples.
Consider the following relation that describes a collection of PCs
PCs
| Model | Speed | RAM | Disk | Price |
|---|
| 1001 | 700 | 64 | 10 | 799 |
| 1002 | 1500 | 128 | 60 | 2499 |
| 1003 | 1000 | 128 | 20 | 1499 |
| 1004 | 1000 | 256 | 40 | 1999 |
| 1005 | 700 | 64 | 30 | 999 |
Consider a projection of this relation on the processor speed
attribute, i.e. Proj[speed](PCs).
- What is the value of the projection as a set?
- What is the value of the projection as a bag?
- What is the average speed if the projection is a set?
- What is the average speed if the projection is a bag?
- Is the minimum/maximum speed different between the bag and set representation?
[show answer]
- {700, 1000, 1500}
- {700, 700, 1000, 1000, 1500}
- (700+1000+1500)/3 = 1067
- (700+700+1000+1000+1500)/5 = 980
- No. Unlike the average value, the minimum and maximum values
are not affected by repetitions.
-
Consider the following two relations:
Give the relation that results from each of the following
relational algebra expressions on these relations:
-
R Div S
[show answer]
-
R Div (Sel[B != b1](S))
[show answer]
| Tmp1 = Sel[B != b1](S) | = |
|
| R Div Tmp1 | = |
|
-
R Div (Sel[B != b2](S))
[show answer]
| Tmp1 = Sel[B != b2](S) | = |
|
| R Div Tmp1 | = |
|
-
R × S) - (Sel[R.C=S.C](R Join[B=B] S)
[show answer]
| Tmp1 = R Join[R.B=S.B] S | = |
| R.A | R.B | R.C | S.B | S.C |
| a1 | b1 | c1 | b1 | c1 |
| a1 | b2 | c2 | b2 | c2 |
| a2 | b1 | c1 | b1 | c1 |
|
| Tmp2 = Sel[R.C=S.C](Tmp1) | = |
| R.A | R.B | R.C | S.B | S.C |
| a1 | b1 | c1 | b1 | c1 |
| a1 | b2 | c2 | b2 | c2 |
| a2 | b1 | c1 | b1 | c1 |
|
| Tmp3 = R × S | = |
| R.A | R.B | R.C | S.B | S.C |
| a1 | b1 | c1 | b1 | c1 |
| a1 | b1 | c1 | b2 | c2 |
| a1 | b2 | c2 | b1 | c1 |
| a1 | b2 | c2 | b2 | c2 |
| a2 | b1 | c1 | b1 | c1 |
| a2 | b1 | c1 | b2 | c2 |
|
| Tmp3 - Tmp2 | = |
| R.A | R.B | R.C | S.B | S.C |
| a1 | b1 | c1 | b2 | c2 |
| a1 | b2 | c2 | b1 | c1 |
| a2 | b1 | c1 | b2 | c2 |
|
-
Consider two relations R1 and R2, where R1 contains
N1 tuples and R2 contains N2 tuples, and
N1 > N2 > 0.
Give the minimum and maximum possible sizes (in tuples) for the
result relation produced by each of the following relational algebra
expressions.
In each case state any assumptions about the schemas of R1 and
R2 that are needed to make the expression meaningful.
- R1 Union R2
- R1 Intersect R2
- R1 - R2
- R1 × R2
- Sel[a=5](R1)
- Proj[a](R1)
- R1 Div R2
[show answer]
| Expression |
Assumptions |
Min |
Max |
| R1 Union R2 |
R1 and R2 are union-compatible |
N1 (when R2⊂R1) |
N1+N2 (when R1∩R2=∅) |
| R1 Intersect R2 |
R1 and R2 are union-compatible |
0 (when R1∩R2=∅) |
N2 (when R2⊂R1) |
| R1 - R2 |
R1 and R2 are union-compatible |
N1-N2 (when R2∩R1=R2) |
N1 (when R1∩R2=∅) |
| R1 × R2 |
None |
N1*N2 |
N1*N2 |
| Sel[a=5](R1) |
R1 has an attribute a |
0 (no matches) |
N1 (all match) |
| Proj[a](R1) |
R1 has an attribute a, N1 > 0 |
1 |
N1 |
| R1 Div R2 |
The set of R2's attributes is a subset of R1's attributes |
0 |
floor(N1/N2) |
-
(Date, exercise 6.11)
Let A and B be two arbitrary relations.
In terms of the keys of A and B, state
the keys for each of the following RA expressions.
Assume in each case that A and B meet the requirements
of the operation (e.g. they are union-compatible for the Union and
Intersect operations).
-
Sel[cond](A),
where cond is any condition
[show answer]
Any subset of A inherits all of the candidate keys for A
-
Proj[attrs](A),
where attrs is any set of atributes
[show answer]
If the projection includes any candidate key K of A,
then K is a candidate key for the projection. Otherwise, the
only candidate key is the combination of all projected attributes.
-
A × B
[show answer]
Every combination of a candidate key KA for A and
a candidate key KB for B is a candidate key for
the product.
-
A Union B
[show answer]
Even if two relations are union-compatible, it doesn't mean that
they share the same candidate keys, so in the general case, the
only candidate key for a union is the set of all attributes.
In the special case where A and B are selections
from the same relation, the candidate keys for the union would
the the same as the candidate keys for the underlying relation.
-
A Intersect B
[show answer]
Since A Intersect B is a subset of A,
it has A's key.
-
A - B
[show answer]
Every candidate key for A is a candidate key for the
difference (could be viewed as a special kind of selection
from A).
-
A Div B
[show answer]
Ultimately, division is like a selection followed by projection
on the original table.
Similar considerations would apply as for projection.
-
Consider the following relational schema:
Suppliers(sid, sname, address)
Parts(pid, pname, colour)
Catalog(supplier, part, cost)
Assume that the ids are integers,
that cost is a real number,
that all other attributes are strings,
that the supplier field
is a foreign key containing a supplier id,
and
that the part field is a foreign key
containing a part id.
Write a relational algebra expression to answer each
of the following queries:
-
Find the names of suppliers who supply some red part.
[show answer]
RedPartIds = Proj[pid](Sel[colour='red'](Parts))
RedPartSupplierIds = Rename[sid](Proj[supplier](RedPartIds Join Catalog))
Answer = Proj[sname](RedPartSupplierIds Join Suppliers)
-
Find the sids of suppliers who supply some red or green part.
[show answer]
RedGreenPartIds = Proj[pid](Sel[colour='red' OR colour='green'](Parts))
Answer = Proj[sid](RedGreenPartIds Join Catalog)
-
Find the sids of suppliers who supply some red part or
whose address is 221 Packer Street.
[show answer]
RedPartIds = Rename[RedPartIds(part)](Proj[pid](Sel[colour='red'](Parts)))
RedPartSupplierIds = Rename[RedPartSupplierIds(sid)](Proj[supplier](RedPartIds Join Catalog))
PackerStSupplierIds = Proj[sid](Sel[address='221 Packer Street'](Suppliers))
Answer = RedPartSupplierIds Union PackerStSupplierIds
-
Find the sids of suppliers who supply some red part and some green part.
[show answer]
RedPartIds = Proj[pid](Sel[colour='red'](Parts))
RedPartSupplierIds = Proj[sid](RedPartIds Join Catalog)
GreenPartIds = Proj[pid](Sel[colour='green'](Parts))
GreenPartSupplierIds = Proj[sid](GreenPartIds Join Catalog)
Answer = RedPartSupplierIds Intersect GreenPartSupplierIds
-
Find the sids of suppliers who supply every part.
[show answer]
AllPartIds = Proj[pid](Parts)
PartSuppliers = Proj[sid,pid](Catalog)
Answer = PartSuppliers Div AllPartIds
-
Find the sids of suppliers who supply every red part.
[show answer]
RedPartIds = Proj[pid](Sel[colour='red'](Parts))
PartSuppliers = Proj[sid,pid](Catalog)
Answer = PartSuppliers Div RedPartIds
-
Find the sids of suppliers who supply every red or green part.
[show answer]
RedGreenPartIds = Proj[pid](Sel[colour='red' OR colour='green'](Parts))
Answer = PartSuppliers Div RedGreenPartIds
-
Find the sids of suppliers who supply every red part
or supply every green part.
[show answer]
RedPartIds = Proj[pid](Sel[colour='red'](Parts))
GreenPartIds = Proj[pid](Sel[colour='green'](Parts))
PartSuppliers = Proj[sid,pid](Catalog)
AllRedPartSuppliers = PartSuppliers Div RedPartIds
AllGreenPartSuppliers = PartSuppliers Div GreenPartIds
Answer = AllRedPartSuppliers Union AllGreenPartSuppliers
-
Find the pids of parts that are supplied by at least two
different suppliers.
[show answer]
C1 = Catalog
C2 = Catalog
SupplierPartPairs = Sel[C1.sid!=C2.sid](C1 Join[pid] C2)
Answer = Proj[C1.pid](SupplierPartPairs)
-
Find pairs of sids such that the supplier with the first sid
charges more for some part than the supplier with the second sid.
[show answer]
C1 = Catalog
C2 = Catalog
SupplierPartPairs = Sel[C1.sid!=C2.sid AND C1.cost>C2.cost](C1 Join[pid] C2)
Answer = Proj[C1.sid,C2.sid](SupplierPartPairs)
-
Find the pids of the most expensive part(s) supplied by suppliers
named "Yosemite Sham".
[show answer]
R1 = Proj[sid,pid,cost](Sel[name='Yosemite Sham'](Suppliers Join Catalog))
R2 = R1
R3 = Rename[1->sid,2->pid,3->cost](Sel[R2.cost<R1.cost](R1 × R2))
Answer = Proj[pid](R2 Minus Proj[sid,pid,cost](R3))
-
Find the pids of parts supplied by every supplier at a price less
than 200 dollars (if any supplier either does not supply the part or charges
more than 200 dollars for it, the part should not be selected).
[show answer]
CheapParts = Proj[part,supplier](Sel[cost<200](Catalog))
AllSuppliers = Rename[AllSuppliers(supplier)](Proj[sid](Suppliers))
Answer = Proj[part,supplier](CheapParts Div AllSuppliers)
-
Using the Suppliers-Parts-Catalog schema from the previous question,
state what the following relational algebra expressions compute:
-
Proj[sname](
Proj[sid](Sel[colour='red'](Parts))
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
[show answer]
Finds the names of all suppliers who supply red parts that cost less than $100.
-
Proj[sname](
Proj[sid](
Sel[colour='red'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
)
[show answer]
Produces nothing, because there is no sname field left to job
after the Proj[sid].
-
Proj[sname](
Sel[colour='red'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
Intersect
Proj[sname](
Sel[colour='green'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
[show answer]
Finds the names of all suppliers who supply both red and green parts
that cost less than $100.
-
Proj[sid](
Sel[colour='red'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
Intersect
Proj[sid](
Sel[colour='green'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
[show answer]
Finds the sids of all suppliers who supply both red and green parts
that cost less than $100.
-
Proj[sid,sname](
Proj[sid,sname](
Sel[colour='red'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
Intersect
Proj[sid,sname](
Sel[colour='green'](Parts)
Join
Sel[cost<100](Catalog)
Join
Suppliers
)
)
[show answer]
Finds the names and sids of all suppliers who supply
both red and green parts that cost less than $100.
The outermost projection is completely redundant.
-
Consider the following relational schema containing flight, aircraft
and pilot information for an airline:
Flights(flno, from, to, distance, departs, arrives)
Aircraft(aid, aname, cruisingRange)
Certified(employee, aircraft)
Employees(eid, ename, salary)
The Flights relation describes a particular timetabled
fight from a source (city) to a destination (city) at a particular time
each week.
Note that this schema doesn't model which specific aircraft makes
the flight.
The Aircraft relation describes individual planes,
with the aname field containing values like
'Boeing 747', 'Airbus A300', etc.
The Certified relation indicates which pilots (who
are employees) are qualified to fly which aircraft.
The employee field contains the eid of the
pilot, while the aircraft field contains
Finally, the Employees relation contains information
about all of the employees of the airline, including the pilots.
Write, where possible, a relational algebra expression
to answer each of the following queries.
If it is not possible to express any query in relational
algebra, suggest what extra mechanisms might make it possible.
-
Find the ids of pilots certified for 'Boeing 747' aircraft.
[show answer]
Approach: collect together information about aicraft and certification
and project out just the employee ids for tuples that involve 747's.
Answer = Proj[eid](Sel[aname='Boeing 747'](Aircraft Join Certified))
-
Find the names of pilots certified for 'Boeing 747' aircraft.
[show answer]
Approach: collect together information about pilots and what planes they
are certified on and project out just the name (since we need the pilot's
name this time, we need to include Employees).
Answer = Proj[ename](Sel[aname='Boeing 747'](Aircraft Join Certified Join Employees))
-
Find the ids of all aircraft that can be used on non-stop
flights from New York to Los Angeles.
[show answer]
Approach: get all of the flights from New York to Los Angeles (to get the
distance), join these with aircraft information and select only those
combinations where the plane is capable of cruising that far.
If you were fussy, you could also include flights from Los Angeles
to New York with a disjunction in the inner Sel (in case
the distance was different for some reason e.g. slightly different
route).
Answer =
Proj[aid](
Aircraft
Join[cruisingRange>distance]
Sel[from='New York' AND to='Los Angeles'](Flights)
)
-
Identify the flights that can be piloted by a pilot whose salary
is more than $100,000.
[show answer]
Approach: collect together information about which planes can make
which flights and pilot certification, then select just those for
which the pilot is certified and where the pilot earns more
than $100000.
Temp =
Proj[flno,aid](
Flights
Join[distance<cruisingRange]
Aircraft
)
Answer =
Proj[flno](
Sel[salary>100000](
Temp Join Certified Join Employees
)
)
-
Find the names of pilots who can operate planes with a range greater
than 3000 miles, but are not certified on 'Boeing 747' aircraft.
[show answer]
Approach: get a list of employee ids who are certified for aircraft
with a crusing range more than 3000 miles, then subtract the employee
ids for those who are certified on a 747, leaving just those
who are not certified on a 747; to get their names, join
with the employees relation and project out the names.
LongRangeAircraftCert = Proj[eid](Sel[cruisingRange>3000](Aircraft Join Certfied))
Boeing747Cert = Proj[eid](Sel[aname='Boeing 747'](Aircraft Join Certfied))
Answer =
Proj[ename](
Employees Join (LongRangeAircraftCert Minus Boeing747Cert)
)
-
Find the total amount paid to employees as salaries.
[show answer]
Answer = Sum[salary](Employees)
-
Find the ids of employees who make the highest salary.
[show answer]
Approach: find employees who do not have the highest salary (via the join),
and then subtract them from the set of all employees, thus leaving just the
highest paid employees.
E1 = Employees
E2 = Employees
Employees = Proj[eid](Employees)
LowerPaidEmployees = Proj[E2.eid](E1 Join[E1.salary>E2.salary] E2)
Answer = Employees Minus LowerPaidEmployees
-
Find the ids of employees who make the second highest salary.
[show answer]
Approach: use the idea from the previous question; first find highest
paid employees, then remove them from list of employees, then find
highest paid employees in what's left; these will be the second
highest-paid employees.
E1 = Employees
E2 = Employees
HighestPaid =
Proj[eid](Employees)
Minus
Proj[E2.eid](E1 Join[E1.salary>E2.salary] E2)
NotHighestPaid =
Proj[eid](Employees) Minus HighestPaid
E3 = NotHighestPaid
E4 = NotHighestPaid
SecondHighestPaid =
NotHighestPaid
Minus
Proj[E4.eid](E3 Join[E3.salary>E4.salary] E4)
-
Find the ids of employees who are certified for the largest
number of aircraft.
[show answer]
Approach: Group by employee and count their certifications; then find max certifications; then select employees who have that many certifications
R1 = GroupBy[employee,Count[employee]](Certified)
R2 = Rename[1->employee,2->ncertified](R1)
R3 = Max[ncertified](R2)
Answer = Sel[ncertified=R3](R1)
-
Find the ids of employees who are certified for exactly three
aircraft.
[show answer]
Approach: find pilots who are certified for at least three aircraft;
then find pilots who are certified for at least four aircraft; then
subtract the first set from the latter and you'll pilots who are
certified for exactly three aircraft; the "counting" is built into
the join condition.
R1 = R2 = R3 = R4 = Certified
CertifiedForAtLeast3 =
Proj[eid](
Sel[R1.eid=R2.eid=R3.eid AND R1.aid!=R2.aid!=R3.aid](
R1 × R2 × R3
)
)
CertifiedForAtLeast4 =
Proj[eid](
Sel[R1.eid=R2.eid=R3.eid=R4.eid AND R1.aid!=R2.aid!=R3.aid!=R4.aid](
R1 × R2 × R3 × R4
)
)
Answer = CertifiedForAtLeast3 Minus CertifiedForAtLeast4
The above is a particularly inefficient (and obscure) solution.
A better approach would be to solve it in a manner
similar to the previous question.
R1 = GroupBy[employee,Count[employee]](Certified)
R2 = Rename[1->employee,2->ncertified](R1)
Answer = Proj[employee](Sel[ncertified=3](R2))
-
Find if there is any non-stop or single-stop way to travel by air
from Sydney to New York.
[show answer]
Approach: find any non-stop flights from Sydney to New York; find any
single-stop flights from Sydney to New York; take the union of these
two sets.
F1 = F2 = Flights
SydToNYNonStop = Proj[flno](Sel[from='Sydney' AND to='New York'](Flights))
SydToNYSingleStop = Proj[flno](Sel[F1.from='Sydney' AND F2.to='New York'](F1 Join[F1.to=F2.from] F2))
Answer = SydToNYNonStop Union SydToNYSingleStop
-
Is there a sequence of flights from Sydney to Timbuktu (somewhere
in the middle of Africa)? Each flight in the sequence is required
to depart from the destination of the previous flight; the first
flight must leave Sydney, and the last flight must arrive in
Timbuktu, but there is no restriction on the number of intermediate
flights.
[show answer]
This is a generalisation of the previous question to routes with an
arbitrary number of stops. This cannot be expressed in standard
relational algebra. It requires some kind of iterative or recursive
querying mechanism before it will work. If we're willing to put an
upper bound on the number of intermediate stops, we can generalise
the previous query to do it.
-
[based on Ramakrishnan, ex.17.1]
Give a brief definition for each of the following terms:
- transaction
- serializable schedule
- conflict-serializable schedule
- view-serializable schedule
[show answer]
- transaction
An execution of a user program that performs some action that is treated
as atomic according to the semantics of some database application. The
DBMS sees the transaction as a sequence of actions that can include read
and write operations on the database, as well as computations.
- serializable schedule
A schedule over a set of transactions that produces a result that is
the same as some serial execution of the transactions.
- conflict-serializable schedule
A schedule is conflict-serializable if it is conflict-equivalent to
some serial schedule.
Two schedules are conflict-equivalent if they involve the same set
of actions and they order every pair of conflicting actions in the
same way.
- view-serializable schedule
A schedule is view-serializable if it is view-equivalent to some
serial schedule.
Two schedules are view-equivalent if they satisfy:
- the initial value of any object is read by the same
transaction in both schedules, and
- the final value of any object is written by the same
transaction in both schedules, and
- any shared object is written-then-read by the same pair of
transactions in both schedules.
-
Draw the precedence graph for the following schedule (where C means "commit"):
T1: R(A) W(Z) C
T2: R(B) W(Y) C
T3: W(A) W(B) C
[show answer]
It has an edge from T3 to T1 (because of A) and an edge from
T2 to T3 because of B.
This gives: T2 --> T3 --> T1
-
[based on Ramakrishnan, ex.17.2]
Consider the following incomplete schedule S:
T1: R(X) R(Y) W(X) W(X)
T2: R(Y) R(Y)
T3: W(Y)
-
Determine (by using a precedence graph) whether the schedule is conflict-serializable
-
Modify S to create a complete schedule that is conflict-serializable
[show answer]
-
Determine (by using a precedence graph) whether the schedule is serializable
The precedence graph has an edge, from T1 to T3,
because of the conflict between T1:R(Y) and T3:W(Y).
It also has an edge, from T2 to T3,
because of the conflict between the first T2:R(Y) and
T3:W(Y).
It also has an edge, from T3 to T2,
because of the conflict between T3:W(Y) and the
second T2:R(Y). There is a cycle, so it is not conflict-serializable.
-
Modify S to create a complete schedule that is conflict-serializable
Trick question. It is not possible. Since the precedence graph is cyclic,
we know that it's not conflict-serializable.
If we are allowed to add abort actions (which was not mentioned
in the question), we could simply abort either T2 or
T3 and the schedule would become conflict-serializable.
-
[based on Ramakrishnan, ex.17.3]
For each of the following schedules, state whether it is
conflict-serializable and/or view-serializable.
If you cannot decide whether a schedule belongs to either
class, explain briefly.
The actions are listed in the order they are scheduled,
and prefixed with the transaction name.
-
T1:R(X) T2:R(X) T1:W(X) T2:W(X)
-
T1:W(X) T2:R(Y) T1:R(Y) T2:R(X)
-
T1:R(X) T2:R(Y) T3:W(X) T2:R(X) T1:R(Y)
-
T1:R(X) T1:R(Y) T1:W(X) T2:R(Y) T3:W(Y) T1:W(X) T2:R(Y)
-
T1:R(X) T2:W(X) T1:W(X) T3:W(X)
[show answer]
The methods used to determine these solutions:
- for conflict-serializablility, draw precedence graph and look for cycles
- for view-serializablility, apply the definition from lecture notes.
You can short-circuit the view serializability check. As soon as you
know that the schedule is conflict-serializable, it must also be view
serializable.
Solutions:
-
not conflict-serializable, not view-serializable
-
conflict-serializable, view-serializable
-
conflict-serializable, view-serializable
-
not conflict-serializable, not view-serializable
-
not conflict-serializable, view-serializable (view equivalent to the serial
schedule T1, T2, T3)
-
Is the following schedule serializable? Show your working.
T1: R(X)W(X)W(Z) R(Y)W(Y)
T2: R(Y)W(Y)R(Y) W(Y)R(X) W(X)R(V)W(V)
[show answer]
When we talk about serializability and don't specifically say what
kind, we usually mean conflict-serializable.
As above, the "working" for this question involves constructing a
precedence graph, based on conflicting operations, and looking
for cycles.
In this case there's a conflict between T1:R(X) and
T2:W(X), giving a graph edge from T1 to T2.
There's also a conflict between T2:R(Y) and T1:W(Y),
giving a graph edge from T2 to T1.
This means the graph has a cycle, so the schedule is not serializable.