Relational algebra (RA) can be viewed as ...
Core relational algebra operations:
Select, project, join provide a powerful set of operations for constructing relations and extracting relevant data from them.
Adding set operations and renaming makes RA complete.
Standard treatments of relational algebra use Greek symbols.
We use the following notation (because it is easier to reproduce):
Standard Notation |
Our Notation |
|||
σexpr(Rel) | Sel[expr](Rel) | |||
πA,B,C(Rel) | Proj[A,B,C](Rel) | |||
Rel1 ![]() |
Rel1 Join[expr] Rel2 | |||
ρschemaRel | Rename[schema](Rel) |
For other operations (e.g. set operations) we adopt the standard notation.
We define the semantics of RA operations using
(tuple relational calculus ... cf. Haskell list comprehensions)
All RA operators return a result relation (no DB updates).
For convenience, we can name a result and use it later.
E.g.
Temp = R op1 S op2 T
Res = Temp op3 Z
-- which is equivalent to
Res = (R op1 S op2 T) op3 Z
Each "intermediate result" has a well-defined schema.
Example database #0 to demonstrate RA operators:
Example database #1 to demonstrate RA operators:
Example database #2 to demonstrate RA operators:
Selection returns a subset of the tuples in a relation r that satisfy a specified condition C.
C is a boolean expression on attributes in R.
Result size: |σC(r)| ≤ |r|
Result schema: same as the schema of r (i.e. R)
Programmer's view:
result = {} for each tuple t in relation r if (C(t)) { result = result ∪ {t} }
Example selections:
Sel [B = 1] (r1)
|
Sel [A=b ∨ A=c] (r1)
|
|||||||||||||||||||||||||
Sel [B ≥ D] (r1)
|
Sel [A = C] (r1)
|
Example queries:
Projection returns a set of tuples containing a subset of the attributes in the original relation.
X specifies a subset of the attributes of R.
Note that removing key attributes can produce duplicates.
In RA, duplicates are removed from the result set.
(In many RDBMS's, duplicates are retained (i.e. they use bag, not set, semantics))
Result size: |πX(r)| ≤ |r| Result schema: R'(X)
Programmer's view:
result = {} for each tuple t in relation r result = result ∪ {t[X]}
Example projections:
Proj [A,B,D] (r1)
|
Proj [B,D] (r1)
|
Proj [D] (r1)
|
Example queries:
Union combines two compatible relations into a single relation via set union of sets of tuples.
Compatibility = both relations have the same schema
Result size: |r1 ∪ r2| ≤ |r1| + |r2| Result schema: R
Programmer's view:
result = r1 for each tuple t in relation r2 result = result ∪ {t}
Example queries:
Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.
Uses same notion of relation compatibility as union.
Result size: |r1 ∪ r2| ≤ min(|r1|,|r2|) Result schema: R
Programmer's view:
result = {} for each tuple t in relation r1 if (t ∈ r2) { result = result ∪ {t} }
Example queries:
Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
Uses same notion of relation compatibility as union.
Note: tuples in r2 but not r1 do not appear in the result
i.e. set difference != complement of set intersection Programmer's view:
result = {} for each tuple t in relation r1 if (!(t ∈ r2)) { result = result ∪ {t} }
Example difference:
s1 = Sel [B = 1] (r1)
|
s2 = Sel [C = x] (r1)
|
|||||||||||||||||||||||||
s1 - s2
|
s2 - s1
|
Clearly, difference is not symmetric.
Example queries:
Product (Cartesian product) combines information from two relations pairwise on tuples.
Each tuple in the result contains all attributes from r and s, possibly with some fields renamed to avoid ambiguity.
If t1 = (A1...An) and t2 = (B1...Bn) then (t1:t2) = (A1...An,B1...Bn)
Note: relations do not have to be union-compatible.
Result size is large: |r × s| = |r|.|s| Schema: R∪S
Programmer's view:
result = {} for each tuple t1 in relation r for each tuple t2 in relation s result = result ∪ {(t1:t2)} }
Example product #1:
r1 × r2 |
|
Example product #2:
course | name | sid | stude | subj | mark |
BE | Anne | 21333 | 21333 | 1011 | 74 |
BE | Anne | 21333 | 21333 | 1021 | 70 |
BE | Anne | 21333 | 21333 | 2011 | 68 |
BE | Anne | 21333 | 21531 | 1011 | 94 |
BE | Anne | 21333 | 21531 | 1021 | 90 |
BE | Anne | 21333 | 21623 | 1011 | 50 |
BE | Dave | 21876 | 21333 | 1011 | 74 |
BE | Dave | 21876 | 21333 | 1021 | 70 |
BE | Dave | 21876 | 21333 | 2011 | 68 |
BE | Dave | 21876 | 21531 | 1011 | 94 |
BE | Dave | 21876 | 21531 | 1021 | 90 |
BE | Dave | 21876 | 21623 | 1011 | 50 |
BSc | John | 21531 | 21333 | 1011 | 74 |
BSc | John | 21531 | 21333 | 1021 | 70 |
BSc | John | 21531 | 21333 | 2011 | 68 |
BSc | John | 21531 | 21531 | 1011 | 94 |
BSc | John | 21531 | 21531 | 1021 | 90 |
BSc | John | 21531 | 21623 | 1011 | 50 |
BSc | Tim | 21623 | 21333 | 1011 | 74 |
BSc | Tim | 21623 | 21333 | 1021 | 70 |
BSc | Tim | 21623 | 21333 | 2011 | 68 |
BSc | Tim | 21623 | 21531 | 1011 | 94 |
BSc | Tim | 21623 | 21531 | 1021 | 90 |
BSc | Tim | 21623 | 21623 | 1011 | 50 |
By itself, product isn't a useful querying mechanism.
However, it gives a basis for combining information across relations.
A special (and very common) usage of product:
Note: join is not implemented using product (which is why RDBMSs work)
Join is critically important in implementing "navigation" in relational databases.
Example query #1:
Example query #2:
Example query #3:
Rename provides "schema mapping".
If expression E returns a relation R(A1, A2, ... An), then
gives a relation called S
Rename can be viewed as a "technical" apparatus of RA.
Examples:
J | K | L | M |
a | 1 | x | 4 |
b | 2 | y | 5 |
c | 4 | z | 4 |
d | 8 | x | 5 |
e | 1 | y | 4 |
f | 2 | z | 5 |
s1
X | Y | Z |
1 | a | x |
2 | b | y |
3 | c | x |
4 | a | y |
5 | b | x |
s2
Natural join is a specialised product:
The natural join of relations r(R) and s(S) is defined as:
where match = t1[K] = t2[K] ∧ t1[L] = t2[L] ∧ t1[M] = t2[M]
Programmer's view:
result = {} for each tuple t1 in relation r for each tuple t2 in relation s if (matches(t1,t2)) result = result ∪ {combine(t1,t2)}
Natural join can also be defined in terms of other relational algebra operations:
We assume that the union on attributes eliminates duplicates.
If we wish to join relations, where the common attributes have different names, we rename the attributes first.
E.g. R(ABC) and S(DEF) can be joined by
Note: |r s|
|r × s|, so join not implemented via product.
Example (assuming that A and F are the common attributes):
r1 Join r2
|
Strictly, the operation above was: r1 Join Rename[r2(E,A,G)](r2)
Natural join is not quite the same as:
course | name | sid | stude | subj | mark |
BE | Anne | 21333 | 21333 | 1011 | 74 |
BE | Anne | 21333 | 21333 | 1021 | 70 |
BE | Anne | 21333 | 21333 | 2011 | 68 |
BSc | John | 21531 | 21531 | 1011 | 94 |
BSc | John | 21531 | 21531 | 1021 | 90 |
BSc | Tim | 21623 | 21623 | 1011 | 50 |
Compare this to the previous result:
course | name | sid | subj | mark |
BE | Anne | 21333 | 1011 | 74 |
BE | Anne | 21333 | 1021 | 70 |
BE | Anne | 21333 | 2011 | 68 |
BSc | John | 21531 | 1011 | 94 |
BSc | John | 21531 | 1021 | 90 |
BSc | Tim | 21623 | 1011 | 50 |
Example queries:
The theta join is a specialised product containing only pairs that match on a supplied condition C.
Examples: (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)
Can be defined in terms of other RA operations:
Unlike natural join, "duplicate" attributes are not removed.
Note that r true s = r × s.
Example theta join:
r1 Join[D < E] r2
| ||||||||||||||||||||||||||||
r1 Join[B > 1 ∧ D < E] r2
|
(Theta join is an important component of most SQL queries; many examples of its use to follow)
Comparison between join operations:
r Join s eliminates all s tuples that do not match some r tuple.
Sometimes, we wish to keep this information, so outer join
Example (assuming that A and F are the common attributes):
r1 OuterJoin r2
|
Contrast this to the result for R Join S presented earlier.
Another outer join example (compare to earlier similar join):
course | name | sid | subj | mark |
BE | Anne | 21333 | 1011 | 74 |
BE | Anne | 21333 | 1021 | 70 |
BE | Dave | 21876 | null | null |
BE | Anne | 21333 | 2011 | 68 |
BSc | John | 21531 | 1011 | 94 |
BSc | John | 21531 | 1021 | 90 |
BSc | Tim | 21623 | 1011 | 50 |
There are three variations of outer join R OuterJoin S:
If we want to know about all Students, regardless of whether they're enrolled in anything
Operational description of r(R) LeftOuterJoin s(S):
result = {} for each tuple t1 in relation r nmatches = 0 for each tuple t2 in relation s if (matches(t1,t2)) result = result ∪ {combine(t1,t2)} nmatches++ if (nmatches == 0) result = result ∪ {combine(t1,Snull)}
where Snull is a tuple from S with all atributes set to NULL.
Consider two relation schemas R and S where S ⊂ R.
The division operation is defined on instances r(R), s(S) as:
where satisfy = ∀ ts ∈ S ( ∃ tr ∈ R ( tr[S] = ts ∧ tr[R-S] = t ) )
Operationally:
Example:
R=Proj[D,C](r1)
|
S=Proj[G](r2)
|
R / S
|
Strictly, R/S is R / Rename[S(C)](S)
Division handles queries that include the notion "for all".
E.g. Which customers have accounts at all branches?
We can answer this as follows:
RA operations for answering the query:
Example of answering the query (in a different database instance):
r2
|
r3
|
r2/r3
|
Division can be implemented in terms of other RA operations.
Consider R/S where R = X ∪ Y and S = Y ...
generate all potential result tuples | ||
combine potential results with all tuples from S | ||
find any combined tuples that are not in R; any potential results which occur with all S values in R will be removed; V thus contains potential result tuples which do not occur with all S values in R |
||
the X parts of these tuples are "disqualified" | ||
so remove them to produce the result |
Two types of aggregation are common in database queries:
Example aggregations:
GroupBy[C](r1)
|
Sum[B](r1)
|
GroupBy[C]Sum[B](r1)
|
In standard projection, we select values of specified attributes.
In generalised projection we perform some computation on the attribute value before placing it in the result tuple.
Examples:
Produced: 13 Sep 2020