Relational Algebra


Relational Algebra

Relational algebra (RA) can be viewed as ...

Relational algebra consists of: RA can be viewed as the "machine language" for RDBMSs

Relational Algebra (cont)

Core relational algebra operations:

Common extensions include:

Relational Algebra (cont)

Select, project, join provide a powerful set of operations for constructing relations and extracting relevant data from them.

[Diagram:Pic/relalg/spj.png]

Adding set operations and renaming makes RA complete.


Notation

Standard treatments of relational algebra use Greek symbols.

We use the following notation (because it is easier to reproduce):

Operation Standard
Notation
Our
Notation
Selection σexpr(Rel) Sel[expr](Rel)
Projection πA,B,C(Rel) Proj[A,B,C](Rel)
Join Rel1 expr Rel2 Rel1  Join[expr]  Rel2
Rename ρschemaRel Rename[schema](Rel)

For other operations (e.g. set operations) we adopt the standard notation.


Notation (cont)

We define the semantics of RA operations using


Notation (cont)

All RA operators return a result relation (no DB updates).

For convenience, we can name a result and use it later.

E.g.

Temp = R op1 S op2 T
Res  = Temp op3 Z
-- which is equivalent to
Res  = (R op1 S op2 T) op3 Z

Each "intermediate result" has a well-defined schema.


Sample Relations

Example database #0 to demonstrate RA operators:

[Diagram:Pic/relalg/db0.png]


Sample Relations (cont)

Example database #1 to demonstrate RA operators:

[Diagram:Pic/relalg/db1.png]


Sample Relations (cont)

Example database #2 to demonstrate RA operators:

[Diagram:Pic/relalg/db2.png]


Selection

Selection returns a subset of the tuples in a relation r that satisfy a specified condition C.

σC(r)   =   Sel[C](r)   =   { t  |  t ∈ r ∧ C(t) },     where r(R)

C is a boolean expression on attributes in R.

Result size:   |σC(r)| |r|

Result schema:   same as the schema of r   (i.e. R)

Programmer's view:

result = {}
for each tuple t in relation r
    if (C(t)) { result = result ∪ {t} }


Selection (cont)

Example selections:

Sel [B = 1] (r1)

  A    B    C    D  
a1x4
e1y4

  Sel [A=b ∨ A=c] (r1)

  A    B    C    D  
b2y5
c4z4

Sel [B ≥ D] (r1)

  A    B    C    D  
c4z4
d8x5

  Sel [A = C] (r1)

  A    B    C    D  


Selection (cont)

Example queries:


Projection

Projection returns a set of tuples containing a subset of the attributes in the original relation.

πX(r)   =   Proj[X](r)   =   { t[X]  |  t ∈ r },     where r(R)

X specifies a subset of the attributes of R.

Note that removing key attributes can produce duplicates.

In RA, duplicates are removed from the result set.
(In many RDBMS's, duplicates are retained   (i.e. they use bag, not set, semantics))

Result size:   |πX(r)| |r|     Result schema:   R'(X)

Programmer's view:

result = {}
for each tuple t in relation r
    result = result ∪ {t[X]}


Projection (cont)

Example projections:

Proj [A,B,D] (r1)

  A    B    D  
a14
b25
c44
d85
e14
f25

    Proj [B,D] (r1)

  B    D  
14
25
44
85

    Proj [D] (r1)

  D  
4
5


Projection (cont)

Example queries:

  • Proj [branchName] (Branch)
  • Which branches actually hold accounts?
  • What are the names and addresses of all customers?
  • Generate a list of all the account numbers

  • Union

    Union combines two compatible relations into a single relation via set union of sets of tuples.

    r1 ∪ r2   =   { t  |  t ∈ r1 ∨ t ∈ r2 },     where r1(R), r2(R)

    Compatibility = both relations have the same schema

    Result size:   |r1 ∪ r2|     |r1| + |r2|     Result schema: R

    Programmer's view:

    result = r1
    for each tuple t in relation r2
        result = result ∪ {t}
    


    Union (cont)

    Example queries:

    The union operator is symmetric i.e.   R ∪ S   =   S ∪ R.

    Intersection

    Intersection combines two compatible relations into a single relation via set intersection of sets of tuples.

    r1 ∩ r2   =   { t  |  t ∈ r1 ∧ t ∈ r2 },     where r1(R), r2(R)

    Uses same notion of relation compatibility as union.

    Result size:   |r1 ∪ r2|     min(|r1|,|r2|)     Result schema: R

    Programmer's view:

    result = {}
    for each tuple t in relation r1
        if (t ∈ r2) { result = result ∪ {t} }
    


    Intersection (cont)

    Example queries:

    The intersection operator is symmetric i.e.   R ∩ S   =   S ∩ R.

    Difference

    Difference finds the set of tuples that exist in one relation but do not occur in a second compatible relation.

    r1 - r2   =   { t  |  t ∈ r1 t ∈ r2 },     where r1(R), r2(R)

    Uses same notion of relation compatibility as union.

    Note: tuples in r2 but not r1 do not appear in the result

    i.e. set difference != complement of set intersection Programmer's view:

    result = {}
    for each tuple t in relation r1
        if (!(t ∈ r2)) { result = result ∪ {t} }
    


    Difference (cont)

    Example difference:

    s1 = Sel [B = 1] (r1)

      A    B    C    D  
    a1x4
    e1y4

        s2 = Sel [C = x] (r1)

      A    B    C    D  
    a1x4
    d8x5

    s1 - s2

      A    B    C    D  
    e1y4

        s2 - s1

      A    B    C    D  
    d8x5

     

    Clearly, difference is not symmetric.


    Difference (cont)

    Example queries:


    Product

    Product (Cartesian product) combines information from two relations pairwise on tuples.

    r × s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s },     where r(R), s(S)

    Each tuple in the result contains all attributes from r and s, possibly with some fields renamed to avoid ambiguity.

    If t1 = (A1...An) and t2 = (B1...Bn) then (t1:t2) = (A1...An,B1...Bn)

    Note: relations do not have to be union-compatible.

    Result size is large:   |r × s| = |r|.|s|     Schema: R∪S

    Programmer's view:

    result = {}
    for each tuple t1 in relation r
        for each tuple t2 in relation s
            result = result ∪ {(t1:t2)} }
    


    Product (cont)

    Example product #1:

    r1 × r2

      A    B    C    D     E    F    G  
    a1x4 1ax
    a1x4 2by
    a1x4 3cx
    a1x4 4ay
    a1x4 5bx
    b2y5 1ax
    b2y5 2by
    b2y5 3cx
    ...
    f2z5 3cx
    f2z5 4ay
    f2z5 5bx


    Product (cont)

    Example product #2:

    Students × Marks

      course    name    sid     stude    subj    mark  
    BEAnne21333 21333101174
    BEAnne21333 21333102170
    BEAnne21333 21333201168
    BEAnne21333 21531101194
    BEAnne21333 21531102190
    BEAnne21333 21623101150
    BEDave21876 21333101174
    BEDave21876 21333102170
    BEDave21876 21333201168
    BEDave21876 21531101194
    BEDave21876 21531102190
    BEDave21876 21623101150
    BScJohn21531 21333101174
    BScJohn21531 21333102170
    BScJohn21531 21333201168
    BScJohn21531 21531101194
    BScJohn21531 21531102190
    BScJohn21531 21623101150
    BScTim21623 21333101174
    BScTim21623 21333102170
    BScTim21623 21333201168
    BScTim21623 21531101194
    BScTim21623 21531102190
    BScTim21623 21623101150


    Product (cont)

    By itself, product isn't a useful querying mechanism.

    However, it gives a basis for combining information across relations.

    A special (and very common) usage of product:

    This usage is represented by a separate operation: join.

    Note: join is not implemented using product   (which is why RDBMSs work)

    Join is critically important in implementing "navigation" in relational databases.


    Product (cont)

    Example query #1:


    Product (cont)

    Example query #2:


    Product (cont)

    Example query #3:


    Rename

    Rename provides "schema mapping".

    If expression E returns a relation R(A1, A2, ... An), then

    Rename[S(B1, B2, ... Bn)](E)

    gives a relation called S

    Rename is like the identity function on the contents of a relation; it changes only the schema.

    Rename can be viewed as a "technical" apparatus of RA.


    Rename (cont)

    Examples:

    Rename[s1(J,K,L,M)](r1)

      J    K    L    M  
    a1x4
    b2y5
    c4z4
    d8x5
    e1y4
    f2z5

    s1     Rename[s2(X,Y,Z)](r2)

      X    Y    Z  
    1ax
    2by
    3cx
    4ay
    5bx

    s2


    Natural Join

    Natural join is a specialised product:

    Consider relation schemas R(ABC..JKLM), S(KLMN..XYZ).

    The natural join of relations r(R) and s(S) is defined as:

    r s   =   r Join s   =  
    { (t1[ABC..J] : t2[K..XYZ])  |  t1 ∈ r ∧ t2 ∈ s ∧ match }

    where    match   =   t1[K] = t2[K] ∧ t1[L] = t2[L] ∧ t1[M] = t2[M]

    Programmer's view:

    result = {}
    for each tuple t1 in relation r
       for each tuple t2 in relation s
          if (matches(t1,t2))
             result = result ∪ {combine(t1,t2)}
    


    Natural Join (cont)

    Natural join can also be defined in terms of other relational algebra operations:

    r Join s   =   Proj[R ∪ S] ( Sel[match] ( r × s) )

    We assume that the union on attributes eliminates duplicates.

    If we wish to join relations, where the common attributes have different names, we rename the attributes first.

    E.g. R(ABC) and S(DEF) can be joined by

    R Join Rename[S(DCF)](S)

    Note: |r s| |r × s|, so join not implemented via product.


    Natural Join (cont)

    Example (assuming that A and F are the common attributes):

    r1 Join r2

      A    B    C    D     E    G  
    a1x4 1x
    a1x4 4y
    b2y5 2y
    b2y5 5x
    c4z4 3x

    Strictly, the operation above was: r1 Join Rename[r2(E,A,G)](r2)


    Natural Join (cont)

    Natural join is not quite the same as:

    Sel[sid=stude] (Students × Marks)

      course    name    sid     stude    subj    mark  
    BEAnne21333 21333101174
    BEAnne21333 21333102170
    BEAnne21333 21333201168
    BScJohn21531 21531101194
    BScJohn21531 21531102190
    BScTim21623 21623101150


    Natural Join (cont)

    Compare this to the previous result:

    Students   Join   Marks

      course    name    sid     subj    mark  
    BEAnne21333 101174
    BEAnne21333 102170
    BEAnne21333 201168
    BScJohn21531 101194
    BScJohn21531 102190
    BScTim21623 101150

    As above, we assume   Rename[Marks(sid,subj,mark)](Marks)

    Natural Join (cont)

    Example queries:


    Theta Join

    The theta join is a specialised product containing only pairs that match on a supplied condition C.

    r C s   =   { (t1 : t2)  |  t1 ∈ r ∧ t2 ∈ s ∧ C(t1 : t2) },
    where r(R),s(S)

    Examples:   (r1 Join[B>E] r2) ... (r1 Join[E<D∧C=G] r2)

    Can be defined in terms of other RA operations:

    r C s   =   r Join[C] s   =   Sel[C] ( r × s )

    Unlike natural join, "duplicate" attributes are not removed.

    Note that   r true s   =   r × s.


    Theta Join (cont)

    Example theta join:

    r1 Join[D < E] r2

      A    B    C    D     E    F    G  
    a1x4 5bx
    c4z4 5bx
    e1y4 5bx

     
    r1 Join[B > 1 ∧ D < E] r2

      A    B    C    D     E    F    G  
    c4z4 5bx

    (Theta join is an important component of most SQL queries; many examples of its use to follow)


    Theta Join (cont)

    Comparison between join operations:

    Equijoin is a specialised theta join; natural join is like theta join followed by projection.

    Outer Join

    r Join s eliminates all s tuples that do not match some r tuple.

    Sometimes, we wish to keep this information, so outer join


    Outer Join (cont)

    Example (assuming that A and F are the common attributes):

    r1 OuterJoin r2

      A    B    C    D     E    G  
    a1x4 1x
    a1x4 4y
    b2y5 2y
    b2y5 5x
    c4z4 3x
    d8x5 nullnull
    e1y4 nullnull
    f2z5 nullnull

    Contrast this to the result for R Join S presented earlier.


    Outer Join (cont)

    Another outer join example (compare to earlier similar join):

    Students   OuterJoin   Marks

      course    name    sid     subj    mark  
    BEAnne21333 101174
    BEAnne21333 102170
    BEDave21876 nullnull
    BEAnne21333 201168
    BScJohn21531 101194
    BScJohn21531 102190
    BScTim21623 101150


    Outer Join (cont)

    There are three variations of outer join R OuterJoin S:

    Which one to use depends on the application e.g.

    If we want to know about all Students, regardless of whether they're enrolled in anything

    Students LeftOuterJoin Enrolment

    Outer Join (cont)

    Operational description of r(R) LeftOuterJoin s(S):

    result = {}
    for each tuple t1 in relation r
       nmatches = 0
       for each tuple t2 in relation s
          if (matches(t1,t2))
             result = result ∪ {combine(t1,t2)}
             nmatches++
       if (nmatches == 0)
          result = result ∪
                     {combine(t1,Snull)}
    

    where Snull is a tuple from S with all atributes set to NULL.


    Division

    Consider two relation schemas R and S where S ⊂ R.

    The division operation is defined on instances r(R), s(S) as:

    r / s   =   r Div s   =   { t | t ∈ r[R-S] ∧ satisfy }

    where   satisfy   =   ∀ ts ∈ S ( ∃ tr ∈ R ( tr[S] = ts ∧ tr[R-S] = t ) )

    Operationally:


    Division (cont)

    Example:

    R=Proj[D,C](r1)

      D    C  
    4x
    4y
    4z
    5x
    5y
    5z

        S=Proj[G](r2)

      G  
    x
    y

        R / S

      name  
    4
    5

    Strictly, R/S is R / Rename[S(C)](S)


    Division (cont)

    Division handles queries that include the notion "for all".

    E.g. Which customers have accounts at all branches?

    We can answer this as follows:


    Division (cont)

    RA operations for answering the query:


    Division (cont)

    Example of answering the query (in a different database instance):

    r2

      name    branchName  
    DavisDowntown
    HayesRound Hill
    JonesBrighton
    JonesDowntown
    JonesRound Hill
    SmithDowntown
    SmithRound Hill

        r3

      branchName  
    Brighton
    Downtown
    Round Hill

        r2/r3

      name  
    Jones


    Division (cont)

    Division can be implemented in terms of other RA operations.

    Consider R/S where R = X ∪ Y and S = Y ...

    T = Proj[X](R) generate all potential result tuples
    U = T × S combine potential results with all tuples from S
    V = U - R find any combined tuples that are not in R;
    any potential results which occur with all S values in R will be removed; V thus contains potential result tuples which do not occur with all S values in R
    W = Proj[X](V) the X parts of these tuples are "disqualified"
    Res = T - W so remove them to produce the result


    Aggregation

    Two types of aggregation are common in database queries:


    Aggregation (cont)

    Example aggregations:

    GroupBy[C](r1)

      A    B    C    D  
    a1x4
    d8x5
    f2x5
     
    b2y5
    e1y4
     
    c4z4

        Sum[B](r1)

      Sum  
    18

        GroupBy[C]Sum[B](r1)

      C    Sum  
    x11
    y3
    z4


    Generalised Projection

    In standard projection, we select values of specified attributes.

    In generalised projection we perform some computation on the attribute value before placing it in the result tuple.

    Examples:


    Produced: 13 Sep 2020