COMP9315 Week 10 Thursday Lecture

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [0/60]
❖ Things To Note

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [1/60]
❖ Distributed Databases

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [2/60]
❖ Distributed Databases

A distributed database (DDB) is

A distributed database management system (DDBMS) is A DDBMS may involve
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [3/60]
❖ Distributed Databases (cont)

User view:

[Diagram:Pics/parallel/ddb-user-view.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [4/60]
❖ Distributed Databases (cont)

Architecture:

[Diagram:Pics/parallel/ddb-arch.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [5/60]
❖ Distributed Databases (cont)

Two kinds of distributed databases

The latter are also called federated databases

Ultimately, the distributed database (DDB) provides

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [6/60]
❖ Distributed Databases (cont)

Advantages of distributed databases


Disadvatanges of distributed databases
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [7/60]
❖ Data Distribution

Partitioning/distributing data

Problem: maintaining consistency
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [8/60]
❖ Data Distribution (cont)

Consider table R  horizontally partitioned into R1, R2, ..., Rn

Fragmentation can be done in multiple ways, but need to ensure ...

Completeness

Reconstruction Disjoint
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [9/60]
❖ Query Processing

Query processing typically involves shipping  data

Aim: minimise shipping cost   (since it is a networking cost)

Shipping cost becomes the "disk access cost" of DQOpt

Can still use cost-based query optimisation

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [10/60]
❖ Query Processing (cont)

Distributed query processing

Query optimisation in such contexts is complex ...
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [11/60]
❖ Transaction Processing

Distribution of data complicates tx processing ...

Distributed tx processing handled by two-phase commit
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [12/60]
❖ Non-classical DBMSs

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [13/60]
❖ Classical DBMSs

Assumptions made in conventional DBMSs:

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [14/60]
❖ Modern DBMSs

Aspects of modern applications:

Clearly, not all of these are relevant for every modern application.
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [15/60]
❖ Modern DBMSs (cont)

Some conclusions:

Some "modernists" claim that
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [16/60]
❖ Modern DBMSs (cont)

Some criticisms of the NoSQL approach:

[Diagram:Pics/future/xtra.png]

Uses MySQL as the example relational database system ... think PostgreSQL

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [17/60]
❖ Scale, Distribution, Replication

Data for modern applications is very large (TB, PB, XB)

Some systems opt for massive networks of simple nodes Benefits:
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [18/60]
❖ Schema-free Data Models

Many new DBMSs provide (key,value)  stores

Tables can be simulated by a collection of "similar" objects.
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [19/60]
❖ Eventual Consistency

RDBMSs use a strong transactional/consistency model

Many new DBMSs applications do not need strong consistency Because of distribution/replication
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [20/60]
❖ Eventual Consistency (cont)

If different nodes have different versions of data

Levels of consistency (from Cassandra system)
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [21/60]
❖ Two Case Studies

Consider two variations on the DBMS theme ...

Column Stores

Graph Databases
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [22/60]
❖ Column Stores


(Based on material by Daniel Abadi et al.)
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [23/60]
❖ Column Stores

Column-oriented Databases (CoDbs):

Ideas for CoDbs have been around since the 1970's

Rose to prominence via Daniel Abadi's PhD thesis (MIT, 2008)

Commercial systems have now been developed (e.g. Vertica)

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [24/60]
❖ Column Stores (cont)

File structures for row-store vs column-store:

[Diagram:Pics/future/row-vs-col.png]


Values in individual columns are related by extra tuple id (cf. oid)

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [25/60]
❖ Column Stores (cont)

Stored representation of logical (relational) tables

Example:  Enrolment(course,student,term,mark,grade)
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [26/60]
❖ Rows vs Columns

Workload for different operations

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [27/60]
❖ Rows vs Columns (cont)

Which is more efficient depends on mix of queries/updates

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [28/60]
❖ Rows vs Columns (cont)

Storing sorted columns leads to

Only one column in each projection will be sorted
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [29/60]
❖ Query Evaluation in CoDbs

Projection is easy if one slice contains all required attributes.

If not ...

Example: select a,b,c from R(a,b,c,d,e)

Assume: each column contains N values
for i in 0 .. N-1 {
   x = a[i]   // i'th value in slice containing a
   y = b[i]   // i'th value in slice containing b
   z = c[i]   // i'th value in slice containing c
   add (x,y,z) to Results
}

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [30/60]
❖ Query Evaluation in CoDbs (cont)

If slices are sorted differently, more complicated

Example: select a,b,c from R(a,b,c,d,e)

Assume: each column contains N values
for tid in 0 .. N-1 {
   x = fetch(a,tid)  // entry with tid in slice containing a
   y = fetch(b,tid)  // entry with tid in slice containing b
   z = fetch(c,tid)  // entry with tid in slice containing c
   add (x,y,z) to Results
}

Potentially slow, depending on how fetch() works.

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [31/60]
❖ Query Evaluation in CoDbs (cont)

For remaining discussion, assume


[Diagram:Pics/future/row-vs-col1.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [32/60]
❖ Query Evaluation in CoDbs (cont)

Consider typical multi-attribute SQL query

select a,b,c from R where b > 10 and d < 7

Query operation on individual column is done in one slice

Mark index of each matching entry in a bit-vector

Combine (AND) bit-vectors to get indexes for result entries

For each index, merge result entry columns into result tuple

Known as late materialization.

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [33/60]
❖ Query Evaluation in CoDbs (cont)

Example: select a,b,c from R where b = 5

// Assume: each column contains N values
matches = all-zero bit-string of length N
for i in 0 .. N-1 {
   x = b[i]  // i'th value in b column
   if (x == 5)
       matches[i] = 1  // set bit i in matches
}
for i in 0 .. N-1 {
   if (matches[i] == 0) continue
   add (a[i], b[i], c[i]) to Results
}

Fast sequential scanning of small (compressed?) data

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [34/60]
❖ Query Evaluation in CoDbs (cont)

Example: select a,b,c from R where b>10 and d<7


// Assume: each column contains N values
matches1 = all-zero bit-string of length N
matches2 = all-zero bit-string of length N
for i in 0 .. N-1 {
   if (b[i] > 10) matches1[i] = 1
   if (d[i] < 7) matches2[i] = 1
}
matches = matches1 AND matches2
for i in 0 .. N-1 {
   if (matches[i] == 0) continue
   add (a[i], b[i], c[i]) to Results
}

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [35/60]
❖ Query Evaluation in CoDbs (cont)

Join on columns, set up for late materialization

[Diagram:Pics/future/join.png]


Note: the left result column is always sorted

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [36/60]
❖ Query Evaluation in CoDbs (cont)

Example: select R.a, S.b
                from   R join S on R.a = S.x

// Assume: N tuples in R, M tuples in S
for i in 0 .. N-1 {
   for j in 0 .. M-1 {
      if (a[i] == x[j])
         append (i,j) to IndexList
   }
}
for each (i,j) in IndexList {
   add (aR[i], bS[j]) to Results
}

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [37/60]
❖ Query Evaluation in CoDbs (cont)

Aggregation generally involves a single column

E.g.

select avg(mark), count(student) from Enrolments


Operations involving groups of columns

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [38/60]
❖ Graph Databases


(Based on material by Markus Krotzsch, Renzo Angles, Claudio Gutierrez)
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [39/60]
❖ Graph Databases

Graph Databases (GDbs):


But what kind of "graphs"?
Two major GDb data models:   RDF,   Property Graph
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [40/60]
❖ Graph Databases (cont)

Typical graph modelled by a GDb

[Diagram:Pics/future/people-graph.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [41/60]
❖ Graph Data Models

RDF  =  Resource Description Framework

Data as triples, e.g. <Person1,given,"John">,   <Person1,parent,Person3>

RDF is a W3C standard;  supported in many prog. languages

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [42/60]
❖ Graph Data Models (cont)

RDF model of part of earlier graph:

[Diagram:Pics/future/people-graph-rdf.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [43/60]
❖ Graph Data Models (cont)

Property Graph


Not a standard like RDF, so variations exist
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [44/60]
❖ Graph Data Models (cont)

Property Graph model of part of earlier graph:

[Diagram:Pics/future/people-graph-prop.png]

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [45/60]
❖ GDb Queries

Graph data models require a graph-oriented query framework

Types of queries in GDbs

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [46/60]
❖ GDb Queries (cont)

Graphs contain arbitrary-length paths

Need an expression mechanism for describing such paths

GDb query languages:
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [47/60]
❖ Example Graph Queries

Example: Persons whose first name is James

SPARQL:

PREFIX p: <http://www.people.org>
SELECT ?X
WHERE { ?X p:given "James" }

Cypher:

MATCH (person:Person)
WHERE person.given="James"
RETURN person

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [48/60]
❖ Example Graph Queries (cont)

Example: All of James' ancestors

SPARQL:

PREFIX p: <http://www.socialnetwork.org/>
SELECT ?Y
WHERE { ?X p:type p:Person . ?X p:given "James" .
        ?Y p:parent* ?X }

Cypher:

MATCH (ancestor:Person)-[:parent*]->(james:Person)
WHERE james.given="James"
RETURN DISTINCT ancestor

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [49/60]
❖ Course Review + Exam

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [50/60]
❖ Syllabus

View of DBMS internals from the bottom-up:

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [51/60]
❖ Exam

Thursday 9 May, morning and afternoon sessions

Held in CSE Labs (allocations posted in Week 11).

All answers are typed and submitted on-line.

Environment is similar to Vlab.

Before the exam, learn to use ...

And, yes, you have access to VScode.
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [52/60]
❖ Exam (cont)

Resources available during exam:

No access to any other material is allowed.

No network access is available (e.g. no web, no email, ...)

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [53/60]
❖ Exam (cont)

Tools available during the exam

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [54/60]
❖ What's on the Exam?

Potential topics to be examined ...

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [55/60]
❖ What's on the Exam? (cont)

Questions will have the following "flavours" ...

There will be no SQL/PLpgSQL code writing.

You will not have to modify PostgreSQL during the exam.

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [56/60]
❖ Exam Structure

There will be 8 questions

Reminder:
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [57/60]
❖ Special Consideration

Reminder: this is a one-chance exam.

If you're sick, get documentation and do not attend the exam.

Special consideration requests must clearly show

Other factors are not relevant   (e.g. "I can't afford to repeat")
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [58/60]
❖ Revision

Things you can use for revision:

Pre-exam consultations leading up to exam  (see course web site)
COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [59/60]
❖ And that's all folks ...


End of COMP9315 19T2 Lectures.


Good luck with the exam ...


And keep on using PostgreSQL ...

COMP9315 24T1 ♢ Week 10 Thursday Lecture ♢ [60/60]


Produced: 18 Apr 2024