COMP9315 23T1 ♢ Database Trends ♢ [0/14]
Core "database" goals:
- deal with very large amounts of data (petabytes, exabytes, ...)
- very-high-level languages (deal with data in uniform ways)
- fast query execution (evaluation too slow ⇒ useless)
At the moment (and for the last 30 years), RDBMSs dominate ...
- simple/clean data model, backed up by theory
- high-level language for accessing data
- 40 years development work on RDBMS engine technology
RDBMSs work well in domains with uniform, structured data.
COMP9315 23T1 ♢ Database Trends ♢ [1/14]
❖ Future of Database (cont)
Limitations/pitfalls of classical RDBMSs:
- NULL is ambiguous: unknown, not applicable, not supplied
- "limited" support for constraints/integrity and rules
- no support for uncertainty (data represents the state-of-the-world)
- data model too simple (e.g. no direct support for complex objects)
- query model too rigid (e.g. no approximate matching)
- continually changing data sources not well-handled
- data must be "molded" to fit a single rigid schema
- database systems must be manually "tuned"
- do not scale well to some data sets (e.g. Google, telcos)
COMP9315 23T1 ♢ Database Trends ♢ [2/14]
❖ Future of Database (cont)
How to overcome (some) RDBMS limitations?
Extend the relational model ...
- add new data types and query ops for new applications
- deal with uncertainty/inaccuracy/approximation in data
Replace the relational model ...
- object-oriented DBMS ... OO programming with persistent objects
- XML DBMS ... all data stored as XML documents, new query model
- NoSQL data stores (e.g. (key,value) pairs, JSON or RDF)
COMP9315 23T1 ♢ Database Trends ♢ [3/14]
❖ Future of Database (cont)
How to overcome (some) RDBMS limitations?
Performance ...
- new query algorithms/data-structures for new types of queries
- parallel processing
- DBMSs that "tune" themselves
Scalability ...
- distribute data across (more and more) nodes
- techniques for handling streams of incoming data
COMP9315 23T1 ♢ Database Trends ♢ [4/14]
❖ Future of Database (cont)
An overview of the possibilities:
- "classical" RDBMS (e.g. PostgreSQL, Oracle, SQLite)
- parallel DBMS (e.g. XPRS)
- distributed DBMS (e.g. Cohera)
- deductive databases (e.g. Datalog)
- temporal databases (e.g. MariaDB)
- column stores (e.g. Vertica, Druid)
- object-oriented DBMS (e.g. ObjectStore)
- key-value stores (e.g. Redis, DynamoDB)
- wide column stores (e.g. Cassandra, Scylla, HBase)
- graph databases (e.g. Neo4J, Datastax)
- document stores (e.g. MongoDB, Couchbase)
- search engines (e.g. Google, Solr)
COMP9315 23T1 ♢ Database Trends ♢ [5/14]
❖ Future of Database (cont)
Historical perspective
COMP9315 23T1 ♢ Database Trends ♢ [6/14]
❖ Big Data
Some modern applications have massive data sets (e.g. Google)
- far too large to store on a single machine/RDBMS
- query demands far too high even if the data could be stored in a DBMS
Approach to dealing with such data:
- distribute data over large collection of nodes (also, redundancy)
- provide computational mechanisms for distributing computation
Often this data does not need the full power of relational querying (see the sketch below)
- represent data via (key,value) pairs
- unique keys can be used for addressing data
- values can be large objects (e.g. web pages, images, ...)
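A minimal sketch of the (key,value) view of data, written in Python with illustrative names (not any particular product's API): a unique key addresses a value directly, and the value is an opaque object with no schema imposed.

# Minimal sketch of a (key,value) store interface (names are illustrative).
class KeyValueStore:
    def __init__(self):
        # in a real store, data is partitioned and replicated across many nodes
        self.data = {}

    def put(self, key, value):
        self.data[key] = value      # no schema check: the value is opaque

    def get(self, key):
        return self.data.get(key)   # the unique key addresses the data directly

store = KeyValueStore()
store.put("url:example.com/index.html", "<html>...</html>")
print(store.get("url:example.com/index.html"))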
COMP9315 23T1 ♢ Database Trends ♢ [7/14]
Popular computational approach to such data: map/reduce (see the sketch after this list)
- suitable for widely-distributed, very-large data
- allows parallel computation on such data to be easily specified
- distribute (map) parts of computation across network
- compute in parallel (possibly with further mapping)
- merge (reduce) multiple results for delivery to requestor
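A minimal word-count sketch of the map/reduce pattern in Python; it runs sequentially on one machine, so it only shows the shape of the computation (map, group by key, reduce), not the distribution of work across nodes.

from collections import defaultdict

def map_phase(doc):
    # map: emit (key, value) pairs from one input item
    return [(word, 1) for word in doc.split()]

def reduce_phase(key, values):
    # reduce: merge all the values emitted for one key
    return (key, sum(values))

docs = ["the quick brown fox", "the lazy dog", "the fox"]

# "shuffle": group intermediate pairs by key (done by the framework in practice)
groups = defaultdict(list)
for doc in docs:
    for key, value in map_phase(doc):
        groups[key].append(value)

results = [reduce_phase(k, vs) for k, vs in groups.items()]
print(results)    # includes ('the', 3) and ('fox', 2)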
Some big-data proponents see no future need for SQL/relational systems ...
- depends on application (e.g. hard integrity vs eventual consistency)
COMP9315 23T1 ♢ Database Trends ♢ [8/14]
❖ Information Retrieval
DBMSs generally do precise matching (although LIKE/regexps give limited pattern matching)
Information retrieval systems do approximate matching.
E.g. documents containing a set of keywords (Google, etc.)
Also introduces notion of "quality" of matching
(e.g. tuple T1 is a better match than tuple T2)
Quality also implies ranking of results.
Ongoing research in incorporating IR ideas into DBMS context.
Goal: support database exploration better.
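A rough Python sketch of approximate matching with ranking, assuming a toy in-memory document collection and a crude occurrence-count score (real IR engines use inverted indexes and measures such as tf-idf).

docs = {
    "d1": "database systems store structured data in tables",
    "d2": "search engines rank documents by relevance to keywords",
    "d3": "graph databases model highly inter-linked data",
}

def score(text, terms):
    words = text.split()
    # crude relevance: how many times do the query terms occur?
    return sum(words.count(t) for t in terms)

def search(query):
    terms = query.split()
    scored = [(doc_id, score(text, terms)) for doc_id, text in docs.items()]
    # rank matching documents by score, best first; drop non-matches
    return sorted([ds for ds in scored if ds[1] > 0], key=lambda ds: ds[1], reverse=True)

print(search("database data"))    # [('d1', 2), ('d3', 1)] -- d2 does not match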
COMP9315 23T1 ♢ Database Trends ♢ [9/14]
❖ Multimedia Data
Data which does not fit the "tabular model":
- image, video, music, text, ... (and combinations of these)
Research problems:
- how to specify queries on such data? (e.g. image1 ≅ image2)
- how to "display" results? (e.g. synchronize components)
Solutions to the first problem typically:
- extend notions of "matching"/indexes for querying
- require sophisticated methods for capturing data features
Sample query: find other songs like this one?
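A small Python sketch of the "find other songs like this one" idea, assuming feature vectors have already been extracted (the hard research problem); similarity here is plain cosine similarity between hypothetical vectors.

import math

features = {                       # hypothetical, pre-extracted feature vectors
    "songA": [0.9, 0.2, 0.5],
    "songB": [0.8, 0.3, 0.6],
    "songC": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similar_to(song, k=2):
    # rank all other songs by similarity to the given one
    others = [(s, cosine(features[song], v)) for s, v in features.items() if s != song]
    return sorted(others, key=lambda sv: sv[1], reverse=True)[:k]

print(similar_to("songA"))         # songB is ranked closest to songA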
COMP9315 23T1 ♢ Database Trends ♢ [10/14]
❖ Uncertain Data
Multimedia/IR introduces approximate matching.
In some contexts, we have approximate/uncertain data.
E.g. witness statements in a crime-fighting database
"I think the getaway car was red ... or maybe orange ..."
"I am 75% sure that John carried out the crime"
Work by Jennifer Widom at Stanford on the Trio system
- extends the relational model (ULDB)
- extends the query language (TriQL)
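A very simplified Python sketch of tuples carrying confidence values, with a join-like query that combines them by multiplying confidences (assuming independence). This only conveys the flavour of the idea, not Trio's actual ULDB/TriQL semantics.

# alternative observations, each with a confidence value
car_colour = [("red", 0.6), ("orange", 0.4)]
suspect    = [("John", 0.75), ("Paul", 0.25)]

# join-like combination: confidence of each result = product of input confidences
combined = [((colour, person), c1 * c2)
            for colour, c1 in car_colour
            for person, c2 in suspect]

for row, conf in sorted(combined, key=lambda rc: rc[1], reverse=True):
    print(row, round(conf, 2))     # ('red', 'John') 0.45 is the most likely combination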
COMP9315 23T1 ♢ Database Trends ♢ [11/14]
❖ Stream Data Management Systems
Makes one addition to the relational model
- stream = infinite sequence of tuples, arriving one-at-a-time
Applications:
news feeds, telecomms, monitoring web usage, ...
RDBMSs: run a variety of queries on (relatively) fixed data
StreamDBs: run fixed queries on changing data (stream)
One approach: window = "relation" formed from a stream via a rule
E.g. StreamSQL
select avg(price)
from examplestream [size 10 advance 1 tuples]
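A rough Python analogue of the StreamSQL query above, assuming price tuples arrive one at a time: keep the 10 most recent tuples and re-emit avg(price) as the window advances by one tuple.

from collections import deque

def windowed_avg(stream, size=10):
    window = deque(maxlen=size)          # the `size` most recent tuples
    for price in stream:                 # tuples arrive one at a time
        window.append(price)             # window advances by 1 tuple
        yield sum(window) / len(window)  # recompute the aggregate

prices = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 20, 21]
for avg in windowed_avg(prices, size=10):
    print(round(avg, 2))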
COMP9315 23T1 ♢ Database Trends ♢ [12/14]
❖ Graph Data
Uses graphs rather than tables as the basic data structuring tool.
Applications: social networks, ecommerce purchases, interests, ...
Many real-world problems are modelled naturally by graphs
- can be represented in RDBMSs, but not processed efficiently
- e.g. recursive queries on Nodes, Properties, Edges tables (see the sketch below)
Graph data models: flexible, "schema-free", inter-linked
Typical modeling formalisms: XML, JSON, RDF
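A small Python sketch of the kind of query that is awkward as a recursive query over Nodes/Edges tables but natural on a graph: reachability ("who can Alice reach?") by breadth-first search over hypothetical adjacency lists.

from collections import deque

edges = {                          # adjacency lists: person -> people they follow
    "Alice": ["Bob", "Carol"],
    "Bob":   ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave":  [],
    "Eve":   ["Alice"],
}

def reachable(start):
    seen, queue = {start}, deque([start])
    while queue:                   # breadth-first traversal of the graph
        node = queue.popleft()
        for nbr in edges.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen - {start}

print(reachable("Alice"))          # {'Bob', 'Carol', 'Dave', 'Eve'}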
More details later ...
COMP9315 23T1 ♢ Database Trends ♢ [13/14]
❖ Dispersed Databases
Characteristics of dispersed databases:
- very large numbers of small processing nodes
- data is distributed/shared among nodes
Applications:
environmental monitoring devices, "intelligent dust", ...
Research issues:
- query/search strategies (how to organise query processing)
- distribution of data (trade-off between centralised and diffused)
Less extreme versions of this already exist:
- grid and cloud computing
- database management for mobile devices
COMP9315 23T1 ♢ Database Trends ♢ [14/14]
Produced: 18 Apr 2023