Week 10 Lectures (41)

Large Data

Some modern applications have massive data sets (e.g. Google)

far too large to store on a single machine/RDBMS
query demands far too high, even if the data could be stored in a DBMS

Approach to dealing with such data

distribute data over a large collection of nodes (also replicate it, for redundancy)
provide computational mechanisms for distributing computation
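
A minimal sketch of the two ideas above, assuming simple hash-based placement (the lecture does not say which placement scheme is used). The names node_for, replica_nodes, partition and distributed_total are hypothetical, chosen for illustration only. Each key is hashed to a primary node, copies go to the next nodes for redundancy, and a computation runs per node before the partial results are combined.

```python
import hashlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Map a key to a node index with a stable hash.

    hashlib is used rather than the built-in hash(), which is salted
    per process for strings and so would not give a stable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def replica_nodes(key, num_nodes, replicas=2):
    """Return the primary node plus the following nodes that hold copies.

    Assumption: replicas are placed on consecutive node indexes.
    """
    first = node_for(key, num_nodes)
    count = min(replicas, num_nodes)
    return [(first + i) % num_nodes for i in range(count)]


def partition(records, num_nodes):
    """Group (key, value) records by the primary node that owns each key."""
    shards = defaultdict(list)
    for key, value in records:
        shards[node_for(key, num_nodes)].append((key, value))
    return shards


def distributed_total(records, num_nodes):
    """Simulate distributed computation: each node sums its own shard,
    then the per-node partial sums are combined into one result."""
    shards = partition(records, num_nodes)
    partials = [sum(value for _, value in shard) for shard in shards.values()]
    return sum(partials)
```

Here the nodes are only simulated inside one process; in a real system each shard would live on a separate machine and only the partial results would travel over the network.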

Often this data does not need full relational selection

represent data via (key,value) pairs
unique keys can be used for addressing data
values can be large objects (e.g. web pages, images, ...)
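
A minimal sketch of the (key,value) model, assuming an in-memory, single-process store. The class KeyValueStore and its methods put, get and delete are hypothetical names for illustration, not the interface of any particular system. The point is that the only access path is the unique key; there is no selection on the contents of a value.

```python
class KeyValueStore:
    """Toy key-value store: unique string keys address opaque byte values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store value under key, replacing any previous value for that key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Fetch the value addressed by key, or default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove key; return True if it was present, False otherwise."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
# The value is an opaque object (here, the bytes of a web page).
store.put("http://example.com/", b"<html><body>Hello</body></html>")
page = store.get("http://example.com/")
```

The store never inspects the value, which is why large objects such as web pages or images fit this model naturally.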