Hadoop DFS

Apache Hadoop Distributed File System
  • a hierarchical file system (directories & files a la Linux)
  • designed to run on a large number of commodity computing nodes
  • supporting very large files (TB) distributed/replicated over nodes
  • providing high reliability (node failure is the norm)
Provides support for the Hadoop map/reduce implementation.
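
The distribution/replication idea can be sketched with a toy model. This is not HDFS code: the names `place_blocks` and `readable_after_failure` are made up for illustration, and the round-robin placement is an assumption chosen for simplicity (real HDFS uses its own placement policy). The sketch splits a file into fixed-size blocks, puts each block on several distinct nodes, and checks whether the file is still readable after some nodes fail.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Toy placement: return one list of replica nodes per block.

    Hypothetical helper, not part of any Hadoop API. Blocks are
    assigned round-robin, which is a simplifying assumption.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    # Ceiling division: a partly filled last block still needs a slot.
    n_blocks = -(-file_size // block_size)
    placement = []
    for b in range(n_blocks):
        # Consecutive nodes (mod len) are distinct because
        # replication <= len(nodes).
        replicas = [nodes[(b + r) % len(nodes)] for r in range(replication)]
        placement.append(replicas)
    return placement


def readable_after_failure(placement, failed):
    """A file is readable if every block has at least one live replica."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement
    )
```

With a replication factor of 3, any two node failures leave every block with a live copy; the file is only lost if all three replicas of some block fail together.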

Optimised for write-once-read-many apps

  • simplifies data coherence
  • aim is maximum throughput rather than low latency
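
Why write-once-read-many simplifies coherence can be seen in a minimal sketch. The class `WriteOnceStore` is hypothetical and only models the access pattern, not HDFS itself: once a file has been written it never changes, so every replica and every reader sees the same bytes and no update propagation or locking between readers is needed.

```python
class WriteOnceStore:
    """Toy in-memory model of write-once-read-many files.

    Hypothetical class for illustration; not a Hadoop API.
    """

    def __init__(self):
        self._files = {}

    def create(self, path, data):
        """Write a file exactly once; a second write is rejected."""
        if path in self._files:
            raise FileExistsError(path)
        # Store an immutable copy so later reads cannot observe changes.
        self._files[path] = bytes(data)

    def read(self, path):
        """Any number of readers get the same, unchanging contents."""
        return self._files[path]
```

Because contents are fixed after creation, a reader can stream a whole file from whichever replica is convenient, which suits the throughput-over-latency goal.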