[an error occurred while processing this directive]
School of Computer Science & Engineering
University of New South Wales
Advanced Operating Systems
COMP9242 2002/S2
Next: Limitations of Traditional File
Up: 09-fs
Previous: SGI's XFS
Subsections
Designed as a replacement of DOS FAT and OS/2 HPFS[Cus94].
- Requirements:
- large files and devices (
2GB),
- protection,
- reliability and recoverability.
- Features:
- transaction semantics on metadata updates,
- support for mirroring and striping,
- Unicode file names,
- secondary indices (not implemented),
- extensible attribute set, incl.
- unified access to all attributes (incl. data),
- POSIX features: case-sensitive names, creation time stamp,
hard links.
- Many others every non-PC user expects from a file system.
The master file table (MFT).
- ``Relational database'' of file metadata:
- transaction semantics,
- secondary indices for accessing files (only implemented on names),
- indices are B
trees,
- extensible on a per-record basis.
- One entry per file.
- Everything on disk is a file, including all metadata.
- Fixed length record (1-4kB) of variable-length attributes, incl.:
- all names of each file (hard links, DOS name),
- ACL
- data (for small files ore directories).
- Entries are mostly one record, but can span several.
- Each attribute is represented by header and value.
- Value can be:
- resident, i.e., in-line, or
- non-resident, i.e., in separate file.
- Any value can be resident.
- Any value that can grow can be non-resident, incl.:
- data,
- file index (for directory),
- ACL,
- attribute list.
- Non-resident values
- represented as array of:
(logical block #, physical block #, # blocks).
- Supports sparse files (if compression attribute is on).
- Also supports compression of data in extents.
- Directories are indices of file names, containing:
- file number (= MFT record number),
- time stamps (duplicate of MFT value),
- size (duplicate of MFT value).
- All file names are also contained in MFT.
- Indices are B
trees, represented as:
- root node of names (key values),
- corresponding child node (extent) pointers,
- bitmap indicating which blocks in child extents are in use.
- All metadata is in files, described by the first 16 records of the
MFT:
- MFT,
- partial copy of MFT (for fault tolerance),
- mirrors entries for metadata files,
- is allocated in the middle of the disk;
- log file,
- root directory,
- boot file (at known address, contains address of MFT),
- free block bitmap,
- bad block file,
- volume file (volume label),
- ...
- Uses write-ahead logging for metadata updates.
- Log contains redo and undo info.
- Log entries are idempotent.
- Log records may contain data or operations.
- Logging operation:
- Logs transaction sequentially in cache.
- Flushes log file.
- Updates metadata in cache.
- Writes commit record to log file.
- Occasionally writes checkpoint records to log:
- Location recorded in (shadow-paged) log header.
- Indicates starting point for recovery.
- When log full:
- stalls metadata update,
- flushes buffers,
- restarts log.
Cluster |
Block |
Volume |
Partition |
Run |
Extent |
Virtual cluster number |
Logical block number |
Logical cluster number |
Physical block number |
Next: Limitations of Traditional File
Up: 09-fs
Previous: SGI's XFS
Gernot Heiser
2002-10-11
[an error occurred while processing this directive]