Next: Supporting RAID Up: Who wants another filesystem? Previous: A primer on Log

Supporting volume management

I should come right out and say it: I am no great fan of Logical Volume Management, or LVM.

It is not that I don't think there is a need for it: there clearly is. People want flexibility in arranging how their storage space is used. It's not that I think LVM shouldn't be allowed: clearly it should, as it allows lots of people to do things that they otherwise couldn't. However, I still don't like it. It isn't the right solution.

A typical logical volume manager takes one or more large devices and divides them up into chunks. It then assembles some of those chunks together into one or more 'volumes' or virtual devices.
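The chunk-and-assemble step can be sketched in a few lines of Python. This is only an illustration of the general idea, not the layout of any real volume manager: the names `Chunk`, `Volume`, `translate` and the 4 MiB `CHUNK_SIZE` are all assumptions made up for the example.

```python
from dataclasses import dataclass

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size (4 MiB)


@dataclass(frozen=True)
class Chunk:
    device: str  # name of the underlying large device
    index: int   # chunk number within that device


class Volume:
    """A virtual device assembled from an ordered list of chunks."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def size(self):
        return len(self.chunks) * CHUNK_SIZE

    def translate(self, offset):
        """Map a byte offset in the volume to (device, byte offset on device)."""
        if not 0 <= offset < self.size():
            raise ValueError("offset outside volume")
        chunk = self.chunks[offset // CHUNK_SIZE]
        return chunk.device, chunk.index * CHUNK_SIZE + offset % CHUNK_SIZE


# A volume built from three chunks scattered over two devices.
vol = Volume([Chunk("sda", 7), Chunk("sdb", 0), Chunk("sda", 2)])
```

The volume looks contiguous from above even though its chunks are scattered, which is exactly the job a filesystem's block map already does for files.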

A typical use of a volume is to store a filesystem in it. The filesystem sees the large contiguous piece of storage, and divides it up into little pieces, and then re-assembles those pieces into one or more files, which are in many ways similar to volumes or devices.

This double layering seems inappropriate. Why have two layers that each build an abstraction of variable-sized contiguous storage out of fixed-size storage objects, when one will do?

LaFS addresses this issue by allowing a multitude of devices to be provided as storage for the filesystem data. It also allows for a multitude of filesystems to be stored in those devices, though usually one filesystem with a multitude of directories will be enough.

This is not a unique idea: the Advanced File System in Tru64 from HP/Compaq/Digital contains similar ideas. However it is not a widely used idea, at least in Linux.

Further, this idea does not particularly require log structuring to make it work. However, some aspects of a log-structured filesystem do work quite well with the multiple-device concept.

With a traditional LVM, it is quite easy to make a volume grow larger, and then tell the filesystem in the volume to use the extra space (if the FS knows how to do that). However, shrinking is not quite as easy. Making a filesystem shrink will invariably require relocating some data, which many filesystems do not support.

However, an LFS has a cleaner which is continuously relocating data. If we wish an LFS to stop using some component device, we can simply adjust the cleaning heuristic to prefer to clean segments from that device, even if they are full. If we also inhibit clean segments from that device from being allocated, then the device will eventually (depending on how fast we push the cleaner) become completely unused and so can easily be removed from the filesystem.
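A minimal sketch of that adjustment follows. It is a toy model rather than the LaFS cleaner: `Segment`, `pick_segment_to_clean`, `allocatable` and `evacuate` are hypothetical names, segments are reduced to a count of live blocks, and the normal heuristic is assumed to be "clean the segment with the least live data".

```python
from dataclasses import dataclass


@dataclass
class Segment:
    device: str
    live_blocks: int       # blocks still referenced by the filesystem
    total_blocks: int = 8  # hypothetical segment size


def pick_segment_to_clean(segments, retiring=frozenset()):
    """Choose the next segment for the cleaner.

    Segments on a retiring device come first, however full they are;
    otherwise prefer the segment with the least live data to copy.
    """
    candidates = [s for s in segments if s.live_blocks > 0]
    if not candidates:
        return None
    return min(candidates,
               key=lambda s: (s.device not in retiring, s.live_blocks))


def allocatable(segments, retiring=frozenset()):
    """Clean segments that new writes may use; retiring devices are excluded."""
    return [s for s in segments
            if s.live_blocks == 0 and s.device not in retiring]


def evacuate(segments, device):
    """Run the cleaner until nothing live remains on `device`."""
    retiring = frozenset([device])
    while True:
        victim = pick_segment_to_clean(segments, retiring)
        if victim is None or victim.device != device:
            return
        # Copy the victim's live blocks into clean segments elsewhere.
        for target in allocatable(segments, retiring):
            if victim.live_blocks == 0:
                break
            moved = min(victim.live_blocks, target.total_blocks)
            target.live_blocks += moved
            victim.live_blocks -= moved
        if victim.live_blocks > 0:
            raise RuntimeError("not enough free space on remaining devices")
```

The only changes from ordinary cleaning are the extra sort key and the filter in `allocatable`; no separate "shrink" code path is needed.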

Another piece of functionality provided by some Logical Volume Managers is creating a snapshot. When this is enabled, write requests are trapped, and if the data to be overwritten is from before the snapshot, it is relocated first. Thus an image of the filesystem can be kept stable (e.g. for backups) while other changes are still happening.
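As a rough illustration of that trap-and-relocate behaviour, here is a toy block device with a single snapshot. The class and method names are invented for the example, and real volume managers keep the preserved data on disk rather than in a dictionary.

```python
class SnapshotVolume:
    """Toy block device with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # current contents of the volume
        self.saved = None           # block number -> pre-snapshot data

    def take_snapshot(self):
        self.saved = {}

    def write(self, n, data):
        # Trap the write: preserve the old data once, before overwriting it.
        if self.saved is not None and n not in self.saved:
            self.saved[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n):
        if self.saved is None:
            raise RuntimeError("no snapshot taken")
        # Blocks untouched since the snapshot are read from the volume itself.
        return self.saved.get(n, self.blocks[n])
```

Note that the first write to each block after the snapshot costs an extra copy.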

Again, this functionality can better be provided by a filesystem than by introducing an intermediate layer. When the filesystem is asked to take a snapshot, it can simply choose not to overwrite any data that was live at the time of the snapshot, so no data relocation is needed. LaFS provides this functionality quite naturally.
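To see why no relocation is needed, consider a toy store that, like a log-structured filesystem, never overwrites in place. This is a sketch under simplifying assumptions, not the LaFS on-disk format: `SnapshotFS` and its methods are hypothetical, and the block map is a plain dictionary rather than an index tree.

```python
class SnapshotFS:
    """Toy never-overwrite store: file block number -> storage address."""

    def __init__(self):
        self.storage = {}    # address -> data
        self.next_addr = 0
        self.live = {}       # current map: block number -> address
        self.snapshots = {}  # snapshot name -> frozen copy of the map

    def write(self, block, data):
        # New data always goes to a fresh address; the old address is
        # left untouched, so a snapshot that refers to it stays valid.
        addr = self.next_addr
        self.next_addr += 1
        self.storage[addr] = data
        self.live[block] = addr

    def snapshot(self, name):
        # Taking a snapshot copies only the map, never the data.
        self.snapshots[name] = dict(self.live)

    def read(self, block, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.storage[table[block]]

    def reclaimable(self):
        """Addresses referenced by neither the live map nor any snapshot."""
        pinned = set(self.live.values())
        for table in self.snapshots.values():
            pinned.update(table.values())
        return set(self.storage) - pinned
```

The snapshot costs nothing at write time; its only effect is that the cleaner must treat blocks referenced by a snapshot as still live.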

LaFS goes a little way beyond just making an underlying LVM irrelevant. It also provides some LVM functionality itself. LaFS can mark some segments to be non-logged so that they do not take part in the normal logging and cleaning process. A file can then be created so that all the data blocks of this file (but not the inode/index blocks) reside in these non-logged segments.

Such a file will not benefit from the various data-protection features of an LFS, but also will not suffer terrible fragmentation in the face of lots of random updates. Such a file would have limited uses, but would be very valuable in those limited cases. They include:

Another offshoot of having LaFS know about multiple devices is that it can have a concept of devices with different performance. For example, LaFS can have an NVRAM device which is of limited size but has very low write latency, and a RAID5 array which is much larger, but slower.

Any new data would be written to the NVRAM device using fairly small segment sizes, and the oldest segments on the NVRAM device would be cleaned off onto the RAID5 array as needed to make more room. This would cause the NVRAM to effectively work as a low-latency write cache in front of the RAID5 array.
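The write-cache behaviour described above can be modelled very simply. In this sketch the name `TieredLog` and the capacity parameter are assumptions for illustration, and "cleaning" is reduced to moving a whole segment from the fast tier to the slow one, ignoring which blocks in it are still live.

```python
from collections import deque


class TieredLog:
    """Toy two-tier log: a small fast device in front of a large slow one."""

    def __init__(self, fast_segments):
        self.fast_capacity = fast_segments
        self.fast = deque()  # segments on the fast device, oldest first
        self.slow = []       # segments cleaned off to the slow device

    def write_segment(self, data):
        # Make room first by cleaning the oldest fast segment to the slow tier.
        while len(self.fast) >= self.fast_capacity:
            self.slow.append(self.fast.popleft())
        self.fast.append(data)


log = TieredLog(fast_segments=2)
for segment in ["s1", "s2", "s3", "s4"]:
    log.write_segment(segment)
# The two newest segments remain on the fast device;
# the two oldest have been cleaned to the slow one.
```

Every write lands on the fast device, so the caller sees its latency, while the slow device only ever receives whole segments from the cleaner.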

In general, having the filesystem talk directly to the device rather than through an intermediate LVM layer provides more opportunities for the filesystem to make informed decisions on how to use the device.


Neil Brown 2003-02-06