❖ PostgreSQL File Manager (cont) |
Components of storage subsystem:
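RelFileNode: maps a relation to its files
storage/smgr: storage manager interface
storage/smgr/md.c: functions for managing disk files
storage/file: file descriptor pool
❖ Relations as Files |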
PostgreSQL identifies relation files via their OIDs.
The core data structure for this is RelFileNode:
// include/storage/relfilenode.h
typedef struct RelFileNode
{
   Oid  spcNode;  // tablespace
   Oid  dbNode;   // database
   Oid  relNode;  // relation
} RelFileNode;
Global (shared) tables (e.g. pg_database) have spcNode == GLOBALTABLESPACE_OID and dbNode == 0.
❖ Relations as Files (cont) |
The relpath function maps a RelFileNode to a file path:
// include/common/relpath.h
// common/relpath.c
char *relpath(RelFileNode r)  // simplified
{
   char *path = malloc(ENOUGH_SPACE);

   if (r.spcNode == GLOBALTABLESPACE_OID) {
      /* Shared system relations live in PGDATA/global */
      Assert(r.dbNode == 0);
      sprintf(path, "%s/global/%u", DataDir, r.relNode);
   }
   else if (r.spcNode == DEFAULTTABLESPACE_OID) {
      /* The default tablespace is PGDATA/base */
      sprintf(path, "%s/base/%u/%u", DataDir, r.dbNode, r.relNode);
   }
   else {
      /* All other tablespaces accessed via symlinks */
      sprintf(path, "%s/pg_tblspc/%u/%u/%u", DataDir,
              r.spcNode, r.dbNode, r.relNode);
   }
   return path;
}
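The mapping above can be tried outside PostgreSQL with a small standalone sketch. The function name rel_path, the caller-supplied buffer and the dataDir parameter are illustrative assumptions, not PostgreSQL's API; the two tablespace OID values are the ones PostgreSQL uses for pg_default and pg_global, but treat them here as assumed constants.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int Oid;

/* Assumed constants for this sketch (pg_default and pg_global). */
#define DEFAULTTABLESPACE_OID 1663
#define GLOBALTABLESPACE_OID  1664

typedef struct RelFileNode {
    Oid spcNode;   /* tablespace */
    Oid dbNode;    /* database */
    Oid relNode;   /* relation */
} RelFileNode;

/*
 * Hypothetical helper: write the path for r into buf (at most len bytes).
 * Returns the length snprintf reports, so callers can detect truncation.
 */
int rel_path(char *buf, size_t len, const char *dataDir, RelFileNode r)
{
    if (r.spcNode == GLOBALTABLESPACE_OID) {
        /* shared relations belong to no particular database */
        assert(r.dbNode == 0);
        return snprintf(buf, len, "%s/global/%u", dataDir, r.relNode);
    }
    if (r.spcNode == DEFAULTTABLESPACE_OID) {
        /* default tablespace: one directory per database under base/ */
        return snprintf(buf, len, "%s/base/%u/%u",
                        dataDir, r.dbNode, r.relNode);
    }
    /* any other tablespace is reached through pg_tblspc/ */
    return snprintf(buf, len, "%s/pg_tblspc/%u/%u/%u",
                    dataDir, r.spcNode, r.dbNode, r.relNode);
}
```

Writing into a caller-supplied buffer with snprintf avoids the unchecked malloc and unbounded sprintf of the simplified version; that is a choice made for this sketch only.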
❖ File Descriptor Pool |
Unix has limits on the number of concurrently open files.
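PostgreSQL maintains a pool of open file descriptors, which hides this limit from higher-level code and reduces the number of open() calls.
File names are simply strings: typedef char *FileName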
Open files are referenced via: typedef int File
A File is an index into a table of virtual file descriptors (VfdCache).
❖ File Descriptor Pool (cont) |
Interface to file descriptor (pool):
// backend/storage/file/fd.c
File FileNameOpenFile(FileName fileName, int fileFlags, int fileMode);
     // open a file in the database directory ($PGDATA/base/...)
File OpenTemporaryFile(bool interXact);
     // open temp file; flag: close at end of transaction?
void FileClose(File file);
int  FileRead(File file, char *buffer, int amount);
int  FileWrite(File file, char *buffer, int amount);
int  FileSync(File file);
long FileSeek(File file, long offset, int whence);
int  FileTruncate(File file, long offset);
Analogous to the Unix syscalls open(), close(), read(), write(), lseek(), ...
❖ File Descriptor Pool (cont) |
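Virtual file descriptors (Vfd) are stored in an array, VfdCache, and linked into a list by recency of use.
VfdCache[0] is not a usable Vfd; it holds the head and tail links of that list.
❖ File Descriptor Pool (cont) |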
Virtual file descriptor records (simplified):
// backend/storage/file/fd.c
typedef struct vfd
{
   s_short  fd;               // current FD, or VFD_CLOSED if none
   u_short  fdstate;          // bitflags for VFD's state
   File     nextFree;         // link to next free VFD, if in freelist
   File     lruMoreRecently;  // doubly linked recency-of-use list
   File     lruLessRecently;
   long     seekPos;          // current logical file position
   char     *fileName;        // name of file, or NULL for unused VFD
   // NB: fileName is malloc'd, and must be free'd when closing the VFD
   int      fileFlags;        // open(2) flags for (re)opening the file
   int      fileMode;         // mode to pass to open(2)
} Vfd;
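The lruMoreRecently and lruLessRecently links form a ring threaded through the array, with entry 0 acting as the list header. The following is a minimal sketch of that idea, assuming a fixed-size array and keeping only the two link fields; the names MiniVfd, cache, lru_init, lru_insert, lru_delete and lru_victim are hypothetical and are not the functions in fd.c.

```c
#include <assert.h>

#define POOL_SIZE 8            /* assumed fixed size for this sketch */

typedef int File;              /* a File is an index into the array */

typedef struct {
    File lruMoreRecently;      /* neighbour used more recently */
    File lruLessRecently;      /* neighbour used less recently */
} MiniVfd;

/* cache[0] is the ring header, never a real file entry */
static MiniVfd cache[POOL_SIZE];

/* Empty ring: the header points at itself in both directions. */
void lru_init(void)
{
    cache[0].lruMoreRecently = 0;
    cache[0].lruLessRecently = 0;
}

/* Unlink entry f from the ring. */
void lru_delete(File f)
{
    assert(f > 0 && f < POOL_SIZE);
    cache[cache[f].lruLessRecently].lruMoreRecently = cache[f].lruMoreRecently;
    cache[cache[f].lruMoreRecently].lruLessRecently = cache[f].lruLessRecently;
}

/* Link entry f in as the most recently used entry. */
void lru_insert(File f)
{
    assert(f > 0 && f < POOL_SIZE);
    cache[f].lruMoreRecently = 0;
    cache[f].lruLessRecently = cache[0].lruLessRecently;
    cache[0].lruLessRecently = f;
    cache[cache[f].lruLessRecently].lruMoreRecently = f;
}

/* Least recently used entry (0 if the ring is empty). */
File lru_victim(void)
{
    return cache[0].lruMoreRecently;
}
```

Touching a file is lru_delete followed by lru_insert. When the process runs out of real descriptors, the entry returned by lru_victim is the natural one to close, since its fileName, fileFlags and seekPos are enough to reopen it later.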
❖ File Manager (cont) |
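PostgreSQL stores each table in the directory for its database, which is named by the database's OID in pg_database (PGDATA/base/pg_database.oid for the default tablespace).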
❖ File Manager (cont) |
Data files (Oid, Oid.1, ...):
❖ File Manager (cont) |
Free space map (Oid_fsm):
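Indicates where free space is in the data pages. Space only becomes free after VACUUM; DELETE merely marks tuples as no longer in use (by setting xmax).
❖ File Manager (cont) |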
The "magnetic disk storage manager" (storage/smgr/md.c) maps a PageID to a file and an offset within that file.
PageID values are structured:
// include/storage/buf_internals.h
typedef struct
{
   RelFileNode rnode;     // which relation/file
   ForkNumber  forkNum;   // which fork (of reln)
   BlockNumber blockNum;  // which page/block
} BufferTag;
❖ File Manager (cont) |
Access to a block of data proceeds (roughly) as follows:
// pageID set from pg_catalog tables
// buffer obtained from Buffer pool
getBlock(BufferTag pageID, Buffer buf)
{
   Vfd vf;
   off_t offset;
   (vf, offset) = findBlock(pageID)
   lseek(vf.fd, offset, SEEK_SET)
   vf.seekPos = offset;
   nread = read(vf.fd, buf, BLOCKSIZE)
   if (nread < BLOCKSIZE) ... we have a problem
}
BLOCKSIZE is a configurable constant (default 8192 bytes; called BLCKSZ in the PostgreSQL source).
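The seek-then-read step can be sketched in plain POSIX C, working directly on an operating-system file descriptor rather than a Vfd. The function name read_block and its return convention are assumptions made for this sketch, as is the choice to loop until a whole block has arrived.

```c
#include <assert.h>
#include <fcntl.h>       /* open() flags, for callers that open the file */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 8192   /* assumed block size for this sketch */

/*
 * Hypothetical helper: read block number blockNum of the file open on fd
 * into buf, which must hold BLOCKSIZE bytes.
 * Returns 0 on success, -1 on a seek error, read error or short file.
 */
int read_block(int fd, unsigned int blockNum, char *buf)
{
    assert(buf != NULL);

    /* widen before multiplying so large block numbers do not overflow */
    off_t offset = (off_t) blockNum * BLOCKSIZE;
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;

    /* read() may return fewer bytes than requested, so accumulate */
    size_t done = 0;
    while (done < BLOCKSIZE) {
        ssize_t n = read(fd, buf + done, BLOCKSIZE - done);
        if (n <= 0)
            return -1;   /* error, or end of file before a full block */
        done += (size_t) n;
    }
    return 0;
}
```

A caller would open the relation file, call read_block(fd, 1, buf) to fetch the second page, and treat -1 as the "we have a problem" case.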
❖ File Manager (cont) |
findBlock(BufferTag pageID) returns (Vfd, off_t)
{
   offset = pageID.blockNum * BLOCKSIZE
   segNum = offset / MAXFILESIZE   // which file of the fork: Oid, Oid.1, ...
   fileName = relpath(pageID.rnode)
   if (pageID.forkNum > 0)
      // forkName() is a hypothetical helper giving e.g. "fsm"
      fileName = fileName+"_"+forkName(pageID.forkNum)
   if (segNum > 0)
      fileName = fileName+"."+segNum
   if (fileName is not in Vfd pool)
      fd = allocate new Vfd for fileName
   else
      fd = use Vfd from pool
   offset = offset - (segNum*MAXFILESIZE)
   return (fd, offset)
}
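Because data files are split into pieces (Oid, Oid.1, ...), locating a block means finding both the piece and the offset within it. In PostgreSQL's md.c the piece number is derived from the block number, while forks are distinguished by file name suffixes such as _fsm. The sketch below shows only the arithmetic; BlockLocation, locate_block and segment_path are hypothetical names, and the 1 GB MAXFILESIZE is an assumed value.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCKSIZE          8192u
#define MAXFILESIZE        (1024u * 1024u * 1024u)  /* assumed: 1 GB per file */
#define BLOCKS_PER_SEGMENT (MAXFILESIZE / BLOCKSIZE)

typedef struct {
    unsigned int segNum;   /* 0 for "Oid", 1 for "Oid.1", ... */
    long long    offset;   /* byte offset within that file */
} BlockLocation;

/* Split a block number into (segment file, offset within the file). */
BlockLocation locate_block(unsigned int blockNum)
{
    BlockLocation loc;
    loc.segNum = blockNum / BLOCKS_PER_SEGMENT;
    loc.offset = (long long) (blockNum % BLOCKS_PER_SEGMENT) * BLOCKSIZE;
    assert(loc.offset < (long long) MAXFILESIZE);
    return loc;
}

/* Build the file name for a segment: the first has no ".N" suffix. */
int segment_path(char *buf, size_t len, const char *basePath,
                 unsigned int segNum)
{
    if (segNum == 0)
        return snprintf(buf, len, "%s", basePath);
    return snprintf(buf, len, "%s.%u", basePath, segNum);
}
```

With these assumed sizes, block 131072 is the first block of the second file, so locate_block(131072) yields segment 1 at offset 0.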
Produced: 20 Feb 2023