PostgreSQL File Manager

COMP9315 23T1 ♢ PG File Manager ♢ [0/15]
❖ PostgreSQL File Manager

PostgreSQL uses the following file organisation ...

[Diagram:Pics/storage/pg-file-arch.png]

❖ PostgreSQL File Manager (cont)

Components of storage subsystem:

PostgreSQL has two basic kinds of files:

Note: smgr is designed for many storage devices; only a disk handler is provided.
❖ Relations as Files

PostgreSQL identifies relation files via their OIDs.

The core data structure for this is RelFileNode:

// include/storage/relfilenode.h
typedef struct RelFileNode {
    Oid  spcNode;  // tablespace
    Oid  dbNode;   // database
    Oid  relNode;  // relation
} RelFileNode;

Global (shared) tables (e.g. pg_database) have spcNode == GLOBALTABLESPACE_OID and dbNode == 0.

❖ Relations as Files (cont)

The relpath function maps RelFileNode to file:


// include/common/relpath.h
// common/relpath.c
char *relpath(RelFileNode r)  // simplified
{
   char *path = malloc(ENOUGH_SPACE);

   if (r.spcNode == GLOBALTABLESPACE_OID) {
      /* Shared system relations live in PGDATA/global */
      Assert(r.dbNode == 0);
      sprintf(path, "%s/global/%u",
              DataDir, r.relNode);
   }
   else if (r.spcNode == DEFAULTTABLESPACE_OID) {
      /* The default tablespace is PGDATA/base */
      sprintf(path, "%s/base/%u/%u",
              DataDir, r.dbNode, r.relNode);
   }
   else {
      /* All other tablespaces accessed via symlinks */
      sprintf(path, "%s/pg_tblspc/%u/%u/%u", DataDir,
              r.spcNode, r.dbNode, r.relNode);
   }
   return path;
}
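
A self-contained variant of the mapping above, as a sketch only: it writes into a caller-supplied buffer with snprintf instead of using malloc, and takes the data directory as a parameter. The name rel_path and the dataDir parameter are assumptions made for this example; 1663 and 1664 are the OIDs of the built-in pg_default and pg_global tablespaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/* OIDs of the built-in tablespaces pg_default and pg_global */
#define DEFAULTTABLESPACE_OID 1663
#define GLOBALTABLESPACE_OID  1664

typedef struct RelFileNode {
    Oid spcNode;  /* tablespace */
    Oid dbNode;   /* database */
    Oid relNode;  /* relation */
} RelFileNode;

/* Write the path for r into buf (at most size bytes, NUL-terminated).
 * Returns what snprintf returns: the length the full path needs.
 * dataDir stands in for the DataDir global (hypothetical parameter). */
int rel_path(char *buf, size_t size, const char *dataDir, RelFileNode r)
{
    if (r.spcNode == GLOBALTABLESPACE_OID) {
        /* Shared system relations live in PGDATA/global */
        assert(r.dbNode == 0);
        return snprintf(buf, size, "%s/global/%u", dataDir, r.relNode);
    }
    if (r.spcNode == DEFAULTTABLESPACE_OID) {
        /* The default tablespace is PGDATA/base */
        return snprintf(buf, size, "%s/base/%u/%u",
                        dataDir, r.dbNode, r.relNode);
    }
    /* All other tablespaces are reached via symlinks in pg_tblspc */
    return snprintf(buf, size, "%s/pg_tblspc/%u/%u/%u",
                    dataDir, r.spcNode, r.dbNode, r.relNode);
}
```

For example, with dataDir "/pg", the node {1663, 5, 16384} maps to /pg/base/5/16384, and the shared node {1664, 0, 1262} maps to /pg/global/1262.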

❖ File Descriptor Pool

Unix has limits on the number of concurrently open files.

PostgreSQL maintains a pool of open file descriptors:

File names are simply strings: typedef char *FileName

Open files are referenced via: typedef int File

A File is an index into a table of "virtual file descriptors".

❖ File Descriptor Pool (cont)

Interface to file descriptor (pool):


// backend/storage/file/fd.c
File FileNameOpenFile(FileName fileName,
                      int fileFlags, int fileMode);
     // open a file in the database directory ($PGDATA/base/...)
File OpenTemporaryFile(bool interXact);
     // open temp file; flag: close at end of transaction?
void FileClose(File file);
int  FileRead(File file, char *buffer, int amount);
int  FileWrite(File file, char *buffer, int amount);
int  FileSync(File file);
long FileSeek(File file, long offset, int whence);
int  FileTruncate(File file, long offset);

Analogous to Unix syscalls open(), close(), read(), write(), lseek(), ...

❖ File Descriptor Pool (cont)

Virtual file descriptors (Vfd)

VfdCache[0] holds list head/tail pointers.
❖ File Descriptor Pool (cont)

Virtual file descriptor records (simplified):


// backend/storage/file/fd.c
typedef struct vfd
{
    s_short  fd;              // current FD, or VFD_CLOSED if none
    u_short  fdstate;         // bitflags for VFD's state
    File     nextFree;        // link to next free VFD, if in freelist
    File     lruMoreRecently; // doubly linked recency-of-use list
    File     lruLessRecently;
    long     seekPos;         // current logical file position
    char     *fileName;       // name of file, or NULL for unused VFD
    // NB: fileName is malloc'd, and must be free'd when closing the VFD
    int      fileFlags;       // open(2) flags for (re)opening the file
    int      fileMode;        // mode to pass to open(2)
} Vfd;
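
How the recency-of-use list lets many virtual files share a few real descriptors can be sketched with a small simulation. This is an illustration under stated assumptions, not the code in fd.c: the names fileAccess, vfdOpen and isReallyOpen, the tiny limits, and the fake descriptor numbers are all made up for the example, and no real open(2)/close(2) calls are issued. Slot 0 of the table acts only as the list header.

```c
#include <assert.h>
#include <stdbool.h>

#define VFD_CLOSED   (-1)
#define POOL_SIZE    8   /* table slots; slot 0 is only the LRU list header */
#define MAX_REAL_FDS 2   /* pretend the OS allows just two open files */

typedef int File;

typedef struct {
    int  fd;               /* simulated real descriptor, or VFD_CLOSED */
    bool inUse;            /* slot allocated to some file? */
    File lruMoreRecently;  /* doubly linked recency-of-use ring */
    File lruLessRecently;
} Vfd;

static Vfd VfdCache[POOL_SIZE];  /* zero-initialised: the ring starts empty */
static int nOpen = 0;            /* simulated real descriptors in use */
static int nextFd = 3;           /* next fake descriptor number to hand out */

/* Remove f from the recency ring */
static void lruUnlink(File f)
{
    Vfd *v = &VfdCache[f];
    VfdCache[v->lruLessRecently].lruMoreRecently = v->lruMoreRecently;
    VfdCache[v->lruMoreRecently].lruLessRecently = v->lruLessRecently;
}

/* Put f at the most-recently-used end of the ring */
static void lruInsert(File f)
{
    Vfd *v = &VfdCache[f];
    v->lruMoreRecently = 0;
    v->lruLessRecently = VfdCache[0].lruLessRecently;
    VfdCache[0].lruLessRecently = f;
    VfdCache[v->lruLessRecently].lruMoreRecently = f;
}

/* Ensure f has a real descriptor, closing the least recently used one if
 * the limit is reached, then mark f as most recently used. */
void fileAccess(File f)
{
    Vfd *v = &VfdCache[f];
    assert(f > 0 && f < POOL_SIZE && v->inUse);
    if (v->fd != VFD_CLOSED) {
        lruUnlink(f);                          /* already open: just re-rank */
    } else {
        if (nOpen >= MAX_REAL_FDS) {
            File victim = VfdCache[0].lruMoreRecently;  /* LRU end of ring */
            lruUnlink(victim);
            VfdCache[victim].fd = VFD_CLOSED;  /* real code: close(2) */
            nOpen--;
        }
        v->fd = nextFd++;                      /* real code: open(2) again */
        nOpen++;
    }
    lruInsert(f);
}

/* Allocate a free slot and open it; returns 0 if the table is full */
File vfdOpen(void)
{
    for (File f = 1; f < POOL_SIZE; f++) {
        if (!VfdCache[f].inUse) {
            VfdCache[f].inUse = true;
            VfdCache[f].fd = VFD_CLOSED;
            fileAccess(f);
            return f;
        }
    }
    return 0;
}

/* Does f currently hold a (simulated) real descriptor? */
bool isReallyOpen(File f) { return VfdCache[f].fd != VFD_CLOSED; }
```

With a limit of two real descriptors, opening a third file silently closes the least recently used one. Its Vfd slot stays valid, and a later fileAccess on it reopens the file at the cost of evicting another.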

❖ File Manager

Reminder: PostgreSQL file organisation

[Diagram:Pics/storage/pg-file-arch.png]

❖ File Manager (cont)

PostgreSQL stores each table as a set of files:

[Diagram:Pics/storage/one-table-files.png]

❖ File Manager (cont)

Data files (Oid, Oid.1, ...):


[Diagram:Pics/storage/heap-file.png]

❖ File Manager (cont)

Free space map (Oid_fsm):

Visibility map (Oid_vm):
❖ File Manager (cont)

The "magnetic disk storage manager" (storage/smgr/md.c)

PostgreSQL PageID values are structured:

// include/storage/buf_internals.h
typedef struct
{
    RelFileNode rnode;    // which relation/file
    ForkNumber  forkNum;  // which fork (of reln)
    BlockNumber blockNum; // which page/block 
} BufferTag;
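
Since a BufferTag names exactly one page, two tags refer to the same page only when all components agree. A minimal sketch, assuming simplified typedefs for Oid, ForkNumber and BlockNumber, and a helper name (samePage) invented for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned int Oid;
typedef int          ForkNumber;   /* 0 is the main fork in this sketch */
typedef uint32_t     BlockNumber;

typedef struct RelFileNode {
    Oid spcNode;  /* tablespace */
    Oid dbNode;   /* database */
    Oid relNode;  /* relation */
} RelFileNode;

typedef struct {
    RelFileNode rnode;    /* which relation/file */
    ForkNumber  forkNum;  /* which fork (of reln) */
    BlockNumber blockNum; /* which page/block */
} BufferTag;

/* Two tags name the same page only if every component matches */
bool samePage(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode  == b->rnode.dbNode  &&
           a->rnode.relNode == b->rnode.relNode &&
           a->forkNum       == b->forkNum       &&
           a->blockNum      == b->blockNum;
}
```

Comparing field by field avoids depending on any padding bytes inside the struct.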

❖ File Manager (cont)

Access to a block of data proceeds (roughly) as follows:

// pageID set from pg_catalog tables
// buffer obtained from Buffer pool
getBlock(BufferTag pageID, Buffer buf)
{
   Vfd vf;  off_t offset;
   (vf, offset) = findBlock(pageID)
   lseek(vf.fd, offset, SEEK_SET)
   vf.seekPos = offset;
   nread = read(vf.fd, buf, BLOCKSIZE)
   if (nread < BLOCKSIZE) ... we have a problem
}

BLOCKSIZE is a global constant, configurable when PostgreSQL is built (default: 8192 bytes)
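
The seek-then-read step can be tried directly with the Unix calls. The sketch below assumes an already-open file descriptor and a block number relative to that file; readBlock and demoReadBlock are names invented for this example, and the Vfd lookup is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 8192

/* Read block blockNum of the open file fd into buf (BLOCKSIZE bytes).
 * Returns 0 on success, -1 if the seek fails or the read is short. */
int readBlock(int fd, unsigned int blockNum, char *buf)
{
    off_t offset = (off_t) blockNum * BLOCKSIZE;
    if (lseek(fd, offset, SEEK_SET) != offset)
        return -1;
    ssize_t nread = read(fd, buf, BLOCKSIZE);
    return (nread == BLOCKSIZE) ? 0 : -1;
}

/* Usage sketch: write three blocks filled with 'A', 'B', 'C' to a
 * temporary file, read back block 1, and check that reading the
 * non-existent block 3 is reported as a failure. Returns 0 if all holds. */
int demoReadBlock(void)
{
    char path[] = "/tmp/blockdemoXXXXXX";
    char buf[BLOCKSIZE];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */
    for (int i = 0; i < 3; i++) {
        memset(buf, 'A' + i, BLOCKSIZE);
        if (write(fd, buf, BLOCKSIZE) != BLOCKSIZE) {
            close(fd);
            return -1;
        }
    }
    int ok = readBlock(fd, 1, buf) == 0 &&
             buf[0] == 'B' && buf[BLOCKSIZE - 1] == 'B';
    int pastEnd = readBlock(fd, 3, buf);   /* seek succeeds, read returns 0 */
    close(fd);
    return (ok && pastEnd == -1) ? 0 : -1;
}
```

Seeking past the end of a file is not an error in itself; the problem only shows up as a short read, which is why the byte count has to be checked.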

❖ File Manager (cont)

findBlock(BufferTag pageID) returns (Vfd, off_t)
{
   offset = pageID.blockNum * BLOCKSIZE
   segNum = offset / MAXFILESIZE   // which segment file (Oid, Oid.1, ...)
   fileName = relpath(pageID.rnode)
   if (pageID.forkNum > 0)         // non-main fork (Oid_fsm, Oid_vm)
      fileName = fileName+"_"+forkSuffix(pageID.forkNum)  // hypothetical helper
   if (segNum > 0)
      fileName = fileName+"."+segNum
   if (fileName is not in Vfd pool)
      vf = allocate new Vfd for fileName
   else
      vf = use Vfd from pool
   offset = offset - (segNum*MAXFILESIZE)  // offset within the segment file
   return (vf, offset)
}
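
The arithmetic for locating a block within the segment files (Oid, Oid.1, ...) can be sketched in C. This assumes the default build settings: 8192-byte blocks and 131072 blocks (1 GB) per segment file, which PostgreSQL calls RELSEG_SIZE and which corresponds to MAXFILESIZE / BLOCKSIZE in the pseudocode. BlockLocation, locateBlock and segmentFileName are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCKSIZE   8192u
#define RELSEG_SIZE 131072u   /* blocks per segment file: 1 GB / 8 KB */

typedef struct {
    unsigned int segNum;  /* 0 for Oid, 1 for Oid.1, ... */
    long         offset;  /* byte offset within that segment file */
} BlockLocation;

/* Map a relation-wide block number to (segment file, offset in file) */
BlockLocation locateBlock(unsigned int blockNum)
{
    BlockLocation loc;
    loc.segNum = blockNum / RELSEG_SIZE;
    loc.offset = (long) (blockNum % RELSEG_SIZE) * BLOCKSIZE;
    return loc;
}

/* Build the file name: relation path, then a fork suffix ("" for the
 * main fork, "_fsm", "_vm"), then ".N" for segments after the first. */
int segmentFileName(char *buf, size_t size, const char *relPath,
                    const char *forkSuffix, unsigned int segNum)
{
    if (segNum == 0)
        return snprintf(buf, size, "%s%s", relPath, forkSuffix);
    return snprintf(buf, size, "%s%s.%u", relPath, forkSuffix, segNum);
}
```

For example, block 131073 of a relation stored at base/5/16384 lives in base/5/16384.1 at byte offset 8192.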



Produced: 20 Feb 2023