Introduction
Curious what really powers PostgreSQL behind the scenes? This post exposes core PostgreSQL architecture and storage paradigm that drive its reliability and performance. From backend processes to storage structure and table spaces, we will walk you through all of these pieces and show you how they work together to make PostgreSQL one of the most powerful open-source database today.
Let’s dive in.
The PostgreSQL Architecture – Simplified

The PostgreSQL server consists of multiple processes managed by the postmaster. By default, it listens on port 5432 and can be started or stopped using commands like pg_ctl -D $PGDATA start and pg_ctl -D $PGDATA stop. When a client connects, the server forks a dedicated backend process to handle that connection. Each backend process serves only one specific database. By default, PostgreSQL supports up to 100 concurrent backend connections, as defined by the max_connections setting.
PostgreSQL relies on a set of auxiliary background processes, each dedicated to a specific system maintenance task. These processes work behind the scenes to ensure performance, reliability, and recoverability of the database system.
Key Auxiliary Background Processes:
- Background Writer
Periodically flushes dirty (modified) data pages from shared buffers to permanent storage. - Checkpointer
Creates recovery checkpoints at regular intervals to allow crash recovery and also flushes dirty pages to disk. - Autovacuum
Automatically reclaims storage by cleaning up dead tuples and preventing table bloat. - WAL Writer
Writes the in-memory Write-Ahead Logs (WAL) to disk regularly to ensure durability of transactions. - Stats Collector
Collects and updates performance statistics used by the query planner and monitoring tools. - Archiver
Copies completed WAL segment files to an archive location for backup and disaster recovery.
Memory Space
PostgreSQL uses different types of buffer spaces to optimize performance and manage memory efficiently. These include local buffers, which are private to each process, and shared buffers, which are accessible to all backend processes. Each serves a distinct purpose and is tuned through specific configuration settings.
Local Buffers (Per-Process Cache)
- Private buffer space maintained individually by each process.
- Controlled by
work_memandmaintenance_work_memsettings. - Used for operations like sorting, hashing, or temporary tuple storage during query execution.
- When the buffer exceeds the configured size, PostgreSQL spills excess data to temporary files on disk.
- A central buffer pool shared among all backend processes.
- Acts as a communication and data-sharing medium across sessions.
- Managed by the buffer manager, which coordinates data movement between shared memory and physical storage.
- Also used to cache both data pages and WAL (Write-Ahead Logging) pages.
Object Identifier (OID)
In PostgreSQL, an OID (Object Identifier) is a special system-generated identifier assigned to various database objects. It serves as a unique internal reference, allowing PostgreSQL to efficiently manage and track objects such as:
- Tables
- Indexes
- Databases
- Views
- Functions and more
mydatabase=# select datname, oid from pg_database where datname = 'mydb'; datname | oid ------------+------- mydatabase | 16385 (1 row) mydatabase=# select relname, oid from pg_class where relname = 'mytable'; relname | oid ---------+------- mytable | 16386 (1 row)
Physical Storage in PostgreSQL Architecture
The PostgreSQL data directory (PGDATA) is the heart of PostgreSQL’s physical storage system. It houses everything the database server needs to operate effectively and reliably. Understanding its structure is essential for database administrators and developers alike.
What’s Inside the PGDATA Directory?
The PGDATA directory contains all critical components required for PostgreSQL to function:
- Data files – These store actual user and system data across all databases in the cluster.
- WAL files (Write-Ahead Logs) – Used for crash recovery and replication.
- Configuration files – Includes
postgresql.conf,pg_hba.conf, and others that control server behavior and access. - System catalogs – Metadata describing database objects, roles, and privileges.
Logical Structure: The Database Cluster
In PostgreSQL, a database cluster refers to a set of databases managed by a single server instance and stored under one PGDATA directory — not a group of servers (as the word “cluster” might suggest elsewhere).

PGDATA Structure
A PostgreSQL database cluster is fundamentally represented on disk by a main directory, commonly referred to as the PGDATA directory. This directory serves as the central hub for all the data files, configurations, and internal structures of the PostgreSQL server.

The above is a simplified view of what PGDATA looks like. The blue blocks are created by initdb when the cluster is first initialized. Orange blocks represent configuration and some meta data files about this PostgreSQL. Purple block represents tablespace where PostgreSQL can store data outside of the cluster using the tablespace mechanism. The green block represents a new database and a new table created under this database using the query below:
postgres=# CREATE DATABASE mydatabase; postgres=# \c mydatabase mydatabase1=# CREATE TABLE mytable(a INT, b TEXT);
Other Important Sub Directories
| PG_VERSION | Stores the PostgreSQL version number for this data cluster |
| pg_hba.conf | Defines client authentication rules (host-based authentication) |
| postgresql.conf | Main system configuration file for PostgreSQL |
| postmaster.pid | Contains the PID of the PostgreSQL server process (used for management) |
| base/ | Stores data files for each individual database in the cluster |
| global/ | Holds cluster-wide metadata, such as user roles and shared catalogs |
| pg_wal/ | Directory containing WAL (Write-Ahead Logging) files for crash recovery |
| pg_tblspc/ | Contains symbolic links to tablespaces (external storage locations) |
| pg_twophase/ | Stores two-phase commit transaction data |
| pg_repslot/ | Contains logical and physical replication slot data |
| pg_xact/ | Holds transaction commit and rollback state data |
The base/ Directory
The base/ directory inside the PostgreSQL PGDATA directory contains the physical storage for each database in the cluster. Each subdirectory within base/ is named after the OID (Object Identifier) of a specific database.
Example:
1 – template0: A read-only system template database used as a base for creating new databases.
4 – template1: A customizable template database. New databases are typically cloned from this.
5 – postgres: The default database created by initdb, used for general administrative tasks.
16384 – mydatabase: An example of a user-created database, with an auto-
assigned OID.
How PostgreSQL Architecture Uses These:
- When you run
CREATE DATABASE, PostgreSQL copies data from eithertemplate0ortemplate1to form the new database. - These subdirectories contain all the catalog tables, user tables, and indexes of their respective databases.
- Each table, index, or system catalog is stored as a data file named by its own OID, also known internally as a
relfilenode.
Key Concept Here:
- OID-based naming: All databases, tables, and indexes are stored as files using their OIDs as filenames.
- Physical layout: PostgreSQL maps logical database objects (tables, indexes) to physical files in these subdirectories.
- System-generated: OIDs are automatically assigned and managed by PostgreSQL.
$ ls $PGDATA/base/16384 112 13361 2603_vm 2616 2665 2702 2995 3440 3601_vm 4151 6110 113 13362 2604 2616_fsm 2666 2703 2996 3455 3602 4152 6111 1247 13362_fsm 2605 2616_vm 2667 2704 3079 3456 3602_fsm 4153 6112 1247_fsm 13362_vm 2605_fsm 2617 2668 2753 3079_fsm 3456_fsm 3602_vm 4154 6113 1247_vm 13365 2605_vm 2617_fsm 2669 2753_fsm 3079_vm 3456_vm 3603 4155 6116 1249 13366 2606 2617_vm 2670 2753_vm 3080 3466 3603_fsm 4156 6117 1249_fsm 1417 2606_fsm 2618 2673 2754 3081 3467 3603_vm 4157 6175 1249_vm 1418 2606_vm 2618_fsm 2674 2755 3085 3468 3604 4158 6176 1255 174 2607 2618_vm 2675 2756 3118 3501 3605 4159 6228 1255_fsm 175 2607_fsm 2619 2678 2757 3119 3502 3606 4160 6229 1255_vm 2187 2607_vm 2619_fsm 2679 2830 3164 3503 3607 4163 6237 ......
Visibility Map and Free Space Map
Some OID-named data files in PostgreSQL architecture are accompanied by two additional files with _fsm (Free PostgreSQL actively uses two auxiliary files — _vm (Visibility Map) and _fsm (Free Space Map) — alongside the main OID-named data files to boost performance and manage storage more efficiently.
- Visibility Map (
_vm) keeps track of which pages contain only visible tuples. This lets PostgreSQL skip reading entire heap pages during index-only scans, significantly speeding up queries. - Free Space Map (
_fsm) records the amount of free space in each data page, helping the system quickly find suitable locations for inserting new rows without scanning the whole table.
PostgreSQL creates and maintains these files automatically. When working with large, frequently accessed tables, these maps reduce unnecessary I/O and accelerate common operations. However, if you’re dealing with temporary or small tables, you might not see these files at all — PostgreSQL skips them when they offer no real benefit. There’s no need to manage these files manually; PostgreSQL handles everything under the hood to keep your database running smoothly.
Understanding PostgreSQL Data File Sizes
You can directly check where PostgreSQL stores a table by running:
SELECT pg_relation_filepath('mytable');<br>
This command returns the relative file path, such as:
base/16384/16385
Now just go to the $PGDATA/base/16384 directory and list the file:
cd $PGDATA/base/16384
ls -ltr 16385
This file 16385 is the actual physical storage of the mytable table.
PostgreSQL keeps each table or index in its own file, and once the file grows past 1 GB, it doesn’t stop there. Instead, it creates a new segment file automatically, such as:
16385.1for the second gigabyte,16385.2for the third, and so on.
It keeps writing to these segment files in sequence as the table grows. You don’t need to do anything—PostgreSQL handles the splitting and loading of these files for you in the background. This segmented layout keeps the database engine efficient and avoids issues with file size limits on different filesystems.
Table Space
In PostgreSQL, a tablespace refers to an additional storage location outside the main PGDATA directory. When you run the CREATE TABLESPACE command, PostgreSQL creates a symbolic link inside the pg_tblspc subdirectory. This symbolic link points to the external storage location you’ve specified, effectively allowing PostgreSQL to store table and index data in directories beyond the default data area. This feature is especially useful for managing disk space and performance by distributing data across multiple storage devices.
postgres=# CREATE TABLESPACE mytablespace LOCATION '$MYTABLESPACE'; postgres=# CREATE DATABASE mydatabase2 TABLESPACE mytablespace; postgres=# \c mydatabase2 mydatabase2=# CREATE TABLE mytable2 (a INT, b TEXT);
The above query causes this change in PGDATA below.

Summary
PostgreSQL Architecture involves a single server that manages a database cluster—a group of databases sharing one physical root: the PGDATA directory. Of course there are also distributed PostgreSQL architecture deployment involving multiple servers working together. We will slowly uncover their secrets in the upcoming posts so please do check it out!
Related

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.





