
A Warm Welcome to the PostgreSQL Source Code!

Introduction

If you’ve ever felt curious about, or even a little intimidated by, the massive PostgreSQL codebase, you’re not alone. But don’t worry! This guide gives you a gentle, beginner-friendly introduction to the PostgreSQL source code directory, so that you’ll know which parts of the codebase to inspect when diagnosing an issue.

Let’s take your very first steps into the heart of PostgreSQL together!

Source Repository Overview

I have prepared a simplified overview of the PostgreSQL source code repository, showing the most important subfolders you need to be aware of. The orange and yellow blocks represent the extensions and the documentation, which PostgreSQL does not compile or build by default; they are optional modules that you can build manually from within their own subfolders. The blue blocks contain almost every other PostgreSQL component, including backend modules, frontend tools, communication protocols, and test scripts. This is the main build target when you run the make command.

Add Your Own Module to PostgreSQL?

When planning to add a new feature to PostgreSQL, one of the first questions to ask is whether the functionality belongs in the frontend, the backend, or both. It’s also important to consider whether the feature can be implemented as a plugin or extension. If it can, that’s usually the preferred path, because it avoids modifying the core PostgreSQL codebase, which makes future upgrades and version compatibility much easier.

The nature of your new module directly influences where in the PostgreSQL source tree your code should live. Generally speaking:

  • Frontend changes typically go in the src/bin or src/fe_utils directories.
  • Backend logic belongs in src/backend.
  • Extensions should be placed under contrib/<your extension>.
  • Test scripts should be placed under src/test.
  • Documentation files should be placed under doc/.

Frontend vs. Backend

In PostgreSQL, frontend and backend code do not necessarily share the same common code. This is a deliberate design choice that avoids unnecessary includes and overhead. For example, frontend programs in src/bin do not include server modules or other code designed for backend use. When a piece of code must be shared between frontend and backend modules, PostgreSQL uses #ifdef FRONTEND to separate the frontend and backend logic. In other words, the shared function behaves differently depending on which side it is compiled for.

Important Backend Modules

We will dive deeper into these modules in future posts:

  • storage engine
  • index
  • buffer manager
  • access methods
  • parser
  • rewriter
  • optimizer
  • planner
  • postmaster
  • replication engine
  • statistics engine

Important Frontend Tools

These are standalone utility programs built around the PostgreSQL database. Their sources live under src/bin.

  • initdb
  • pg_ctl
  • pgbench
  • psql
  • pg_waldump
  • pg_basebackup
  • pg_dump

Regression Tests in the test Subfolder

PostgreSQL comes with a built-in regression testing framework designed to verify its SQL implementation and ensure compatibility as the system evolves. This framework plays a critical role in maintaining both standard SQL behavior and PostgreSQL’s own extensions. Whenever new functionality is introduced, big or small, it should be validated through regression tests to guarantee that existing SQL features remain intact.

IMPORTANT: As a rule of thumb, whenever you add new functions or modify existing ones in PostgreSQL, always remember to add or update the corresponding regression test cases!

The regression test framework is located in the src/test/regress directory of the PostgreSQL source tree. You only need to know about the following subfolders and what to put in them:

  • sql/: the SQL files that test whatever you need to test. You can treat each SQL file as one test case.
  • expected/: the expected output of each SQL file in sql/ (for example, expected/int4.out for sql/int4.sql).
  • results/: the actual output of each SQL file in sql/ (created only after the regression tests have run).

You also need to know about the parallel_schedule file, which groups the test cases (the individual SQL files in sql/). Edit this file to include or skip particular test cases; a new SQL file is only run once it is listed there.

Example: int4.sql

As mentioned above, each SQL file can be treated as one test case, so it should cover all relevant scenarios of a feature. For example, int4.sql tests PostgreSQL’s handling of the integer (int4) data type and contains statements like these:

-- INT4
-- int4_tbl was already created and filled in test_setup.sql.
-- Here we just try to insert bad values.
INSERT INTO INT4_TBL(f1) VALUES ('34.5');
INSERT INTO INT4_TBL(f1) VALUES ('1000000000000');
INSERT INTO INT4_TBL(f1) VALUES ('asdf');
INSERT INTO INT4_TBL(f1) VALUES ('     ');
INSERT INTO INT4_TBL(f1) VALUES ('   asdf   ');
INSERT INTO INT4_TBL(f1) VALUES ('- 1234');
INSERT INTO INT4_TBL(f1) VALUES ('123       5');
INSERT INTO INT4_TBL(f1) VALUES ('');
… …

Example: int4.out

As PostgreSQL executes each statement in an SQL file such as int4.sql, it records the statement and its result in an output file. For example, running int4.sql produces int4.out in the results/ subfolder, shown below. The test framework then compares this file with the file of the same name in the expected/ folder using the standard diff utility. If the two files match, the test passes; otherwise the test fails and the differences are written to regression.diffs so you can inspect them.

-- INT4
-- int4_tbl was already created and filled in test_setup.sql.
-- Here we just try to insert bad values.
INSERT INTO INT4_TBL(f1) VALUES ('34.5');
ERROR:  invalid input syntax for type integer: "34.5"
LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('34.5');
                                         ^
INSERT INTO INT4_TBL(f1) VALUES ('1000000000000');
ERROR:  value "1000000000000" is out of range for type integer
LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('1000000000000');
                                         ^
… …

How to Prepare the Expected Output File

Please do not write it by hand! Let PostgreSQL generate it for you!

When you add your own SQL test file, you most likely do not have an expected output file yet. Run the regression tests once to generate the actual output file in the results/ folder. Because there is no expected output file to compare against, the test will fail, but that is okay.

Next, examine the output file in results/ and judge for yourself whether every result is correct. Once it matches what you expect, copy the file from results/ to expected/. When you run the regression tests again, diff will find that the actual and expected output are identical, and the test will pass.

How to Run the Regression Tests

You can run the regression tests against a temporary PostgreSQL instance or against an instance you already have running.

Testing with a Temporary PostgreSQL Instance

From the root of the PostgreSQL source tree, you can simply run:

make check

Testing with an Existing PostgreSQL Instance

If you prefer to run regression tests against a PostgreSQL instance that is already running, set environment variables to point to it and run:

export PGHOST=127.0.0.1
export PGPORT=5432
make installcheck

Specific Feature Tests

In addition to the regression tests, PostgreSQL can run other test suites covering procedural languages, replication, recovery, security, and more. Several of these (for example, src/test/recovery and the tests under src/bin) are TAP tests, which only run if you enable them when configuring the build (./configure --enable-tap-tests). The test directories include:

src/interfaces/ecpg/test
src/test/authentication
src/test/isolation
src/test/replication
src/test/recovery
src/bin
contrib
src/pl

To run all of these test suites, use:

make check-world
# or
export PGHOST=127.0.0.1
export PGPORT=5432
make installcheck-world

Security Tests

These tests do not run by default, and they can only run when PostgreSQL is built with the corresponding security features enabled. They are excluded for good reason: some of them are not safe to run on a multi-user system, and some require third-party services such as an LDAP server.

To run all of these security tests, first configure the build with the required features. They are TAP tests, so TAP support must be enabled as well:

./configure --with-ssl=openssl --with-ldap --with-gssapi --enable-tap-tests

Then, from the root of the source tree, run make check in each test directory with the PG_TEST_EXTRA variable set:

make -C src/test/ssl      check PG_TEST_EXTRA=ssl
make -C src/test/ldap     check PG_TEST_EXTRA=ldap
make -C src/test/kerberos check PG_TEST_EXTRA=kerberos

More on testing here.

PostgreSQL Build System

In PostgreSQL, the build system has traditionally been driven by Makefiles, but starting with PostgreSQL 16, the project also introduced support for the more modern Meson build tool. Each approach has its strengths and weaknesses:

  • Makefile (Traditional Build System)
    • Advantages: Widely used, mature, and stable. It’s proven to handle large projects effectively and has been the backbone of PostgreSQL builds for decades.
    • Disadvantages: The syntax is relatively complex, making editing and maintenance cumbersome. For very large or intricate projects, managing Makefiles can become difficult and error-prone.
  • Meson (Modern Build System, introduced in PostgreSQL 16)
    • Advantages: Features a clean, simple, and easy-to-understand syntax. Builds are generally faster than with the Makefile system, and Meson offers better cross-platform support while remaining flexible and easy to use.

Compile PostgreSQL

Before starting development on PostgreSQL, the first step is to download its source code. PostgreSQL uses git for source code management. The official repository is hosted on git.postgresql.org, and a read-only mirror is available on GitHub at https://github.com/postgres/postgres.

Here’s a quick guide to getting started:

Clone the source code
On Linux, open a terminal and run:

git clone https://github.com/postgres/postgres.git

This command will download the entire PostgreSQL source tree.

Check branches and tags
After cloning, run:

git branch

You’ll see that the default branch is master.

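Releases are marked by tags, while stable maintenance branches have names such as REL_15_STABLE. To check out a specific release, for example PostgreSQL 15.3, use its release tag: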

git checkout REL_15_3

You can also run git tag to list all available release tags.

Install required dependencies
Before compiling, you’ll need to install some development libraries and tools. On Ubuntu Linux, use:

sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libssl-dev libxml2-utils xsltproc

A full list of dependencies can be found on the PostgreSQL wiki.

With the source code checked out and dependencies installed, you’ll be ready to move on to building PostgreSQL from source.

Makefile

This has been the standard way to build PostgreSQL for years, and it is stable and reliable.

Run the configure script

./configure --prefix=$PWD/release --with-ssl=openssl --enable-debug

Compile and install

make
make install

Meson

PostgreSQL 16 introduced support for the Meson + Ninja build system, which offers simpler syntax, faster builds, and better cross-platform support. PostgreSQL 15.x and earlier cannot be built with Meson, so the workflow below applies only to version 16 and newer:

Requirements

  • Meson ≥ 0.57.2
  • Ninja ≥ 1.10.1

Setup and configure

cd postgres
meson setup build --prefix=$PWD/release -Dssl=openssl
cd build
# check current build parameters
meson configure
# change build parameter - enable assertions
meson configure -Dcassert=true

Build, test and install

ninja                # compile
meson test           # run tests
ninja install        # install to the prefix (use sudo only if you cannot write to it)

After installation is complete, you should see several new directories created inside the installation path you specified. These include:

  • bin – contains the PostgreSQL executables such as psql, initdb, pg_ctl, and server binaries.
  • include – holds the C header files needed for compiling extensions or applications against PostgreSQL.
  • lib – provides the shared libraries used by PostgreSQL and its client applications.
  • share – contains sample configuration files, error message catalogs, and database initialization data.

Together, these directories form the installed PostgreSQL environment, ready for use or development.

Summary

I hope this introduction has given you a solid starting point for understanding the PostgreSQL source code: how the modules are organized, how regression tests are written and executed, and, most importantly, how to build PostgreSQL from source. In upcoming posts, we’ll dive deeper into some of the most important internal modules and explore how they work under the hood. With this foundation, you’ll be able not only to read and navigate the codebase with confidence, but also to start envisioning how you could one day contribute your own features to PostgreSQL’s core.
