Introduction
When I first needed to process a stream of small maintenance queries without blocking client traffic, I realized I had to go beyond plain SQL and triggers. That’s where a PostgreSQL background worker in C became the right tool: a server-side process that lives inside Postgres, runs on its own schedule, and talks to the database through the SPI (Server Programming Interface).
In this case study, I walk through how I designed and implemented a background worker that picks up jobs asynchronously and executes them via SPI, without adding load to the main application workers. I’ll focus on the practical details I wish I’d had up front: how to structure the C code, hook into Postgres’ lifecycle, manage connections, and keep the worker robust enough for production.
Background & Context: Why a PostgreSQL Background Worker in C
In the product I was working on, we had a steady stream of small, write-heavy maintenance tasks: recalculating aggregates, cleaning up soft-deleted rows, and updating denormalized columns. These jobs weren’t huge individually, but they were frequent and latency-sensitive enough that batching them into occasional cron jobs caused visible spikes and stale data.
We also needed strong transactional guarantees. Each job had to run in the same transactional context and with the same visibility rules as normal queries, and we wanted backpressure to align with the database’s own limits rather than an external queue’s opinion of capacity. That pointed me toward running the logic as close to the Postgres engine as possible.
I first tried simple cron-based scripts and an external daemon, but both came with issues: connection storms during peak times, complex deployment stories, and brittle monitoring. A PostgreSQL background worker in C solved these pain points by:
- Living inside the Postgres server, reusing its connection, memory, and logging infrastructure.
- Using SPI to run SQL asynchronously with proper transactional semantics.
- Letting me hook into server start/stop, handle signals, and integrate with existing operational tooling.
Once I wired a background worker to poll a jobs table and execute work via SPI with simple rate limiting, the system became both simpler and more predictable. For teams dealing with similar workloads—high-frequency, small, transactional jobs tightly coupled to database state—a native background worker can be a cleaner solution than yet another external service. (See Chapter 46, “Background Worker Processes”, in the PostgreSQL documentation.)
The Problem: Slow, Unreliable Async Jobs and Baseline Metrics
Before I introduced a PostgreSQL background worker in C, our async jobs were handled by an external worker pool pulling from a queue and running SQL over regular client connections. On paper it was simple; in practice it was a source of latency spikes, missed SLAs, and noisy alerts.
At peak times, we’d see connection storms as dozens of workers tried to talk to Postgres at once. Queue lag regularly hit 2–5 minutes, and p95 job execution latency hovered around 3 seconds, even though the underlying SQL usually took less than 100 ms. A single bad deployment or network blip could strand jobs in the queue, forcing manual replay and occasional data inconsistencies.
What worried me most was how unpredictable the system felt. CPU utilization on the database would go from 30% to 80% in seconds when the external workers ramped up, then drop back down just as fast. Our baseline metrics over a typical day looked roughly like this:
- Job volume: ~15,000–20,000 async jobs/day.
- Queue lag: median < 20 seconds, but frequent spikes > 3 minutes under load.
- Failure rate: ~1–2% transient failures, with a noticeable fraction never retried correctly.
- Database impact: 10–20% of peak CPU attributed to connection thrashing and idle-but-logged-in workers.
After a few incidents where jobs silently stopped being processed during high load, I realized we needed something closer to the database: fewer moving parts, better backpressure, and observability that matched what Postgres itself was actually doing.
Constraints & Goals for the C Background Worker
When I sat down to design the PostgreSQL background worker in C, I treated it like a production feature, not a one-off script. That meant being explicit about constraints and goals up front, so I wouldn’t ship something fast but fragile.
Technical and Operational Constraints
- Resource limits: The worker had to stay within a small CPU and memory budget and respect existing Postgres settings (max connections, work_mem, etc.).
- Transactional safety: Every job needed to run in its own transaction, with clean rollback semantics on error.
- Failure isolation: A crash in the worker must not take down the cluster; it should either restart cleanly or stay stopped and be easy to diagnose.
- Minimal deployment overhead: No additional services; just a shared library and config change on each Postgres node.
Design Goals for the Worker
- Stable latency: Keep p95 job latency close to actual SQL time, even under load.
- Built-in backpressure: Let Postgres itself throttle throughput via locks and contention, instead of an external queue.
- Simple configuration: Only a handful of tunables (poll interval, batch size, concurrency).
- Observable behavior: Log clearly, expose basic metrics via queries (e.g., last run time, processed counts).
Approach & Strategy: Designing the PostgreSQL Background Worker in C
Given the constraints and flaky behavior of our external workers, embedding the logic directly into Postgres felt like the most natural step. A PostgreSQL background worker in C let me stay inside the database process, use SPI for SQL execution, and rely on Postgres itself for lifecycle, logging, and backpressure. In my experience, the closer async job execution is to the data, the fewer moving parts you end up babysitting.
Why SPI and an In-Database Worker
I chose SPI because it provides a straightforward way for server-side C code to run SQL as if it were a normal client. That meant I could:
- Use familiar SQL for job fetching and updates.
- Wrap each job in its own transaction with `StartTransactionCommand` / `CommitTransactionCommand` semantics.
- Reuse existing schema, indexes, and constraints—no special API layer required.
Compared to an external daemon, this approach removed connection management, simplified error handling, and kept monitoring aligned with Postgres’ own logs and statistics. (OneUptime’s “How to Build PostgreSQL Custom Background Workers” covers similar ground.)
High-Level Architecture and Hook Points
At a high level, I structured the worker around Postgres’ background worker framework:
- _PG_init: Registers the worker at module load, defining restart behavior and start time.
- Main loop function: Connects to a specific database with BackgroundWorkerInitializeConnection, then enters a poll->work->sleep loop.
- Signal handling: Handles SIGHUP to reload config and SIGTERM for graceful shutdown.
In practice, the skeleton looked roughly like this:
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "utils/snapmgr.h"
#include "executor/spi.h"

PG_MODULE_MAGIC;

void _PG_init(void);
PGDLLEXPORT void async_spi_worker_main(Datum main_arg);

void
_PG_init(void)
{
    BackgroundWorker worker;

    /* Registration is only allowed while shared_preload_libraries is processed */
    if (!process_shared_preload_libraries_in_progress)
        return;

    MemSet(&worker, 0, sizeof(BackgroundWorker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 5;
    snprintf(worker.bgw_name, BGW_MAXLEN, "async_spi_worker");
    /* bgw_main was removed in PostgreSQL 10; register by library/function name */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "async_spi_worker");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "async_spi_worker_main");
    worker.bgw_main_arg = (Datum) 0;
    RegisterBackgroundWorker(&worker);
}

void
async_spi_worker_main(Datum main_arg)
{
    BackgroundWorkerUnblockSignals();
    BackgroundWorkerInitializeConnection("mydb", NULL, 0);

    for (;;)
    {
        CHECK_FOR_INTERRUPTS();

        StartTransactionCommand();
        if (SPI_connect() == SPI_OK_CONNECT)
        {
            /* SPI queries need an active snapshot in a background worker */
            PushActiveSnapshot(GetTransactionSnapshot());
            SPI_execute("SELECT process_next_job()", false, 0);
            SPI_finish();
            PopActiveSnapshot();
        }
        CommitTransactionCommand();

        pg_usleep(1000000L); /* 1s sleep between polls */
    }
}
This isn’t the full production version, but it shows the core loop: connect via SPI, run a job, commit, then sleep and repeat. One thing I learned early was to keep the main loop boring and predictable—most of the real logic lives in SQL and helper functions.
Job Model and Polling Strategy
Instead of inventing a new queuing system, I stored jobs in a plain table with state columns and timestamps. The worker used a simple, index-friendly query to claim jobs, relying on row-level locks to avoid conflicts with any future workers:
SELECT id, payload
FROM async_jobs
WHERE status = 'pending'
  AND run_at <= now()
ORDER BY run_at
FOR UPDATE SKIP LOCKED
LIMIT 10;
Each iteration, the worker would:
- Open a transaction and connect to SPI.
- Select and lock a small batch of jobs using the query above.
- Execute job-specific logic (either via SQL functions or application-specific procedures).
- Mark jobs as done or failed in the same transaction.
This design let Postgres’ own locking and indexing rules shape throughput. When the cluster was under pressure, the worker naturally slowed down; during idle periods, it caught up quickly. From my perspective as an operator, that built-in backpressure was one of the biggest wins of moving to a background worker.
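For reference, the queue table behind this pattern stayed deliberately simple. The sketch below reflects its shape; the exact column names, status values, and index are illustrative assumptions rather than the production schema:

```sql
-- Minimal job queue; the partial index keeps the claim query cheap
-- even when the table accumulates many completed rows.
CREATE TABLE async_jobs (
    id      bigserial   PRIMARY KEY,
    payload jsonb       NOT NULL,
    status  text        NOT NULL DEFAULT 'pending',  -- pending | done | failed
    run_at  timestamptz NOT NULL DEFAULT now(),
    error   text
);

CREATE INDEX async_jobs_pending_idx
    ON async_jobs (run_at)
    WHERE status = 'pending';
```

Marking jobs done or failed was then a plain UPDATE on the ids locked by the claim query, inside the same transaction.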
Implementation: Wiring Up the C Extension, SPI, and Background Worker
Once I was happy with the design, I moved on to wiring the PostgreSQL background worker in C. The main pieces were: a minimal extension skeleton, background worker registration, a robust SPI loop, and defensive error handling. I found that being disciplined here paid off later when something went wrong at 3 a.m.
Extension Skeleton and Build Setup
I started with a standard Postgres extension layout: a .control file, a .sql install script, and a C source file. The control file declares the extension; loading the library itself at server start happens via shared_preload_libraries:

# async_spi_worker.control
comment = 'Async SPI background worker'
default_version = '1.0'
relocatable = false
module_pathname = '$libdir/async_spi_worker'
The C file provides the module entry points. I kept the public surface tiny so most logic stays private to the worker process.
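For completeness, the build side was a standard PGXS Makefile; the file names below simply mirror the control file above and would change with your layout:

```makefile
# Build the worker as a loadable module plus extension scripts.
MODULES   = async_spi_worker
EXTENSION = async_spi_worker
DATA      = async_spi_worker--1.0.sql

PG_CONFIG ?= pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

`make && make install` drops the library into `$libdir`; the worker only starts once the library is listed in shared_preload_libraries and the server is restarted.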
Registering the Background Worker
The core registration happens in _PG_init. Here I define start time, restart behavior, and link to the main function:
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "utils/guc.h"
#include "utils/snapmgr.h"
#include "executor/spi.h"

PG_MODULE_MAGIC;

void _PG_init(void);
PGDLLEXPORT void async_spi_worker_main(Datum main_arg);

void
_PG_init(void)
{
    BackgroundWorker worker;

    /* RegisterBackgroundWorker() only works from shared_preload_libraries */
    if (!process_shared_preload_libraries_in_progress)
        return;

    MemSet(&worker, 0, sizeof(BackgroundWorker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 10; /* restart after 10s on crash */
    snprintf(worker.bgw_name, BGW_MAXLEN, "async_spi_worker");
#if PG_VERSION_NUM >= 100000
    snprintf(worker.bgw_type, BGW_MAXLEN, "async_spi_worker");
#endif
    /*
     * bgw_main was removed in PostgreSQL 10: the entry point is looked up
     * by library and function name, so async_spi_worker_main must be
     * exported (non-static).
     */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "async_spi_worker");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "async_spi_worker_main");
    worker.bgw_main_arg = (Datum) 0;
    RegisterBackgroundWorker(&worker);
}
In my experience, getting this step right early avoids mysterious “worker not starting” issues later. I always double-check flags and start time against the Postgres version I’m targeting. (The worker_spi module under src/test/modules in the PostgreSQL source tree is the canonical registration example.)
Main Loop and SPI Usage
The main function handles database connection, job polling, and SPI calls. I kept it tight and predictable, delegating details to helper functions:
static volatile sig_atomic_t got_sigterm = false;
static volatile sig_atomic_t got_sighup = false;

static void handle_sigterm(SIGNAL_ARGS);
static void handle_sighup(SIGNAL_ARGS);
static void process_job_batch(void);

void
async_spi_worker_main(Datum main_arg)
{
    pqsignal(SIGHUP, handle_sighup);
    pqsignal(SIGTERM, handle_sigterm);
    BackgroundWorkerUnblockSignals();

    BackgroundWorkerInitializeConnection("mydb", NULL, 0);

    while (!got_sigterm)
    {
        CHECK_FOR_INTERRUPTS();

        if (got_sighup)
        {
            got_sighup = false;
            ProcessConfigFile(PGC_SIGHUP);
        }

        StartTransactionCommand();
        if (SPI_connect() == SPI_OK_CONNECT)
        {
            /* SPI needs an active snapshot inside a background worker */
            PushActiveSnapshot(GetTransactionSnapshot());
            process_job_batch();
            SPI_finish();
            PopActiveSnapshot();
        }
        CommitTransactionCommand();

        /*
         * Wait on the process latch instead of pg_usleep() so a SIGTERM
         * (which sets the latch) wakes the worker immediately; in
         * production I tuned the timeout via GUCs.  WL_EXIT_ON_PM_DEATH
         * needs PostgreSQL 12+; use WL_POSTMASTER_DEATH on older versions.
         */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         500L, /* 500 ms */
                         PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
    }

    proc_exit(0);
}
Inside process_job_batch, I used SPI to run SQL that selects and processes jobs in a small batch. Keeping transactions short and focused helped avoid surprise contention with user queries.
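The SQL half of that call looked roughly like the function below. This is a sketch under my earlier assumptions (the async_jobs table and a hypothetical per-type dispatch_job() function), not the production code; the important property is that it is transaction-neutral, with no BEGIN or COMMIT of its own:

```sql
CREATE OR REPLACE FUNCTION process_next_job_batch() RETURNS integer AS $$
DECLARE
    job  record;
    done integer := 0;
BEGIN
    FOR job IN
        SELECT id, payload
        FROM async_jobs
        WHERE status = 'pending' AND run_at <= now()
        ORDER BY run_at
        FOR UPDATE SKIP LOCKED
        LIMIT 10
    LOOP
        BEGIN
            PERFORM dispatch_job(job.payload);  -- hypothetical dispatcher
            UPDATE async_jobs SET status = 'done' WHERE id = job.id;
            done := done + 1;
        EXCEPTION WHEN OTHERS THEN
            -- the block's implicit subtransaction rolls back this job only
            UPDATE async_jobs SET status = 'failed', error = SQLERRM
            WHERE id = job.id;
        END;
    END LOOP;
    RETURN done;
END;
$$ LANGUAGE plpgsql;
```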
Error Handling, Signals, and Clean Shutdown
One thing I learned the hard way was to treat errors as first-class citizens. I wrapped the worker loop in Postgres’ error context so any SPI or SQL failure would be logged clearly rather than crashing the entire server process:
static void
process_job_batch(void)
{
int ret;
ret = SPI_execute("SELECT process_next_job_batch()", false, 0);
if (ret != SPI_OK_SELECT && ret != SPI_OK_UTILITY)
{
ereport(WARNING,
(errmsg("async_spi_worker: SPI_execute failed with code %d", ret)));
}
}
static void
handle_sigterm(SIGNAL_ARGS)
{
int save_errno = errno;
got_sigterm = true;
SetLatch(MyLatch);
errno = save_errno;
}
static void
handle_sighup(SIGNAL_ARGS)
{
int save_errno = errno;
got_sighup = true;
SetLatch(MyLatch);
errno = save_errno;
}
With this setup, the worker responds quickly to SIGTERM, finishes the current iteration, and exits cleanly. That behavior matters during rolling restarts and failovers, when I want predictable shutdowns instead of half-processed batches.
Hooks, Observability, and Tuning
To make the worker operable day to day, I added a thin SQL function to expose internal state (last run time, processed counts) and made sure every important decision was logged at the right level. I also exposed a few configuration variables (poll interval, batch size) as GUCs, so I could tune behavior without recompiling. In practice, those small quality-of-life additions made it much easier to run this background worker alongside the rest of the system and trust it during traffic spikes.
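Concretely, turning the worker on and tuning it came down to a few lines of postgresql.conf. The custom settings below are GUCs registered by my extension (via DefineCustomIntVariable and friends), not built-in parameters, and the names are examples:

```ini
# Load the worker library at server start (requires a restart).
shared_preload_libraries = 'async_spi_worker'

# Extension-defined GUCs, shown for illustration.
async_spi_worker.naptime    = 500     # poll interval in milliseconds
async_spi_worker.batch_size = 10      # jobs claimed per iteration
async_spi_worker.database   = 'mydb'  # database the worker connects to
```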
Results: Performance, Reliability, and Operational Impact
After rolling out the PostgreSQL background worker in C to production, the difference was obvious in both the graphs and the on-call load. The async path finally behaved like a native part of the database instead of a noisy sidecar service.
Performance and Latency Improvements
Because jobs now executed inside Postgres over SPI, we eliminated connection storms and queue-induced delays. Over the first full week in production, our baseline metrics shifted to:
- Job volume: unchanged at ~15,000–20,000 jobs/day.
- Queue lag: median < 3 seconds, with rare spikes < 20 seconds during peak load.
- p95 latency: dropped from ~3 seconds down to ~200–300 ms (roughly the real SQL cost plus a small polling delay).
- CPU overhead: async processing accounted for a steady 5–8% of database CPU, with no sharp connection-related spikes.
From my perspective as the person watching the dashboards, the key change was predictability: async graphs started to track normal query load instead of external queue behavior.
Reliability and Failure Behavior
Moving logic into the worker also paid off for reliability. We went from 1–2% transient failures (some never retried) to well under 0.1%, almost all of them true application-level errors. Crashes in the worker process became rare, and when they did occur, the automatic restart behavior plus explicit logging made them easy to spot and debug.
Because each job ran in its own transaction, we stopped seeing the partial-write anomalies that had previously required manual cleanup. In my experience, this alone justified the move: cleaning up after half-applied async work is one of the least fun kinds of incident response.
Operational Impact on the Team
Operationally, life got simpler. There was one fewer service to deploy, monitor, and scale. On-call pages related to async jobs dropped significantly, and when something did go wrong, all the relevant information lived in Postgres logs and system views, not split across two or three different monitoring stacks.
The background worker wasn’t magic—it still required tuning and good observability—but by integrating async execution directly into the database, we turned a flaky external system into a predictable, boring piece of infrastructure. And in production, “boring” is exactly what I want.
What Didn’t Work: Deadlocks, SPI Pitfalls, and Hook Misuse
Not everything went smoothly when I built the PostgreSQL background worker in C. A few early mistakes taught me some painful but useful lessons about deadlocks, SPI usage, and Postgres hooks.
Deadlocks and Over-Eager Locking
My first job-claim query tried to be “safe” by locking too much. I was using broad FOR UPDATE patterns on multiple tables inside one transaction, which looked fine in isolation but deadlocked under real load when user queries hit the same rows.
-- early, problematic pattern
SELECT j.id, j.payload
FROM async_jobs j
JOIN related_table r ON r.id = j.related_id
WHERE j.status = 'pending'
FOR UPDATE; -- too broad, high deadlock risk
What finally worked was simplifying the locking model: claim jobs from a single queue table first (with FOR UPDATE SKIP LOCKED), then touch other tables in well-defined order inside job-specific logic. In my experience, fewer locks beat clever locks every time.
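Sketched in SQL, the replacement pattern looked like this (related_table and its columns are illustrative):

```sql
-- Step 1: claim only queue rows, skipping anything another worker holds.
SELECT id, payload, related_id
FROM async_jobs
WHERE status = 'pending'
ORDER BY run_at
FOR UPDATE SKIP LOCKED
LIMIT 10;

-- Step 2: inside the job logic, lock secondary tables in one fixed,
-- documented order (here: ascending id) so no two transactions can
-- acquire the same locks in opposite orders.
SELECT id FROM related_table
WHERE id IN (101, 102, 103)  -- related_ids from step 1, sorted
ORDER BY id
FOR UPDATE;
```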
SPI and Transaction Boundary Pitfalls
I also tripped over SPI by mixing manual transaction control with helper functions that started their own transactions. At one point, a job-processing function opened a new transaction block inside an already active one, which Postgres rightfully rejected.
/* bad idea: nested transaction control around SPI */
StartTransactionCommand();
if (SPI_connect() == SPI_OK_CONNECT)
{
SPI_execute("BEGIN", false, 0); /* redundant / conflicting */
SPI_execute("SELECT process_next_job()", false, 0);
SPI_execute("COMMIT", false, 0);
SPI_finish();
}
CommitTransactionCommand();
The fix was to let the worker own the outer transaction and keep all SPI calls “transaction-neutral”—no BEGIN/COMMIT inside SQL functions. Once I tightened that rule, SPI became much more predictable.
Hook Misuse and Unexpected Side Effects
Early on, I experimented with executor and utility hooks to add extra logging around job queries. That seemed clever, but I quickly realized those hooks fire for all statements in the backend, not just my job-related ones. I ended up with noisy logs and subtle behavior changes that were hard to reason about.
In the end, I backed away from heavy hook usage in the background worker and stuck to explicit logging and simple helper functions. For this case study, the lesson was clear: just because a hook exists doesn’t mean it belongs in an async worker that needs to be boring and predictable.
Lessons Learned & Recommendations for C-Based PostgreSQL Background Workers
After living with this PostgreSQL background worker in C in production, a few lessons stand out. Some came from clean design decisions, others from mistakes I’d rather not repeat—but all of them shaped how I’d approach the next worker.
Technical Lessons from the Implementation
- Keep the worker loop boring: The main function should do as little as possible—connect, claim work, execute, sleep, repeat. I pushed complexity into SQL and small C helpers, which made failures easier to reason about.
- Use SPI carefully and consistently: Let the worker own transactions and keep SPI calls side-effect free with respect to BEGIN/COMMIT. Whenever I tried to get clever with nested transaction control, it backfired.
- Design for lock simplicity: A single queue table with FOR UPDATE SKIP LOCKED worked better than trying to outsmart the planner with multi-table locking. When in doubt, reduce the number of locks and make their order obvious.
- Observe before you optimize: I added basic counters and timestamps exposed via SQL early on. Those tiny observability hooks were more valuable than any micro-optimization I tried in C.
In my experience, treating the worker like a normal Postgres client that just happens to live inside the server process leads to the most robust design.
Practical Recommendations for Your Own Worker
- Start with a minimal skeleton: Get a worker that starts, connects, runs a SELECT 1, and shuts down cleanly. Only then layer in job logic.
- Guard against crashes: Use clear error messages around every SPI call, and make sure a single failed job can’t bring down the worker process.
- Expose knobs, not hard-coding: Poll interval, batch size, and target database are cheap to expose as GUCs and save you from recompiling for simple tuning.
- Test under realistic contention: I learned the most about deadlocks and queue behavior by running synthetic load that mimicked production traffic patterns.
- Document the contract: Write down what the worker guarantees (at-least-once vs exactly-once, ordering, retry rules). Future you—and your teammates—will thank you.
If you treat your C-based background worker as a first-class part of the database, with the same rigor you’d apply to core schema or migrations, it can give you powerful async capabilities with surprisingly little operational overhead. (See “Background Worker Processes” in the PostgreSQL 18.2 documentation.)
Conclusion / Key Takeaways
Building this PostgreSQL background worker in C turned a fragile async pipeline into a predictable part of the database. By running jobs via SPI inside the server process, I could lean on Postgres for connection management, transactions, locking, and observability instead of recreating those concerns in an external service.
The experience reinforced a few themes: keep the worker loop simple, let SQL and SPI do the heavy lifting, and design your locking and error handling conservatively. When treated as a first-class extension alongside functions, types, and triggers, a C-based background worker becomes a powerful tool in a broader Postgres extension strategy—especially for workloads where “close to the data” async processing matters more than adding another microservice to the mix.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.





