Introduction: Why PostgreSQL Background Workers Deserve a Playbook
Once I started pushing PostgreSQL background workers beyond simple demos, I realized they behave like first-class citizens in the server, with all the same responsibilities and risks. A worker crash, a deadlock on shared memory, or a sloppy lifecycle hook can quietly poison the whole instance, not just the extension that owns it.
PostgreSQL’s shared memory model is powerful but unforgiving: mis-sized regions, unsafe pointers, or incorrect synchronization quickly turn into performance cliffs or subtle corruption bugs. In my own projects, the bugs that hurt the most weren’t logic errors inside the worker loop, but edge cases around startup, shutdown, and shared memory handoff between processes.
This is why I treat background workers as needing a clear playbook: repeatable patterns for allocating shared memory, coordinating multiple workers, handling postmaster restarts, and cleaning up safely. In the rest of this article, I’ll walk through the five patterns I now reach for when designing robust PostgreSQL background workers that can run in production without keeping me up at night.
1. The Dedicated Background Worker for Long-Running Maintenance Jobs
When I need to run long-lived tasks like custom vacuuming, data rollups, or queue draining, I almost always start with a single dedicated PostgreSQL background worker. One process, one responsibility. This keeps the shared memory story simple: one owner of the state, clear startup and shutdown semantics, and predictable behavior under load.
With a single dedicated worker, shared memory can be modeled as a small control block instead of a mini distributed system. I typically allocate a fixed-size struct for:
- Configuration snapshot (intervals, limits, feature flags).
- Progress markers (last processed XID, timestamp, or ID).
- Control flags (stop requested, reload requested).
Because there’s only one writer—the worker itself—locking can often be reduced to lightweight protection for occasional reads from other backends. In my experience, this dramatically cuts the chance of deadlocks and Heisenbugs from racy updates.
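For the dedicated worker, that control block can be a plain fixed-size struct. Here's a sketch with hypothetical field names (the actual fields depend on your maintenance task):

```c
typedef struct
{
    /* configuration snapshot */
    int         interval_ms;        /* wake interval */
    int         batch_limit;        /* max rows per maintenance step */
    /* progress markers */
    TransactionId last_xid;         /* last processed XID */
    TimestampTz   last_run;         /* timestamp of last completed step */
    /* control flags (single writer: the worker itself) */
    pg_atomic_uint32 stop_requested;
    pg_atomic_uint32 reload_requested;
} worker_ctl;
```

Other backends only ever read this struct or flip the atomic flags, so the worker can update progress without taking heavyweight locks.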
Lifecycle Pattern: From Registration to Graceful Exit
The lifecycle pattern I lean on is:
- _PG_init: Reserve shared memory, initialize the control block, and register the background worker with clear restart behavior.
- Start hook: Attach to shared memory, set up signal handlers, establish a database connection if needed.
- Main loop: Periodically wake, check flags, run a maintenance step, update progress, and sleep.
- Shutdown: On signals, flip a flag, finish the current unit of work, persist final progress, then exit cleanly.
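The registration step in _PG_init can be sketched like this. This is a hedged sketch, not a drop-in implementation: the library name "my_extension", the restart interval, and the reserved size are illustrative, and on PostgreSQL 15+ the RequestAddinShmemSpace call must move into a shmem_request_hook:

```c
void
_PG_init(void)
{
    BackgroundWorker worker;

    /* Reserve space for the shared control block
     * (PostgreSQL 15+: do this from shmem_request_hook instead) */
    RequestAddinShmemSpace(MAXALIGN(sizeof(worker_ctl)));

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 10;   /* restart 10s after a crash */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
    snprintf(worker.bgw_function_name, BGW_MAXLEN,
             "maintenance_worker_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "maintenance worker");
    worker.bgw_main_arg = (Datum) 0;
    RegisterBackgroundWorker(&worker);
}
```

The restart time is worth choosing deliberately: BGW_NEVER_RESTART is safer for experimental workers, while a short restart interval suits maintenance jobs that must keep running.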
Here’s a minimal C-style sketch of the pattern I use for the main loop:
void
maintenance_worker_main(Datum main_arg)
{
    pqsignal(SIGTERM, worker_sigterm_handler);
    BackgroundWorkerUnblockSignals();

    /* Attach to shared memory control block here */

    while (!got_sigterm)
    {
        int rc;

        /* Do a small unit of maintenance work */
        perform_maintenance_step();
        update_progress_in_shmem();

        /* Sleep but wake early on postmaster death or config reload */
        rc = WaitLatch(&MyProc->procLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       1000L,   /* 1 second */
                       PG_WAIT_EXTENSION);
        ResetLatch(&MyProc->procLatch);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
    }

    /* Persist any final state and exit */
    proc_exit(0);
}
This pattern lets the worker be interruptible but not fragile: it can restart using its shared memory progress markers, without reprocessing everything or leaving half-finished work.
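The worker_sigterm_handler and got_sigterm flag referenced above follow the standard PostgreSQL convention for signal handlers: set a flag, wake the latch, and preserve errno. A minimal sketch:

```c
static volatile sig_atomic_t got_sigterm = false;

static void
worker_sigterm_handler(SIGNAL_ARGS)
{
    int save_errno = errno;

    got_sigterm = true;
    SetLatch(MyLatch);      /* wake the WaitLatch() call immediately */

    errno = save_errno;
}
```

Setting the latch is what makes shutdown prompt: without it, the worker would only notice SIGTERM on its next timeout wakeup.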
If you haven’t implemented a background worker like this before, it’s worth reading deeper material on designing maintenance daemons and watchdog workers for PostgreSQL; the official documentation’s chapter on routine vacuuming (24.1. Routine Vacuuming) is a useful companion.
2. Pooled Background Workers for Parallel Task Execution
As soon as I needed to chew through large queues or batch processing jobs, a single dedicated worker stopped being enough. That’s where a small pool of PostgreSQL background workers shines: each worker pulls work from a shared queue, processes it independently, and reports progress back. The trick is doing this without turning shared memory into a hot, contended bottleneck.
In my experience, the safest design is to keep shared memory minimal and coarse-grained, and push fine-grained data into regular tables. I usually model the pool like this:
- Shared memory control block: global config, pool size, a few atomic counters (e.g., in-flight jobs, last error code).
- Task queue in SQL: a table (or partitioned set) that stores tasks, states, and retry info.
- Per-worker local state: everything specific to one job stays in local memory, not in shared memory.
Only the pool-wide coordination lives in shared memory. This dramatically reduces lock contention and makes crashes easier to recover from, because the authoritative job state is stored in tables that normal SQL tools can inspect.
When I first tried to coordinate workers through shared-memory queues, I ran into nasty contention and subtle race conditions. I’ve had much better results with SQL-based leasing:
- Each worker atomically claims the next task using an UPDATE … WHERE state = 'ready' … RETURNING pattern.
- A lease timeout (e.g., picked_at + interval) lets other workers reclaim stuck tasks.
- Shared memory just tracks metrics (claimed/processed counts), not the queue itself.
Here’s a PostgreSQL-style pattern I often use from within a worker to claim tasks:
WITH cte AS (
    SELECT id
    FROM job_queue
    WHERE state = 'ready'
    ORDER BY priority DESC, id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE job_queue j
SET    state = 'in_progress',
       picked_at = now(),
       worker_pid = pg_backend_pid()
FROM   cte
WHERE  j.id = cte.id
RETURNING j.*;
Each background worker runs a query like this inside its loop. The SKIP LOCKED pattern has saved me from a lot of unnecessary shared memory gymnastics; PostgreSQL does the heavy lifting at the row-lock level.
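The lease-timeout side is just another UPDATE, typically run by each worker (or one designated worker) before claiming. A sketch, where the 5-minute lease length is an assumption you'd tune to your job durations:

```sql
-- Reclaim tasks whose lease expired, making them claimable again
UPDATE job_queue
SET    state = 'ready',
       worker_pid = NULL
WHERE  state = 'in_progress'
  AND  picked_at < now() - interval '5 minutes';
```

Because the lease lives in the table, a crashed worker's tasks recover automatically, with no shared memory cleanup required.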
Lifecycle and Pool Sizing Patterns
One thing I learned the hard way was to treat pool sizing and worker lifecycle as part of the shared memory design, not an afterthought. The pattern that’s worked best for me is:
- Fixed pool, dynamic load: Decide a small, fixed pool size in shared memory (e.g., 4–16 workers), and let idle workers sleep via WaitLatch when there’s no work.
- Single registration point: In _PG_init, allocate the pool control block and register N background workers, all pointing to the same main function but with distinct slot indices.
- Lightweight coordination: Each worker reads its slot index from the main_arg, attaches to the shared control block, and increments/decrements atomic counters when starting or finishing tasks.
- Graceful drain: On shutdown signals, workers stop claiming new tasks, finish what they’re holding, and mark their slot as idle in shared memory.
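The single registration point can be sketched like this, with each worker receiving its slot index through bgw_main_arg. POOL_SIZE and "my_extension" are illustrative, not fixed conventions:

```c
#define POOL_SIZE 8     /* assumed fixed pool size */

void
_PG_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = 10;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "pool_worker_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "pool worker %d", i);
        worker.bgw_main_arg = Int32GetDatum(i);   /* distinct slot index */
        RegisterBackgroundWorker(&worker);
    }
}
```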
Here’s a condensed C-style sketch of a pool worker main loop I’ve used:
void
pool_worker_main(Datum main_arg)
{
    int slot_id = DatumGetInt32(main_arg);

    pqsignal(SIGTERM, worker_sigterm_handler);
    BackgroundWorkerUnblockSignals();

    attach_pool_shmem(slot_id);
    establish_db_connection();

    while (!got_sigterm)
    {
        if (!claim_next_job())
        {
            /* No work: sleep a bit, but be interruptible */
            int rc = WaitLatch(&MyProc->procLatch,
                               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                               5000L,
                               PG_WAIT_EXTENSION);

            ResetLatch(&MyProc->procLatch);
            if (rc & WL_POSTMASTER_DEATH)
                proc_exit(1);
            continue;
        }

        increment_inflight_counter();
        process_current_job();
        decrement_inflight_counter();
    }

    proc_exit(0);
}
This approach keeps PostgreSQL background workers predictable: shared memory tracks only what must be global, the database holds the authoritative queue, and each worker behaves like a well-mannered citizen instead of a noisy neighbor fighting for a shared structure.
3. Extension-Scoped Workers with Tiny, Versioned Shared Memory
When I’m building PostgreSQL background workers as part of an extension, my default stance is: keep the worker strictly extension-scoped and the shared memory as tiny and version-aware as possible. Global, free-floating workers that assume too much about cluster layout or other extensions are the ones that tend to break badly during upgrades, restarts, or when multiple teams share the same database.
In practice, I treat shared memory in an extension like a small control panel rather than a data store. I usually include only:
- A magic/version field to detect struct changes across extension upgrades.
- A size field so the worker can validate what it attached to.
- A few atomic flags/counters for coordination and metrics.
Everything else—configs, job definitions, history—lives in regular extension-owned tables or GUCs. One thing I learned the hard way is that changing a shared memory struct layout without explicit versioning leads to mysterious crashes after ALTER EXTENSION UPDATE. Now I always embed a magic and version:
typedef struct
{
    uint32 magic;       /* e.g. 0xCAFEBABE */
    uint32 version;     /* bump on layout change */
    Size   size;        /* sizeof(my_shmem_ctl) */
    pg_atomic_uint32 active_workers;
    pg_atomic_uint32 last_error_code;
} my_shmem_ctl;

#define MY_SHMEM_MAGIC   0xCAFEBABE
#define MY_SHMEM_VERSION 1
In _PG_init, I allocate this struct once, initialize the fields, and fail fast if an incompatible version is detected. That small bit of paranoia has saved me from some ugly production surprises.
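The fail-fast attach can be sketched like this, using the struct above. The shmem_startup_hook wiring and the "my_extension ctl" key name are assumptions; the core idea is initialize-once, then validate on every attach:

```c
static my_shmem_ctl *Ctl = NULL;

/* called from shmem_startup_hook after _PG_init reserved the space */
static void
my_shmem_startup(void)
{
    bool found;

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    Ctl = ShmemInitStruct("my_extension ctl", sizeof(my_shmem_ctl), &found);
    if (!found)
    {
        /* first attach: initialize the header */
        Ctl->magic = MY_SHMEM_MAGIC;
        Ctl->version = MY_SHMEM_VERSION;
        Ctl->size = sizeof(my_shmem_ctl);
        pg_atomic_init_u32(&Ctl->active_workers, 0);
        pg_atomic_init_u32(&Ctl->last_error_code, 0);
    }
    else if (Ctl->magic != MY_SHMEM_MAGIC ||
             Ctl->version != MY_SHMEM_VERSION ||
             Ctl->size != sizeof(my_shmem_ctl))
    {
        /* mismatched binary vs. existing segment: refuse to run */
        elog(FATAL, "my_extension: incompatible shared memory layout");
    }
    LWLockRelease(AddinShmemInitLock);
}
```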
Extension Lifecycle, Catalog Ownership, and Worker Safety
For extension-scoped workers, I align the worker’s lifecycle with the extension’s catalog presence. In my experience, the safest pattern is:
- Register in _PG_init only if the extension’s schema and control tables exist.
- On worker startup, validate the shared memory header (magic, version, size) before touching anything else.
- Use a “disabled” flag or similar in shared memory so an admin can turn off the worker by updating a config table or GUC, without a server restart.
This keeps the blast radius small: the worker only touches its own extension’s objects, it can detect mismatched binaries vs. catalogs, and it won’t accidentally run with half-upgraded state. For deeper reading, I recommend looking into PostgreSQL extension upgrade strategies and ABI-safe shared memory layouts; the official documentation on C-language functions (36.10. C-Language Functions) is a good complement to this pattern.
4. Using Background Workers as Coordinators for External Systems
Some of the most useful PostgreSQL background workers I’ve written didn’t do heavy computation inside the database at all—they coordinated with external services: job runners, message brokers, HTTP APIs, or storage systems. In that role, the worker becomes a scheduler and state synchronizer, and shared memory is the small, trusted brain that keeps coordination safe across crashes and restarts.
When a worker talks to the outside world, I keep shared memory focused on what must survive process boundaries but doesn’t belong in SQL tables. For example:
- Connection / backoff state: current backoff level, last failure timestamp, circuit-breaker style flags.
- Scheduler cursor: last dispatched job ID or timestamp to avoid duplicate scheduling after a restart.
- Health signals: a few counters for success/failure, exposed via SQL functions for monitoring.
All durable business state (job definitions, external IDs, retries) still lives in tables; shared memory just tells each new worker incarnation how to resume coordination safely. In my experience, this separation keeps bugs in external integrations from corrupting core database data structures.
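Concretely, the coordinator’s control block can stay this small. A sketch with hypothetical field names, mirroring the three bullets above:

```c
typedef struct
{
    /* connection / backoff state */
    pg_atomic_uint32 backoff_level;      /* current exponential step */
    TimestampTz      last_failure;
    pg_atomic_uint32 circuit_open;       /* 0 = closed, 1 = open */
    /* scheduler cursor */
    int64            last_dispatched_id; /* resume point after restart */
    /* health signals */
    pg_atomic_uint64 dispatch_ok;
    pg_atomic_uint64 dispatch_failed;
} coord_shmem_ctl;
```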
Coordinator Loop Pattern: Poll, Decide, Dispatch
The loop pattern I come back to for coordinators is simple but robust:
- Poll internal tables for due work (or changes) using normal SQL.
- Consult shared memory to respect backoff and health limits.
- Dispatch to the external system, then update both tables and shared memory.
Here’s a trimmed-down C-style sketch I’ve used for a coordinator worker:
void
coordinator_worker_main(Datum main_arg)
{
    pqsignal(SIGTERM, worker_sigterm_handler);
    BackgroundWorkerUnblockSignals();

    attach_coord_shmem();
    establish_db_connection();

    while (!got_sigterm)
    {
        JobBatch batch;

        if (is_in_backoff())
        {
            sleep_with_latch(1000L);
            continue;
        }

        /* Select a batch of jobs to dispatch */
        batch = load_due_jobs();
        if (batch.count == 0)
        {
            sleep_with_latch(2000L);
            continue;
        }

        for (int i = 0; i < batch.count && !got_sigterm; i++)
        {
            if (dispatch_to_external(batch.jobs[i]))
                mark_job_dispatched(batch.jobs[i]);
            else
                register_failure_and_maybe_backoff();
        }
    }

    proc_exit(0);
}
One thing I’ve learned is to treat every external call as potentially slow or flaky. By tracking backoff and health in shared memory, each worker restart inherits the same coordination rules, instead of hammering an already-sick upstream system from a clean slate.
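One way to implement that backoff is a capped exponential delay recomputed from a single level counter held in shared memory. The function name and the 1s/60s bounds here are assumptions, not part of the pattern itself:

```c
#include <stdint.h>

/*
 * Sketch of the arithmetic behind a register_failure_and_maybe_backoff()
 * helper: capped exponential backoff derived from a failure counter.
 */
static long
backoff_delay_ms(uint32_t level)
{
    const long base_ms = 1000L;     /* first retry after 1 second */
    const long max_ms  = 60000L;    /* never wait more than 60 seconds */
    long delay;

    if (level >= 6)                 /* 1000 << 6 already exceeds the cap */
        return max_ms;

    delay = base_ms << level;
    return (delay > max_ms) ? max_ms : delay;
}
```

Because the delay is derived from the counter rather than stored, a freshly restarted worker that attaches to the same shared memory immediately resumes the correct pacing.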
5. Observability-First Background Workers for Safe Production Operations
The background workers that have given me the least trouble in production are the ones I designed around observability from day one. Instead of treating metrics and health checks as nice-to-haves, I use shared memory as a compact, always-on telemetry surface that any backend can read and expose via SQL. That’s saved me hours of guessing when something goes wrong at 2 a.m.
For observability-first PostgreSQL background workers, I keep a small, read-mostly metrics struct in shared memory and update it with atomic operations. Typical fields I include are:
- Counters: total jobs processed, errors, retries, restarts, timeouts.
- Timestamps: last successful run, last failure, last configuration reload.
- State flags: is the worker currently busy, backoff level, last error code.
Then I expose a stable SQL view or function that reads this struct and returns a row per worker or per pool. One thing I learned early is to avoid locking-heavy access here: pg_atomic_* and simple volatile reads are usually enough. For example:
typedef struct
{
    pg_atomic_uint64 jobs_total;
    pg_atomic_uint64 jobs_failed;
    pg_atomic_uint64 restarts;
    pg_atomic_uint32 last_error_code;
    TimestampTz      last_ok;
    TimestampTz      last_failure;
} worker_metrics;

static worker_metrics *Metrics;

PG_FUNCTION_INFO_V1(my_worker_stats);

Datum
my_worker_stats(PG_FUNCTION_ARGS)
{
    TupleDesc tupdesc;
    Datum     values[6];
    bool      nulls[6] = {false};

    if (Metrics == NULL)
        ereport(ERROR, (errmsg("worker metrics not initialized")));

    /* Borrow the row type from a view that matches these columns */
    tupdesc = RelationNameGetTupleDesc("my_worker_stats_view");

    values[0] = Int64GetDatum(pg_atomic_read_u64(&Metrics->jobs_total));
    values[1] = Int64GetDatum(pg_atomic_read_u64(&Metrics->jobs_failed));
    values[2] = Int64GetDatum(pg_atomic_read_u64(&Metrics->restarts));
    values[3] = Int32GetDatum(pg_atomic_read_u32(&Metrics->last_error_code));
    values[4] = TimestampGetDatum(Metrics->last_ok);
    values[5] = TimestampGetDatum(Metrics->last_failure);

    PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
With this in place, I can plug worker stats into dashboards or quick SQL queries instead of tailing logs and guessing.
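Wiring this up on the SQL side might look like the following sketch; the dummy view only exists to supply the row type that RelationNameGetTupleDesc reads, and the library name "my_extension" is an assumption:

```sql
-- Dummy view that supplies the row type (never returns rows itself)
CREATE VIEW my_worker_stats_view AS
  SELECT 0::bigint AS jobs_total, 0::bigint AS jobs_failed,
         0::bigint AS restarts, 0::int AS last_error_code,
         now() AS last_ok, now() AS last_failure
  WHERE false;

-- Expose the C function using that row type
CREATE FUNCTION my_worker_stats()
RETURNS my_worker_stats_view
AS 'my_extension', 'my_worker_stats'
LANGUAGE C STRICT;
```

After that, `SELECT * FROM my_worker_stats();` gives any monitoring tool a one-row snapshot of the worker without touching logs.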
Using Observability to Drive Safer Behavior
Once metrics live in shared memory, I don’t just watch them—I let them shape worker behavior. In my own deployments, I’ve used these patterns:
- Dynamic throttling: if error counters spike within a short window, the worker shifts into a slower, backoff mode.
- Safe feature flags: toggles for “dry-run” or “read-only” modes, updated via SQL and read from shared memory on each loop.
- Readiness checks: simple SQL functions that return healthy/unhealthy based on metrics, hooked into external monitoring.
For readers who want to go deeper, it’s worth exploring patterns for exporting PostgreSQL extension and background worker metrics to monitoring systems so that these shared memory counters seamlessly feed Prometheus, Grafana, or similar tools; the pg_background GitHub repository is a useful reference for background worker infrastructure.
Conclusion: Choosing the Right PostgreSQL Background Worker Pattern
When I’m deciding how to structure PostgreSQL background workers, I always start from the workload, not the API surface. Long-running, sequential maintenance fits a single dedicated worker; high-throughput queues call for a small, well-behaved pool; extension logic stays extension-scoped with a tiny, versioned shared memory block; external coordination leans on SQL state with shared memory as a resilient cursor; and production-facing workers get observability baked in from day one.
The common thread across all of these patterns is restraint: keep shared memory lean, clearly owned, and easy to validate, and push complex or durable state into tables where normal SQL and tooling can help you debug. If you’re starting a new worker today, I’d sketch the lifecycle, shared memory struct, and observability story first—then fill in the business logic around that solid core.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.