Introduction: Why pg_upgrade Best Practices Matter for PostgreSQL DBAs
Major version upgrades are where PostgreSQL environments are most exposed. In my experience, routine backups and minor updates rarely cause drama, but a major jump to PostgreSQL 16 or 17 is where a small misstep in PostgreSQL pg_upgrade best practices can turn into hours of downtime, data validation headaches, or even a failed rollback.
The pg_upgrade tool is incredibly powerful because it lets me avoid logical migrations and long restore windows, but that power cuts both ways. It assumes the DBA has done their homework: extension compatibility checked, disk layout understood, parameters reviewed, and a clear cutover plan rehearsed. When I’ve seen upgrades go sideways, it was almost never pg_upgrade itself that was at fault—it was gaps in preparation and testing.
By following a disciplined set of pg_upgrade practices, I can usually reduce production downtime to just the final switchover, while still sleeping at night because I’ve tested the process end-to-end on a clone. This article focuses on the seven practices I lean on whenever I’m planning a major upgrade, especially for mission‑critical systems moving to PostgreSQL 16 or 17, where new features and behavioral changes are most likely to surface once the cluster is live.
1. Inventory and Classify Your PostgreSQL Clusters Before pg_upgrade
Whenever I plan a major upgrade, I start with one boring but critical step: building a clean, accurate inventory. Without it, even the best PostgreSQL pg_upgrade best practices become guesswork. I want to know exactly what I’m upgrading, how it’s used, and what can break if something behaves differently in PostgreSQL 16 or 17.
List every cluster, version, and basic configuration
First, I map out every PostgreSQL instance and cluster: host, port, data directory, and current version. I also record key settings that might affect pg_upgrade, like data directory paths, wal_level, and shared_preload_libraries. For medium or large estates, I’ve found it easiest to script this discovery so I don’t miss a “forgotten” test or reporting instance that still matters to someone.
Here’s a simple pattern I’ve used to pull cluster info across multiple Linux servers with SSH and psql:
#!/usr/bin/env bash
SERVERS=(db1 db2 db3)
for host in "${SERVERS[@]}"; do
echo "=== $host ==="
ssh "$host" "psql -Atqc \"select version(), current_setting('data_directory');\"" 2>/dev/null
echo
done
Once I have this, I group clusters by version (e.g., 12, 13, 14) so I can plan which ones are eligible for direct pg_upgrade to the target version and which may need a stepping-stone release or extra care.
Capture extensions, contrib modules, and custom objects
The next layer of inventory is extensions and contrib modules. In my experience, this is where surprises hide, especially when moving to newer releases that deprecate or change behavior in popular extensions. For each database in a cluster, I run something like:
psql -Atqc " select current_database(), extname, extversion from pg_extension order by 1, 2;" > extensions_inventory.tsv
I also note custom types, languages, and foreign data wrappers that might depend on server binaries or external libraries. If an extension isn’t available or compatible on PostgreSQL 16 or 17, I want to know long before I schedule downtime. I usually annotate the inventory with a simple status: compatible, needs upgrade, or unknown, based on release notes or vendor documentation. pg_upgrade – PostgreSQL official documentation
Classify workloads by criticality and upgrade risk
Finally, I classify each cluster by business criticality and workload pattern. A 10 TB OLTP cluster with 24×7 traffic is not in the same risk bucket as a nightly reporting database, even if they run the same PostgreSQL version. I usually tag each cluster with:
- Criticality: mission-critical, important, or non-critical
- Workload type: OLTP, analytics, mixed, or batch
- Dependencies: apps, ETL jobs, reporting tools, or third-party integrations
This classification helps me decide upgrade order and testing depth. In my own teams, we always prioritize doing pg_upgrade dry-runs and extra regression testing on the highest-risk, highest-impact clusters first, using low-risk systems as a rehearsal ground. By the time we touch production OLTP, the process feels routine because we’ve already shaken out most issues on less critical environments.
2. Use pg_upgrade –check and Compatibility Testing as a Non‑Negotiable Step
One thing I learned the hard way is that skipping pg_upgrade –check is essentially flying blind. On paper, everything can look compatible; in reality, a single extension, collation, or parameter mismatch can stop your PostgreSQL 16 or 17 upgrade at the worst possible moment. Running pg_upgrade in check mode—early and often—is one of the most important PostgreSQL pg_upgrade best practices I stick to.
Run pg_upgrade –check on a realistic copy of production
In my own workflows, I never point pg_upgrade –check at a random backup. I create a realistic clone of production: same OS, same filesystem layout, same PostgreSQL binaries. Then I run the full compatibility check there, so any problems show up long before the real cutover.
A typical pattern for a major upgrade dry run looks like this:
# As the postgres user on the test host OLD_BINDIR=/usr/lib/postgresql/14/bin NEW_BINDIR=/usr/lib/postgresql/17/bin OLD_DATADIR=/var/lib/postgresql/14/main NEW_DATADIR=/var/lib/postgresql/17/main $NEW_BINDIR/pg_upgrade \ --check \ --old-bindir="$OLD_BINDIR" \ --new-bindir="$NEW_BINDIR" \ --old-datadir="$OLD_DATADIR" \ --new-datadir="$NEW_DATADIR" \ --old-options "-c config_file=/etc/postgresql/14/main/postgresql.conf" \ --new-options "-c config_file=/etc/postgresql/17/main/postgresql.conf"
When I run this early in the project, I can iterate: fix what fails, rerun the check, and document the exact commands that will be used during the real maintenance window.
Interpret pg_upgrade –check output and fix issues systematically
The check output is usually straightforward, but on busy systems it can feel overwhelming the first time you see it. I like to categorize findings into buckets: extensions and shared libraries, data type or collation issues, incompatible settings, and file-level problems (permissions, missing directories, etc.).
To keep it structured, I often parse the log and create a simple checklist. Here’s a toy Python snippet I’ve used to pull out ERROR lines from the check log and build a quick action list:
errors = []
with open("pg_upgrade_check.log") as f:
for line in f:
if "ERROR" in line:
errors.append(line.strip())
for i, err in enumerate(errors, start=1):
print(f"{i}. {err}")
From there, I tackle each issue: install or upgrade missing extensions, adjust configuration flags, align locale/collation, or rebuild problematic indexes on the old cluster if recommended. For tricky items like third-party extensions, I always confirm support for the exact major version I’m targeting, using a vendor’s release notes or a maintained compatibility matrix. pg_upgrade – PostgreSQL Documentation
Go beyond –check: application-level and performance compatibility tests
Even when pg_upgrade –check passes, my job isn’t done. The tool validates cluster-level compatibility, but it doesn’t know anything about my application semantics or query performance. So I treat the upgraded clone as a sandbox and run targeted tests:
- Smoke tests: basic application workflows (login, core transactions, reports) pointed at the upgraded instance.
- Query regression tests: capture representative SQL from pg_stat_statements on the old version and replay them on the new cluster.
- Performance checks: compare key metrics (latency, CPU, buffer hit ratio) between old and new versions under similar load.
On one upgrade to PostgreSQL 15, this extra step caught a subtle query plan change that slowed a critical report by 5×—pg_upgrade –check was perfectly happy, but the business would not have been. Since then, I’ve treated compatibility testing as non‑negotiable: if I can’t prove the upgraded system behaves correctly and performs acceptably on a clone, I don’t schedule the production window.
3. Choose Between In‑Place pg_upgrade and Side‑by‑Side Migrations Wisely
One of the earliest calls I make for any major upgrade is whether to run pg_upgrade in‑place or use a side‑by‑side cluster. Both are valid, and in my experience the right choice depends on risk tolerance, available hardware, and how strict your downtime and rollback requirements are. Getting this decision wrong can make the rest of your PostgreSQL pg_upgrade best practices much harder to execute.
When in‑place pg_upgrade makes sense
With in‑place upgrades, I reuse the same server and storage, pointing pg_upgrade at the existing data directory and installing the new binaries there. This approach can be attractive when hardware is constrained or when the database size is modest.
I usually consider in‑place when:
- The cluster is relatively small, so I can rebuild quickly from backup if needed.
- There’s solid, recently tested disaster recovery (e.g., base backups and WAL archiving).
- Downtime windows are acceptable, and the business is comfortable with a slower rollback path.
On smaller internal systems, I’ve had good success with in‑place upgrades because the operational overhead is low and the environment is simpler to reason about.
Advantages of side‑by‑side pg_upgrade clusters
For most production environments—especially when moving to PostgreSQL 16 or 17—I strongly prefer a side‑by‑side approach: I stand up a new cluster (often on new hosts or storage), run pg_upgrade there, and cut applications over only when I’m satisfied with testing.
This gives me:
- Easy rollback: if something goes wrong after cutover, I can point clients back to the old cluster.
- Safer rehearsal: I can perform multiple dry runs without touching production data directories.
- Hardware refresh: an opportunity to adopt faster storage, more RAM, or new OS versions alongside the upgrade.
In my teams, mission‑critical OLTP systems almost always use side‑by‑side because the reduced risk and testability far outweigh the cost of extra hardware.
Key decision criteria: risk, downtime, and operations
To make the choice explicit, I like to evaluate:
- Rollback strategy: how quickly do we need to revert if the upgrade causes issues?
- Downtime budget: can we afford a longer outage for an in‑place upgrade and potential restore?
- Operational complexity: do we have the skills and automation to manage parallel clusters and traffic cutover?
When those answers point to high risk and low tolerance for surprises, a side‑by‑side migration is almost always the safer path. On simpler or lower‑impact systems, an in‑place pg_upgrade can be perfectly reasonable—as long as the backup and recovery story is rock solid.
4. Align PostgreSQL pg_upgrade Best Practices with Operating System and Packaging
In my experience, some of the nastiest upgrade surprises don’t come from PostgreSQL itself—they come from the OS, package layout, or container image. If I don’t align my PostgreSQL pg_upgrade best practices with how PostgreSQL is installed and managed on the host, I risk missing binary paths, library versions, or data directory conventions that pg_upgrade depends on.
Understand your distro packaging and binary layout
The first thing I check is how PostgreSQL is packaged: native distro packages (like Debian/Ubuntu, RHEL/CentOS/Rocky), upstream PGDG repos, or custom builds. Each has its own directory structure for binaries, configs, and data.
On a new server, I’ll quickly confirm the layout with something like:
which psql
psql -Atqc "select version(), current_setting('data_directory');"
This tells me which binary tree I’m using and where the data lives. When planning pg_upgrade, I make sure my --old-bindir, --new-bindir, and config file paths reflect the real distro layout, not assumptions from another environment.
Account for OS libraries, locales, and tooling
PostgreSQL relies on system libraries (e.g., OpenSSL, ICU, glibc) and locales, and they can differ across OS versions or container images. For major jumps—say, moving from an older RHEL to a newer Debian-based image while upgrading PostgreSQL to 16 or 17—I treat OS and PostgreSQL upgrades as a combined project.
- Confirm that required shared libraries and extensions are installed for the new PostgreSQL packages.
- Verify locales and collations are consistent so index or sort behavior doesn’t change unexpectedly.
- Standardize helper tooling versions: pg_dump, pg_restore, pg_basebackup, and pg_upgrade all from the expected major release.
One habit that’s saved me time is explicitly running the target version’s pg_dump against the old cluster during tests, ensuring client tool versions are compatible before the real cutover. Upgrading PostgreSQL 11 to PostgreSQL 13 with TimescaleDB and PostGIS in Linux using pg_upgrade
Special considerations for containers and automation
When I work with containers, the packaging story shifts again. The image usually contains a specific PostgreSQL major version, and the data directory lives on a mounted volume. For pg_upgrade in this model, I often:
- Build or pull two separate images: one for the old major version, one for the new.
- Mount the same data volume into a dedicated pg_upgrade container, with both bin directories available.
- Automate the entire sequence (stop old container, run pg_upgrade, start new container) with scripts or orchestration tools.
Aligning OS, packaging, and containers with pg_upgrade doesn’t feel glamorous, but it’s one of the most practical ways I’ve found to keep major version migrations repeatable and predictable.
5. Minimize Downtime with Staged Cutovers and Parallel pg_upgrade Execution
On every high-traffic system I’ve upgraded, the main business question hasn’t been “Can we upgrade?” but “How long will we be down?”. Carefully designed staging, replica usage, and parallelization are the backbone of my PostgreSQL pg_upgrade best practices for keeping that downtime window as short as possible.
Stage the upgrade in phases, not one big bang
Instead of treating the upgrade as a single event, I break it into distinct phases: prepare, copy, upgrade, validate, and cutover. Most of these can happen while production is still online.
- Prepare: build new servers or clusters, install target PostgreSQL version, align configs.
- Copy: sync data via physical replica, backup/restore, or storage-level snapshot.
- Upgrade: run pg_upgrade (and its checks) against the staged copy, not the live cluster.
- Validate: run application-level tests against the upgraded copy while users are still on the old system.
- Cutover: stop writes briefly, finalize WAL replay or delta sync, flip connections to the new cluster.
By the time I reach the cutover phase, most of the heavy lifting is done; the remaining downtime is usually limited to a short, predictable maintenance window.
Leverage replicas and snapshots to offload the work
When I have streaming replicas, I almost always use them to stage the upgrade. A common pattern is:
- Promote a replica or use a snapshot of it as the source for pg_upgrade.
- Run pg_upgrade on that promoted copy while the original primary is still serving traffic.
- Once satisfied with tests, schedule a brief outage to redirect clients to the upgraded cluster.
This avoids touching the production primary until the last possible moment and gives me a clean rollback option: if something goes wrong post-cutover, I can fall back to the original primary (or another replica) with well-understood data lag.
Use pg_upgrade parallelization and automation to shrink the window
On large datasets, the analyze and relinking phases can dominate the runtime if I’m not careful. I rely heavily on pg_upgrade’s parallel options and post-upgrade scripts to push work off the critical path.
# Example: run pg_upgrade with multiple jobs to speed up processing pg_upgrade \ --old-bindir=/usr/lib/postgresql/14/bin \ --new-bindir=/usr/lib/postgresql/17/bin \ --old-datadir=/var/lib/postgresql/14/main \ --new-datadir=/var/lib/postgresql/17/main \ --jobs=8 \ --link
After the upgrade, I often run the generated analyze_new_cluster.sh script during a lower-traffic period or in controlled bursts, so the system becomes fully optimized without extending the hard downtime window. In my teams, we also script connection draining, health checks, and DNS/connection string flips, so the cutover itself is a fast, repeatable sequence instead of a manual, error-prone process.
6. Harden Rollback and Disaster Recovery Around pg_upgrade Windows
Every time I plan a major upgrade, I assume something might go wrong at the worst possible moment. That mindset has pushed me to treat rollback and disaster recovery (DR) as core parts of my PostgreSQL pg_upgrade best practices, not afterthoughts. When stakeholders know there’s a clear, tested way back, the whole migration is less stressful.
Define clear rollback scenarios and decision points
I start by writing down specific rollback scenarios: failure during pg_upgrade, failure during post-upgrade validation, and failures that show up hours or days after cutover. For each, I define a decision window and concrete actions.
- During pg_upgrade: abort, fix the issue on the staged environment, rerun; production remains untouched.
- Immediately after cutover: if core health checks or smoke tests fail, redirect traffic back to the old cluster.
- Delayed issues (performance or bugs): if problems exceed an agreed SLA for more than N minutes, trigger a planned rollback route.
What’s helped my teams is making those thresholds explicit in advance, so we don’t argue about “how bad is bad enough” at 2 a.m. in the middle of a failed upgrade.
Treat backups, replicas, and snapshots as upgrade guardrails
Right before an upgrade window, I insist on a verified, restorable backup of the old cluster. If we’re using physical replicas or storage snapshots, I treat them as additional guardrails, not replacements for backups.
- Create and verify a fresh base backup or snapshot shortly before starting pg_upgrade.
- For side-by-side upgrades, keep the old primary (or a replica) intact and read-only during the validation period.
- Document the exact steps to re-promote or reattach the old cluster if roll back is needed.
On one upgrade to PostgreSQL 15, this approach let us roll back cleanly when a third‑party app misbehaved on the new version; we simply flipped connections back to the old primary and continued investigating without production pressure.
Rehearse the rollback path and automate the critical steps
A rollback plan that only exists on a wiki is risky. I prefer to rehearse it on non-production environments until it feels routine. That includes validating connection string changes, DNS updates, and application restarts in both directions.
Wherever possible, I automate the critical pieces—especially the flip between old and new clusters—so I’m not relying on manual commands under pressure. A simplified shell pattern I’ve used in staging looks like:
#!/usr/bin/env bash
set -euo pipefail
ACTION=${1:-cutover} # or rollback
if [[ "$ACTION" == "cutover" ]]; then
# Point apps to new cluster
./update_app_config --target=new
./reload_apps
elif [[ "$ACTION" == "rollback" ]]; then
# Point apps back to old cluster
./update_app_config --target=old
./reload_apps
fi
By the time I enter a real pg_upgrade window, I want both upgrade and rollback flows to be scripted, peer-reviewed, and rehearsed. That preparation is what turns a risky event into a controlled, reversible change.
7. Post‑Upgrade Validation and Performance Tuning on PostgreSQL 16 and 17
In my experience, finishing pg_upgrade is only half the journey; the rest is proving that PostgreSQL 16 or 17 behaves correctly and performs at least as well as before. My PostgreSQL pg_upgrade best practices always include a structured validation and tuning phase, ideally rehearsed on a staging environment before the real cutover.
Run structured functional and data‑integrity checks
Right after cutover, I focus on correctness before chasing performance. I like to keep a short, scripted battery of checks that can run in minutes:
- Cluster sanity: confirm databases, schemas and extensions exist and are at the expected versions.
- Data integrity: row counts and checksums on critical tables compared to pre-upgrade snapshots.
- Application smoke tests: key business workflows (logins, payments, reports) run end-to-end.
Here’s a small pattern I’ve used to quickly compare row counts for a few critical tables before and after the upgrade:
-- Run on old cluster before upgrade CREATE TABLE upgrade_rowcounts AS SELECT 'public.orders' AS rel, count(*) AS cnt FROM public.orders UNION ALL SELECT 'public.customers', count(*) FROM public.customers; -- Export and re-import this small table to the new cluster, then: SELECT r.rel, r.cnt AS old_cnt, n.cnt AS new_cnt FROM upgrade_rowcounts r JOIN ( SELECT 'public.orders' AS rel, count(*) AS cnt FROM public.orders UNION ALL SELECT 'public.customers', count(*) FROM public.customers ) n USING (rel);
If any of these early checks misbehave, I pause and decide quickly whether to fix forward or trigger the rollback plan.
Rebuild statistics and watch new planner behavior
After a major upgrade, statistics and the planner’s cost model can shift, especially with the planner improvements and parallelism tweaks in PostgreSQL 16 and 17. I always run the analyze_new_cluster.sh script generated by pg_upgrade, then monitor for regression queries.
- Ensure a full ANALYZE completes, ideally outside peak hours if the cluster is large.
- Enable and query
pg_stat_statementsto spot slow or changed query plans. - Capture and EXPLAIN problem queries to see whether join strategies or index usage changed.
On one upgrade to PostgreSQL 16, I found a handful of reports suddenly choosing nested loops over hash joins; a combination of fresh statistics and minor index tweaks fixed the issue, but only because we were explicitly watching for plan changes. Ensuring consistent performance after major version upgrades with Amazon Aurora PostgreSQL Query Plan Management
Tune for PostgreSQL 16/17 features, not just old settings
Finally, I resist the temptation to simply copy old settings forward. Each new major release brings improvements that deserve attention:
- Workload-specific tuning: revisit
work_mem,maintenance_work_mem, andeffective_cache_sizein light of any hardware changes and planner behavior. - Parallelism: confirm that
max_parallel_workersand related parameters are aligned with CPU capacity; PostgreSQL 16 and 17 often benefit more from parallel plans. - Autovacuum and freeze: adjust thresholds to the new workload profile, especially on large tables that grew since the last major version.
What’s worked best for me is treating the first few days on PostgreSQL 16 or 17 as an observation period: capture metrics, identify hotspots, and then apply small, measured tuning changes instead of trying to optimize everything on day one.
Conclusion: Operationalizing PostgreSQL pg_upgrade Best Practices
Across all the upgrades I’ve worked on, the smoothest ones were never about a single clever trick; they came from a disciplined approach to PostgreSQL pg_upgrade best practices. I aligned OS and packaging with the upgrade plan, chose in-place vs. side-by-side carefully, minimized downtime with staged cutovers, and treated rollback and DR as first-class citizens.
The real payoff, though, comes when these steps are codified into repeatable runbooks: checklists, scripts, and automation that your team can use for PostgreSQL 16, 17, and beyond. If I were starting from scratch today, my next steps would be to document a full dry-run in staging, turn manual commands into version-controlled scripts, and schedule regular rehearsals of both upgrade and rollback paths.
Handled this way, major PostgreSQL upgrades stop being heroic one-off projects and become just another reliable, well-practiced change in your operational toolkit.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.





