Introduction: Why Foreign Data Wrapper Cost Estimation Still Hurts
Whenever I tune federated queries, foreign data wrapper cost estimation is usually the first thing that bites me. On paper, FDWs let me join PostgreSQL tables with remote data as if everything lived in one place. In practice, a bad cost estimate can make the planner do exactly the wrong thing: pull millions of rows across the network, skip an obviously good index, or refuse to push down a filter that the remote side could execute cheaply.
The core problem is that PostgreSQL’s planner only sees a simplified picture of the remote world. Row count estimates, selectivity, and startup vs. run costs are often based on stale or generic assumptions. When those assumptions are off by an order of magnitude, I end up with plans that look fine in EXPLAIN but collapse under real data volumes. That’s when latency spikes, network bandwidth gets saturated, and OLTP backends on the remote system suddenly suffer because a single federated query went rogue.
In my experience, inaccurate foreign data wrapper cost estimation hurts most in two places: cross-database joins and complex analytical queries that mix local and remote sources. If the FDW underestimates remote work, the local server takes on too much, performing joins or aggregations on huge intermediate sets. If it overestimates, it may avoid useful pushdown and keep work that should be remote on the local side instead. Either way, you’re leaving performance on the table.
This article focuses on practical strategies I’ve used to bring those estimates closer to reality: feeding the planner better statistics, teaching the FDW more about remote capabilities, and making small configuration changes that have a disproportionate impact on pushdown and federated query performance.
1. Anchor Your Foreign Data Wrapper Cost Model in Real Remote Statistics
When I first started tuning foreign data wrapper cost estimation, my biggest mistake was trusting default assumptions. The planner can only be as good as the numbers you feed it. If row counts, distinct values, and network costs don’t match reality, every higher-level optimization is built on sand.
Pull Real Row Counts and NDVs from the Remote System
The foundation of a sane FDW cost model is accurate cardinality. That means pulling real row counts, approximate distinct counts (NDVs), and basic distribution info from the remote database, not guessing from a local sample. In my experience, even a rough NDV taken from the remote engine beats a perfect local heuristic.
For Postgres-to-Postgres setups, I script this discovery step and periodically refresh it into a local stats table that my FDW can consult during planning:
```sql
-- On the remote server: expose real statistics.
-- Note: reltuples lives in pg_class, not pg_stat_user_tables,
-- so we join the two catalogs.
CREATE VIEW v_table_stats AS
SELECT
    s.schemaname,
    s.relname,
    s.n_live_tup AS row_count,
    s.n_dead_tup AS dead_rows,
    c.reltuples  AS reltuples_est
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid;
```
```sql
-- On the local server: pull stats into a helper table.
-- Assumes v_table_stats has been exposed locally as a foreign table,
-- e.g. via IMPORT FOREIGN SCHEMA ... FROM SERVER ... INTO remote_stats;
CREATE TABLE remote_table_stats (
    schemaname     text,
    relname        text,
    row_count      bigint,
    last_refreshed timestamptz DEFAULT now()
);

INSERT INTO remote_table_stats (schemaname, relname, row_count)
SELECT schemaname, relname, row_count
FROM remote_stats.v_table_stats;
```
From there, I can wire these values into a custom FDW or use them to justify more accurate TABLESAMPLE and ANALYZE strategies locally. The key is to stop treating foreign tables as black boxes and give the planner a realistic sense of scale.
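For postgres_fdw specifically, two built-in mechanisms cover much of this ground before you write anything custom: asking the remote server for estimates at plan time, and sampling the remote table with a local ANALYZE. A minimal sketch, assuming a server named reporting_fdw and a foreign table fact_sales:

```sql
-- Ask the remote server for row estimates during planning
-- (costs extra round trips at plan time, but yields much better cardinalities)
ALTER SERVER reporting_fdw OPTIONS (ADD use_remote_estimate 'true');

-- Or sample the remote table and store statistics in the local catalogs
ANALYZE fact_sales;
```

The trade-off is plan-time latency versus estimate quality; for long-running analytical queries, the extra planning round trips are almost always worth it.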
Measure Latency and Bandwidth, Then Tune FDW Costs
Once row counts are roughly right, the next big win is modeling transport costs. On high-latency links or chatty workloads, I’ve seen network behavior dominate total query time. I now routinely measure round-trip latency and bulk throughput, then map those numbers into fdw_startup_cost and fdw_tuple_cost (or equivalent custom cost hooks) so the planner can compare remote vs. local work fairly.
A simple script like the one below gives me a concrete baseline for network behavior that I can turn into planner parameters:
```bash
#!/usr/bin/env bash
set -euo pipefail

REMOTE_HOST="my-remote-db"
COUNT=5

echo "Measuring latency..."
ping -c "$COUNT" "$REMOTE_HOST"

echo "Measuring bulk throughput (simple streaming test)..."
time psql "host=$REMOTE_HOST dbname=postgres" -c \
  "SELECT generate_series(1, 1000000) AS id" > /dev/null
```
From these measurements I derive a rough per-call startup penalty (connection setup, cursor creation) and a per-row transfer cost. Then I tune the FDW configuration or my own cost callbacks so that, for example, a remote nested loop across millions of rows looks prohibitively expensive and pushes the planner toward better join strategies. Anchoring foreign data wrapper cost estimation in these real numbers is what finally made my federated plans predictable instead of surprising.
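To make that mapping concrete, here is one back-of-the-envelope conversion I might run. The inputs are assumptions you should replace with your own measurements: a 25 ms round trip, 50 MB/s of bulk throughput, roughly 100-byte rows, and a calibration of 1.0 planner cost unit to about 0.05 ms (one local sequential page read):

```sql
-- Hypothetical conversion of measured network numbers into planner cost units.
-- 1.0 cost unit is calibrated here to ~0.05 ms of local sequential page read.
SELECT
    25.0 / 0.05                                 AS fdw_startup_cost,  -- RTT / page-read time = 500
    (100.0 / (50 * 1000000.0)) * 1000.0 / 0.05  AS fdw_tuple_cost;    -- per-row ms / page-read ms = 0.04
```

The exact calibration constant is the weakest assumption in the chain; what matters is that startup and tuple costs end up in the right ratio relative to local I/O, not that they are precise.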
For a deeper background on how PostgreSQL’s planner interprets costs and selectivity, it helps to read the Planner/Optimizer chapter of the PostgreSQL documentation.
2. Tune Foreign Data Wrapper Cost Parameters for Pushdown-Friendly Plans
Once I have realistic stats, the next lever I reach for is tuning foreign data wrapper cost parameters. The goal isn’t to make every operation look cheap remotely; it’s to bias the planner toward safe pushdown, while still penalizing plans that would ship huge intermediate results across the wire. Getting this balance right is usually where foreign data wrapper cost estimation starts to pay off in day-to-day workloads.
Balance Startup vs Tuple Costs to Reflect Remote Behavior
Most FDWs expose cost knobs analogous to PostgreSQL’s built-in parameters like cpu_tuple_cost and cpu_operator_cost; in postgres_fdw these are fdw_startup_cost and fdw_tuple_cost. In my own setups, I treat startup as “round trips and protocol overhead” and tuple cost as “per-row decode and transfer.” If the remote system has expensive connection setup but streams rows efficiently, I raise startup and keep tuple cost modest. For chatty APIs or JSON-heavy payloads, I increase tuple cost so the planner understands that large scans are painful.
Here’s a simple example I’ve used with postgres_fdw when the link is moderately slow but throughput is decent:
```sql
ALTER SERVER reporting_fdw OPTIONS (
    SET fdw_startup_cost '500',  -- penalize many tiny remote calls
    SET fdw_tuple_cost '0.02'    -- reflect per-row transfer/parse cost
);
-- (Use ADD instead of SET the first time an option is defined on the server.)

-- For a particularly heavy foreign table
ALTER FOREIGN TABLE fact_sales OPTIONS (
    SET fetch_size '5000'  -- encourage fewer, larger batches
);
```
In my experience, even small changes here can flip the chosen plan from “pull everything locally and join” to “push down the join and filter remotely,” which is often exactly what I want for reporting and analytics queries.
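To verify which side actually runs the work after a change like this, EXPLAIN with the VERBOSE option makes postgres_fdw show the statement it ships to the remote server. The table is from the example above; the column names and predicate are hypothetical:

```sql
EXPLAIN (VERBOSE, COSTS)
SELECT sum(amount)
FROM fact_sales
WHERE sale_date >= date '2024-01-01';
-- Look for a "Remote SQL:" line in the output: if the WHERE clause
-- (and ideally the aggregate) appears there, it was pushed down.
```

I re-run this check after every cost change; it is the fastest way to confirm that a parameter tweak actually flipped the plan rather than just shifting cost numbers.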
Use FDW Options to Encourage Safe Pushdown
Cost parameters alone don’t control which operations get pushed down; each FDW has its own rules and capabilities. But I’ve had good results combining cost tuning with options that adjust how aggressively the extension attempts pushdown. The trick is to avoid making remote work look too cheap, or the planner may choose remote joins that overload a production system or ignore local indexes that would be faster.
One pattern that’s worked for me is to bias pushdown for filters and projections (which shrink row counts early), while being more conservative for complex joins or window functions unless I know the remote system handles them well. When writing or configuring a custom FDW, I’ll explicitly model that in the cost callbacks:
```c
/* Pseudo-code for a custom FDW join cost function */
static void
myGetForeignJoinPaths(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      JoinType jointype,
                      JoinPathExtraData *extra)
{
    Cost   startup_cost = 0;
    Cost   run_cost = 0;
    double rows = clamp_row_est(joinrel->rows);

    /* base network + planning overhead */
    startup_cost += my_network_rtt_cost();

    /* per-row transfer cost for the estimated result set */
    run_cost += rows * my_tuple_cost();

    /* if estimated rows are huge, discourage the remote join */
    if (rows > 1e7)
        startup_cost *= 5;  /* make remote join look less attractive */

    /* ... build and add a ForeignPath with the calculated costs ... */
}
```
By encoding rules like this, I can nudge the planner toward pushing down selective joins, while keeping monster fan-out joins local where I have tighter control. To understand how these custom costs interact with the rest of the planner, it’s worth reviewing the Foreign Data Wrapper API chapter of the PostgreSQL documentation.
3. Model Join Selectivity and Remote Filters for Accurate Foreign Data Wrapper Cost Estimation
Even with good base statistics, foreign data wrapper cost estimation can still go sideways if join and filter selectivity are off. I’ve seen plans that looked fine on paper but ended up pulling 100x more rows than expected because the planner assumed uniform distributions and independence between local and remote predicates. Getting closer to reality means teaching your FDW how selective those joins and filters really are.
Use Remote Histograms and NDVs to Improve Join Selectivity
For joins between local and foreign tables, the local planner usually has no idea how keys are distributed on the remote side. In my setups, I’ve had success exporting lightweight histograms or NDV estimates for join columns from the remote system, then feeding those into custom selectivity logic or helper tables that a custom FDW can consult.
One pragmatic pattern is to store per-column NDV and basic bucket info locally, then approximate selectivity from those numbers instead of guessing:
```sql
-- Local helper table with remote column statistics
CREATE TABLE remote_column_stats (
    schemaname   text,
    relname      text,
    attname      text,
    ndv          bigint,  -- approximate distinct count
    most_common  jsonb,   -- optional: top-N frequencies
    last_refresh timestamptz DEFAULT now()
);

-- Example: join selectivity helper function (very simplified)
CREATE OR REPLACE FUNCTION estimate_remote_join_sel(
    p_rel       text,
    p_col       text,
    p_local_ndv bigint
) RETURNS numeric LANGUAGE plpgsql AS $$
DECLARE
    r_ndv bigint;
BEGIN
    SELECT ndv INTO r_ndv
    FROM remote_column_stats
    WHERE relname = p_rel AND attname = p_col;

    IF r_ndv IS NULL OR r_ndv = 0 THEN
        RETURN 0.1;  -- fallback guess
    END IF;

    -- assume overlap based on the smaller NDV
    RETURN LEAST(p_local_ndv, r_ndv)::numeric / GREATEST(p_local_ndv, r_ndv);
END;
$$;
```
In my experience, even a crude overlap model based on NDVs beats the default assumption of independence, especially for high-cardinality keys like user IDs or order IDs that are common across systems.
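Once the helper table is populated, the function plugs in wherever I need a quick sanity check on a planned join. For example, with hypothetical statistics for an orders table whose remote customer_id column has about 50,000 distinct values against roughly 80,000 locally:

```sql
-- Hypothetical remote stats for orders.customer_id
INSERT INTO remote_column_stats (schemaname, relname, attname, ndv)
VALUES ('public', 'orders', 'customer_id', 50000);

-- Local side has ~80k distinct customer ids
SELECT estimate_remote_join_sel('orders', 'customer_id', 80000);
-- selectivity = 50000 / 80000, i.e. 0.625
```

A crude number like this is still far more informative than a flat default, and it is cheap enough to recompute whenever the remote stats refresh.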
Push Down Filters Early and Teach the Planner Their Real Impact
The other side of the coin is filter selectivity. Every row you filter remotely is one less row you pay to ship and process locally, so I aggressively push down predicates whenever the remote engine can execute them efficiently. The catch is that the planner needs a realistic estimate of how many rows those filters will remove, or it may still choose a plan that overfetches.
For FDWs I manage, I like to explicitly model remote filter cost and selectivity in the planning callbacks, using remote stats or sampling queries when necessary:
```c
/* Pseudo-code inside a custom FDW planning hook */
static void
myEstimateRemoteSelectivity(RestrictInfo *rinfo,
                            Selectivity *sel,
                            Cost *filter_cost)
{
    /* Start with a generic guess */
    *sel = 0.1;

    /* If we have remote stats for this column/operator, refine it */
    if (has_remote_stats(rinfo))
        *sel = lookup_remote_sel(rinfo);

    /* Model additional cost for complex expressions */
    if (is_expensive_expression(rinfo->clause))
        *filter_cost += 10 * cpu_operator_cost;
}
```
One thing I learned the hard way is that underestimating filter selectivity can be just as bad as overestimating it; the planner might choose a remote hash join expecting a tiny intermediate set, then blow out memory when the filter barely trims anything. By combining remote histograms, NDVs, and simple custom selectivity functions, I’ve been able to keep those surprises to a minimum and make foreign data wrapper cost estimation much more trustworthy for real-world, mixed local/remote workloads.
4. Instrument Your FDW for Feedback-Driven Cost Estimation
Even with good statistics and hand-tuned parameters, foreign data wrapper cost estimation will drift as schemas, data volumes, and networks change. The only approach that’s held up for me long term is treating the FDW as a learning system: capture what actually happened, compare it with what the planner expected, and feed that back into your model.
Log Real Runtimes and Row Counts per Remote Operation
The first step is to instrument the FDW so every remote scan, join, or aggregate records its observed runtime and row counts. When I build or extend an FDW, I wrap the core execution paths with lightweight timing and counters, then store those in a local table for later analysis.
```c
/* Pseudo-code inside the IterateForeignScan callback.
 * The executor calls this once per row, so we accumulate counters
 * in per-scan state (MyScanState is this FDW's private fdw_state)
 * and log the totals when the scan reaches its end. */
static TupleTableSlot *
myIterateForeignScan(ForeignScanState *node)
{
    MyScanState    *state = (MyScanState *) node->fdw_state;
    instr_time      start, elapsed;
    TupleTableSlot *slot;

    INSTR_TIME_SET_CURRENT(start);
    slot = fetch_next_remote_row(node);

    INSTR_TIME_SET_CURRENT(elapsed);
    INSTR_TIME_SUBTRACT(elapsed, start);
    state->total_ms += INSTR_TIME_GET_MILLISEC(elapsed);

    if (TupIsNull(slot))
        log_fdw_stats(node, state->rows_fetched, state->total_ms);
    else
        state->rows_fetched++;

    return slot;  /* an empty slot signals end of scan */
}
```
On the SQL side, I maintain a simple log table. Periodically I aggregate this data to see where my estimates are consistently off and adjust costs or statistics based on hard evidence instead of intuition.
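A minimal version of that log table, with the columns the feedback query in this section relies on, might look like this (the schema is my own sketch, populated by the log_fdw_stats() call above):

```sql
-- Sketch of a per-scan log table written by the instrumented FDW
CREATE TABLE fdw_query_log (
    relname        text,
    estimated_rows bigint,   -- planner's estimate at plan time
    actual_rows    bigint,   -- rows actually fetched from the remote side
    runtime_ms     numeric,  -- observed remote scan time
    log_time       timestamptz DEFAULT now()
);
```

Keeping estimate and actual side by side in one row makes the later aggregation trivial and gives you a ready-made estimation-error metric per table.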
Use Feedback Loops to Auto-Adjust Cost Parameters
Once I have reliable logs, I can close the loop. For example, if a given foreign table is always returning 10x more rows than estimated, I bump its effective row_count and tuple cost; if queries are consistently faster than expected, I may dial those down.
```sql
-- Aggregate recent observations
WITH stats AS (
    SELECT
        relname,
        avg(actual_rows)::bigint    AS avg_rows,
        avg(estimated_rows)::bigint AS avg_est
    FROM fdw_query_log
    WHERE log_time > now() - interval '1 day'
    GROUP BY relname
)
UPDATE remote_table_stats r
SET row_count = s.avg_rows,
    last_refreshed = now()
FROM stats s
WHERE r.relname = s.relname
  AND s.avg_rows > 0
  AND (s.avg_rows > 2 * s.avg_est OR s.avg_rows < 0.5 * s.avg_est);
```
In my experience, even a coarse-grained feedback loop like this dramatically stabilizes plans over time. Instead of chasing every regression by hand, I let the system nudge foreign data wrapper cost estimation toward reality with each production query.
5. Design Pushdown Capabilities Around the Planner, Not the Other Way Around
When I first experimented with custom pushdown in FDWs, I tried to expose every remote feature I could. That quickly backfired: the planner started generating exotic plans that looked clever but behaved unpredictably in production. I’ve since learned to flip the mindset: design pushdown capabilities to fit the planner’s model, so foreign data wrapper cost estimation stays stable and the resulting plans are boring—in a good way.
Prioritize Predictable Operators the Planner Understands Well
The PostgreSQL planner is very good at reasoning about simple operators: equality joins, range predicates, basic aggregates, and straightforward projections. I now focus my pushdown support on those operations first, even if the remote system can do much more. By keeping the remote side mostly to scans, filters, joins, and aggregates that match what the planner already understands, my cost model remains aligned with its expectations.
For example, instead of pushing down every possible window function or vendor-specific expression, I often limit pushdown to the core relational pieces and let PostgreSQL handle the rest locally. This avoids a whole class of misestimates where the planner has no good selectivity model for a complex remote expression but assumes it’s cheap anyway.
Make Pushdown Rules Explicit and Cost-Aware in the FDW
Inside the FDW, I’ve had the best results with explicit, conservative rules about which nodes are eligible for pushdown, combined with costs that reflect the planner’s worldview. Here’s a simplified example of how I gate pushdown for aggregates and joins in a custom FDW:
```c
/* Pseudo-code: decide if a given path is safe to push down */
static bool
my_is_safe_to_pushdown(Path *path)
{
    /* Only push down if all quals are shippable and immutable */
    if (!all_quals_shippable(path->parent->baserestrictinfo))
        return false;

    /* Be conservative with GROUP BY over huge row estimates */
    if (IsA(path, AggPath))
    {
        double rows = clamp_row_est(path->parent->rows);

        if (rows > 1e8)
            return false;  /* keep giant aggregations local */
    }

    return true;
}
```
By encoding rules like this, I make sure the planner only sees remote paths that it can cost sensibly and that won’t surprise me later. When I add new pushdown features, I start with tight guards and then relax them gradually as I gain confidence. Aligning capabilities with the planner this way has done more for my foreign data wrapper cost estimation than any single magic parameter tweak, and the foreign data wrapper chapters of the PostgreSQL documentation are a good place to deepen this approach.
Conclusion: A Practical Checklist for Better Foreign Data Wrapper Cost Estimation
When I review an FDW setup, I walk through the same mental checklist to keep foreign data wrapper cost estimation grounded in reality instead of guesses.
- Anchor the model in real stats: Pull row counts, NDVs, and basic distributions from the remote system; don’t rely on defaults.
- Measure the network: Translate actual latency and bandwidth into startup and tuple costs so the planner sees true transport overhead.
- Tune pushdown-friendly costs: Set FDW cost parameters to favor selective remote work (filters, joins, aggregates) without making big scans look unrealistically cheap.
- Model join and filter selectivity: Use remote histograms/NDVs and simple selectivity functions to avoid gross underestimates that cause overfetching.
- Instrument and iterate: Log real runtimes and row counts per query, compare them with estimates, and feed that back into stats and cost parameters.
In my experience, teams that follow this checklist turn FDWs from “mysterious performance wildcard” into a predictable extension of their PostgreSQL planner, with plans that are easier to reason about and debug.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.
