Cookbook recipe

Poor Planner Estimates / Wrong Join Order

Applies to PostgreSQL 13–17 Last reviewed May 2026 Grounded in source
Estimated investigation4 min

Scenario

Scenario A reporting query joining three tables runs fine on a test database with 10,000 rows but takes 4 minutes on production with 50 million rows. EXPLAIN ANALYZE shows the planner estimated 12 rows for an…

Investigation Path

Scenario

A reporting query joining three tables runs fine on a test database with 10,000 rows but takes 4 minutes on production with 50 million rows. EXPLAIN ANALYZE shows the planner estimated 12 rows for an intermediate result that actually returned 890,000 rows.

How to Identify

Conditions:

  • EXPLAIN ANALYZE shows large discrepancy between rows=X (estimate) and actual rows=Y
  • Slow join queries that were fast before a bulk data load
  • pg_stats shows very few distinct values for join columns
  • Planner choosing Nested Loop when Hash Join or Merge Join would be correct
  • pg_stat_user_tables.last_autoanalyze is old relative to table churn

Analysis Steps

-- 1. Compare estimated vs actual rows
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders o
JOIN   customers c ON c.id = o.customer_id
JOIN   regions r   ON r.id = c.region_id
WHERE  r.name = 'EU';

-- 2. Check statistics for relevant columns
SELECT tablename, attname, n_distinct, correlation,
       most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename IN ('orders', 'customers', 'regions')
  AND  attname IN ('customer_id', 'region_id', 'name');

-- 3. Check when statistics were last collected
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('orders', 'customers', 'regions');

-- 4. Check if extended statistics exist for correlated columns
SELECT stxname, stxkeys, stxkind
FROM   pg_statistic_ext
WHERE  stxrelid = 'customers'::regclass;

Pitfalls

  • The planner assumes column independence by default. Two columns that are functionally related (e.g., city and zip_code) will produce severely wrong estimates if treated independently.
  • Do not force join order with join_collapse_limit = 1 without understanding data distribution — it may make other queries worse.
  • Statistics freshness matters: check when ANALYZE last ran, not just whether it ran.
  • Do not confuse n_distinct = -0.1 (meaning 10% of rows are distinct) with n_distinct = 10 (meaning exactly 10 distinct values). Signs matter here.

Resolution Approach

Bad estimates come from two sources:

This is a Pro lesson

Get every Learning Pathway and cookbook recipe — grounded in PostgreSQL source code, with diagnostics, fixes, and prevention for each topic.

Continue this lesson to learn:

  • Mitigation Actions
  • All 36 Learning Pathway lessons
  • 170+ cookbook recipes
  • Source-grounded diagnostics & fixes

Secure checkout Cancel anytime Source-grounded

Career Impact

This scenario builds production judgment and operational confidence under pressure.

Open Career Dashboard →

Keep going

Related & next steps

Was this helpful?

← All cookbook recipes