Poor Planner Estimates / Wrong Join Order

Applies to PostgreSQL 13–17 Last reviewed May 2026 Grounded in source

Estimated investigation4 min

Investigation Path

Scenario

A reporting query joining three tables runs fine on a test database with 10,000 rows but takes 4 minutes on production with 50 million rows. EXPLAIN ANALYZE shows the planner estimated 12 rows for an intermediate result that actually returned 890,000 rows.

How to Identify

Conditions:

EXPLAIN ANALYZE shows large discrepancy between rows=X (estimate) and actual rows=Y
Slow join queries that were fast before a bulk data load
pg_stats shows very few distinct values for join columns
Planner choosing Nested Loop when Hash Join or Merge Join would be correct
pg_stat_user_tables.last_autoanalyze is old relative to table churn

Analysis Steps

-- 1. Compare estimated vs actual rows
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders o
JOIN   customers c ON c.id = o.customer_id
JOIN   regions r   ON r.id = c.region_id
WHERE  r.name = 'EU';

-- 2. Check statistics for relevant columns
SELECT tablename, attname, n_distinct, correlation,
       most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename IN ('orders', 'customers', 'regions')
  AND  attname IN ('customer_id', 'region_id', 'name');

-- 3. Check when statistics were last collected
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('orders', 'customers', 'regions');

-- 4. Check if extended statistics exist for correlated columns
SELECT stxname, stxkeys, stxkind
FROM   pg_statistic_ext
WHERE  stxrelid = 'customers'::regclass;

Pitfalls

The planner assumes column independence by default. Two columns that are functionally related (e.g., city and zip_code) will produce severely wrong estimates if treated independently.
Do not force join order with join_collapse_limit = 1 without understanding data distribution — it may make other queries worse.
Statistics freshness matters: check when ANALYZE last ran, not just whether it ran.
Do not confuse n_distinct = -0.1 (meaning 10% of rows are distinct) with n_distinct = 10 (meaning exactly 10 distinct values). Signs matter here.

Resolution Approach

Bad estimates come from two sources:

This is a Pro lesson

Get every Learning Pathway and cookbook recipe — grounded in PostgreSQL source code, with diagnostics, fixes, and prevention for each topic.

Continue this lesson to learn:

Mitigation Actions

All 36 Learning Pathway lessons
170+ cookbook recipes
Source-grounded diagnostics & fixes

Unlock everything — $24.99/month $199/year — save 35%

Secure checkout Cancel anytime Source-grounded

Already a member? Log in · New here? Create a free account

Poor Planner Estimates / Wrong Join Order

What This Is

Why It Matters

Next Step

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Career Impact

Related & next steps

More like this

Concepts on this page

Don't get paged twice for the same bug.

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Related Knowledge

Related Interview Questions

Career Impact

Related & next steps

lan More like this

menu_book Concepts on this page

Don't get paged twice for the same bug.

More like this

Concepts on this page