pg_rewind Failing After Failover

Applies to PostgreSQL 13–17 · Last reviewed May 2026 · Grounded in source

Estimated investigation: 4 min

Investigation Path

Scenario

A failover was triggered and the standby was promoted to be the new primary. The old primary now needs to be rewound and rejoined as a standby, but pg_rewind fails, typically with "could not find common ancestor of the source and target cluster's timelines" or with a message that the target server needs either data checksums or wal_log_hints = on. The cluster was deployed with wal_log_hints = off and data checksums disabled; pg_rewind requires at least one of the two to be enabled on the target.

How to Identify

Conditions:

pg_rewind exits with an error about missing WAL segments or diverged timelines
wal_log_hints = off and data checksums are not enabled (without either, hint-bit changes are not WAL-logged, so pg_rewind cannot reliably find every changed block)
WAL segments needed for the rewind have been recycled or moved to the archive and are no longer available locally
pg_controldata shows different timeline IDs on the source and the target
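
The hint/checksum condition above can be checked offline, which helps because the old primary is usually stopped at this point. The following is a minimal sketch, not a definitive tool: the helper name rewind_prereq is hypothetical, and it assumes the "wal_log_hints setting" and "Data page checksum version" labels that pg_controldata prints on the PostgreSQL versions this lesson covers.

```shell
# Hypothetical helper: reads `pg_controldata` output on stdin and reports
# whether the target satisfies pg_rewind's hint/checksum prerequisite.
rewind_prereq() {
    awk -F': *' '
        /^wal_log_hints setting:/      { hints = $2 }
        /^Data page checksum version:/ { sums = $2 }
        END {
            # Either setting is enough for pg_rewind.
            if (hints == "on" || sums + 0 > 0) { print "ok"; exit 0 }
            print "missing: wal_log_hints=" hints ", checksum version=" sums
            exit 1
        }'
}

# Usage (assumes PGDATA points at the old primary's data directory):
#   pg_controldata "$PGDATA" | rewind_prereq
```

A non-zero exit status means pg_rewind would refuse to run on that data directory, so the pg_basebackup fallback is the remaining option.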

Analysis Steps

-- On OLD PRIMARY (now stale): check control data
-- pg_controldata $PGDATA | grep -iE "timeline|checkpoint"
-- Expect something like: "Latest checkpoint's TimeLineID: 1" (while the new primary is on TimeLineID 2)

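-- Check whether wal_log_hints was enabled (required for pg_rewind when data checksums are off)
-- If the old primary is stopped, read both settings offline instead:
-- pg_controldata $PGDATA | grep -E "wal_log_hints|checksum"
SHOW wal_log_hints;
-- 'off' = pg_rewind can run only if data checksums are enabled

-- Check whether data checksums are enabled (usually chosen at initdb time)
SELECT setting FROM pg_settings WHERE name = 'data_checksums';
-- 'off' together with wal_log_hints = 'off' = pg_rewind will refuse to run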

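-- On NEW PRIMARY: check the current and previous timeline
-- (values reflect the last checkpoint, so run CHECKPOINT first after a promotion)
SELECT timeline_id, prev_timeline_id FROM pg_control_checkpoint();
-- Or look in pg_wal for timeline history files, e.g. 00000002.history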

-- Check WAL availability on old primary
-- ls $PGDATA/pg_wal/ | grep -c "^[0-9]"
-- Needed: WAL from the last checkpoint before the divergence point up to the end of the old primary's WAL

-- On NEW PRIMARY: verify that a replication slot (or other WAL retention) keeps the WAL
-- the rewound standby will need in order to catch up
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
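
To see where the timelines diverged, the timeline history file on the new primary can be read directly. This sketch assumes the usual history file layout (parent timeline ID, switch LSN, and reason, separated by tabs, with optional # comment lines); the helper name divergence_lsn is hypothetical.

```shell
# Hypothetical helper: print the LSN at which timeline $2 ended, according to
# the timeline history file $1 (for example $PGDATA/pg_wal/00000002.history).
divergence_lsn() {
    awk -v tli="$2" '
        /^[ \t]*#/ { next }                 # skip comment lines
        NF == 0    { next }                 # skip blank lines
        $1 == tli  { print $2; found = 1 }  # column 2 is the switch point LSN
        END { exit !found }' "$1"
}

# Usage (on the new primary, assuming the old primary was on timeline 1):
#   divergence_lsn "$PGDATA/pg_wal/00000002.history" 1
```

The old primary needs its own WAL going back to the last checkpoint before the printed LSN; if those segments are gone locally, they have to come from the archive.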

Pitfalls

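wal_log_hints = on must be set before the failover occurs; enabling it afterwards does not help an old primary that has already diverged. Changing it requires a PostgreSQL restart.
pg_rewind needs the old primary's WAL from the last checkpoint before the divergence point up to the end of its WAL. If those segments have been recycled, pg_rewind fails even with hints enabled, unless they can be fetched from the archive (--restore-target-wal, which relies on restore_command in the target's configuration).
pg_basebackup is the safe fallback when pg_rewind fails: it is slower (a full copy) but does not depend on the old primary's WAL, hint settings, or checksums.
Never attempt to bring the old primary back by changing primary_conninfo alone. Timeline divergence leaves changes on it that the new primary never had, which amounts to data corruption on the standby.
After promotion, always run a CHECKPOINT on the new primary before starting pg_rewind. This ensures the new primary's control file reflects the new timeline, so pg_rewind sees the latest state of the source.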
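
The ordering described in these pitfalls can be sketched as a short script. It is an illustration under stated assumptions, not a tested runbook: the data directory and connection string are placeholders, the run wrapper and the DRY_RUN variable are hypothetical conveniences for printing commands instead of executing them, and it assumes wal_log_hints or data checksums were already enabled on the old primary.

```shell
# Placeholder values (assumptions): adjust to your environment.
OLD_PGDATA=${OLD_PGDATA:-/var/lib/postgresql/data}
NEW_PRIMARY=${NEW_PRIMARY:-"host=new-primary port=5432 user=postgres dbname=postgres"}

# Print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rewind_old_primary() {
    # 1. Checkpoint the new primary so its control file shows the new timeline.
    run psql "$NEW_PRIMARY" -c 'CHECKPOINT;' || return 1

    # 2. pg_rewind requires the target to be stopped; stop it if still running.
    if pg_ctl -D "$OLD_PGDATA" status >/dev/null 2>&1; then
        run pg_ctl -D "$OLD_PGDATA" stop -m fast || return 1
    fi

    # 3. Rehearse with pg_rewind's own --dry-run, which leaves the target unmodified.
    run pg_rewind --target-pgdata="$OLD_PGDATA" --source-server="$NEW_PRIMARY" \
        --dry-run --progress || return 1

    # 4. Real run; --write-recovery-conf prepares the target to start as a standby.
    run pg_rewind --target-pgdata="$OLD_PGDATA" --source-server="$NEW_PRIMARY" \
        --write-recovery-conf --progress
}

# Usage: DRY_RUN=1 rewind_old_primary   # review the commands first
#        rewind_old_primary             # then run them
```

If the needed WAL has been recycled locally but exists in the archive, --restore-target-wal can be added to both pg_rewind calls, provided restore_command is set in the target's configuration.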

Resolution Approach

If pg_rewind fails, fall back to pg_basebackup from the new primary to rebuild the old primary as a fresh standby. For future protection, set wal_log_hints = on (requires a restart) or enable data checksums (initdb --data-checksums for new clusters, or pg_checksums --enable on a cleanly stopped existing cluster) so pg_rewind can work after the next failover.
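
A minimal sketch of the fallback and the preventive setting follows. The paths, the connection string, the run wrapper, and both function names are assumptions made for illustration. The old primary is assumed to be stopped already, and the connecting role is assumed to have replication privileges.

```shell
# Placeholder values (assumptions): adjust to your environment.
OLD_PGDATA=${OLD_PGDATA:-/var/lib/postgresql/data}
NEW_PRIMARY=${NEW_PRIMARY:-"host=new-primary port=5432 user=postgres dbname=postgres"}

# Print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_as_standby() {
    # Keep the diverged directory until the new standby is confirmed healthy.
    run mv "$OLD_PGDATA" "$OLD_PGDATA.diverged" || return 1

    # Full copy from the new primary, configured to start as a standby.
    run pg_basebackup --dbname="$NEW_PRIMARY" --pgdata="$OLD_PGDATA" \
        --wal-method=stream --checkpoint=fast --write-recovery-conf --progress || return 1

    run pg_ctl -D "$OLD_PGDATA" start
}

enable_rewind_support() {
    # Takes effect only after the server is restarted.
    run psql "$NEW_PRIMARY" -c 'ALTER SYSTEM SET wal_log_hints = on;'
}

# Usage: DRY_RUN=1 rebuild_as_standby   # review the commands first
```

Renaming rather than deleting the old data directory costs disk space, but it keeps the diverged data available in case any of the old primary's unreplicated changes need to be inspected later.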
