Scenario
A failover was triggered and the standby was promoted to be the new primary. The old primary now needs to be rewound and rejoined as a standby, but pg_rewind fails with an error such as "target server needs to use either data checksums or "wal_log_hints = on"" or "could not find common ancestor of the source and target cluster's timelines". The cluster was deployed with wal_log_hints = off and without data checksums; pg_rewind requires at least one of the two on the target.
How to Identify
Conditions:
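- pg_rewind exits with an error about a missing prerequisite, missing WAL segments, or diverged timelines
- wal_log_hints = off and data checksums are not enabled (without one of them, hint-bit-only page changes are not WAL-logged, so pg_rewind cannot reliably find every block that changed after the divergence point)
- WAL segments needed for the rewind have been recycled or moved to the archive and are no longer available in pg_wal on the old primary
- pg_controldata shows different timeline IDs on source and target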
Analysis Steps
-- On OLD PRIMARY (now stale): check control data
-- pg_controldata $PGDATA | grep -E "Timeline|checkpoint"
-- Should show: "Latest checkpoint's TimeLineID: 1" (while new primary is on TimeLineID: 2)
-- Check if wal_log_hints was enabled (required for pg_rewind without checksums)
SHOW wal_log_hints;
-- 'off' = pg_rewind will only run if data checksums are enabled
-- Check if data checksums were enabled at initdb time
SELECT setting FROM pg_settings WHERE name = 'data_checksums';
-- 'off' = pg_rewind will refuse to run if wal_log_hints is also off
-- On NEW PRIMARY: check timeline history
SELECT timeline_id, prev_timeline_id, redo_lsn FROM pg_control_checkpoint();
-- Or check pg_wal (pg_xlog before PostgreSQL 10) for timeline history files such as 00000002.history
-- Check WAL availability on old primary
-- ls $PGDATA/pg_wal/ | grep -c "^[0-9]"
-- Needed: WAL from the last checkpoint before the divergence point through the end of the old primary's WAL
-- On NEW PRIMARY: verify that a replication slot (or other WAL retention) keeps the WAL the rejoined standby will need to catch up
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
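When the old primary is shut down, SHOW cannot be run against it. As a minimal sketch, the same facts can be read from pg_controldata output instead. The helper names controldata_field and rewind_prereq_met are hypothetical, and the sketch assumes the English field labels that pg_controldata prints under an untranslated locale (for example with LC_ALL=C).

```shell
# Print the value of one field from pg_controldata output read on stdin.
# $1 is the label before the colon, e.g. "wal_log_hints setting".
controldata_field() {
    awk -F': *' -v label="$1" '$1 == label { print $2; exit }'
}

# Succeed (exit status 0) if the control data on stdin shows that
# wal_log_hints is on or that data checksums are enabled.
rewind_prereq_met() {
    data="$(cat)"
    hints="$(printf '%s\n' "$data" | controldata_field "wal_log_hints setting")"
    csum="$(printf '%s\n' "$data" | controldata_field "Data page checksum version")"
    # A checksum version of 0 means checksums are disabled.
    [ "$hints" = "on" ] || [ "${csum:-0}" != "0" ]
}
```

Possible usage on the old primary: `LC_ALL=C pg_controldata "$PGDATA" | rewind_prereq_met || echo "pg_rewind prerequisite missing"`, and `LC_ALL=C pg_controldata "$PGDATA" | controldata_field "Latest checkpoint's TimeLineID"` to compare timelines between the two nodes.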
Pitfalls
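- wal_log_hints = on must be set before the failover occurs. Enabling it on an already-diverged old primary does not help, because hint-bit changes made earlier were never WAL-logged. Changing the setting requires a server restart.
- pg_rewind needs the old primary's WAL from the last checkpoint before the divergence point through the end of its WAL. If those segments have been recycled, pg_rewind fails even with hints enabled, unless they can be restored from a WAL archive (PostgreSQL 13 and later can fetch them with --restore-target-wal when restore_command is configured).
- pg_basebackup is the safe fallback when pg_rewind fails. It is slower (a full copy of the data directory) but does not depend on hints, checksums, or old WAL.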
- Never attempt to bring the old primary back by changing primary_conninfo alone. Its data has diverged from the new primary's timeline, so at best replication fails to start and at worst the standby runs on corrupted data.
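- After a successful promotion, always run a CHECKPOINT on the new primary before starting pg_rewind. Until a checkpoint completes, the new primary's control file may still show the old timeline, so the checkpoint ensures pg_rewind sees the source's latest state.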
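The checkpoint-then-rewind order can be sketched as a small wrapper. This is an illustration under assumptions, not a definitive procedure: the function name rewind_dry_run and its two arguments are hypothetical, the old primary is assumed to be cleanly shut down, and the connection string is assumed to belong to a role allowed to run CHECKPOINT and to be used by pg_rewind. It uses --dry-run so nothing is modified.

```shell
# Checkpoint the new primary, then ask pg_rewind what it would do.
# $1 = data directory of the old primary (the rewind target)
# $2 = libpq connection string for the new primary (the rewind source)
rewind_dry_run() {
    target_pgdata="$1"
    source_conninfo="$2"

    # Force a checkpoint so the source's control file reflects the new timeline.
    psql -d "$source_conninfo" -X -q -c "CHECKPOINT;" || return 1

    # --dry-run reports the outcome without changing the target.
    pg_rewind --dry-run --progress \
        --target-pgdata="$target_pgdata" \
        --source-server="$source_conninfo"
}
```

Possible usage: `rewind_dry_run "$PGDATA" "host=new-primary dbname=postgres user=postgres"`, where the host and user are placeholders. If the dry run succeeds, the same pg_rewind command without --dry-run performs the rewind.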
Resolution Approach
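If pg_rewind fails, fall back to pg_basebackup from the new primary to rebuild the old primary as a fresh standby. For future protection, enable wal_log_hints = on (restart required) or use data checksums so pg_rewind can work after the next failover. Checksums are chosen with initdb --data-checksums on a new cluster; on PostgreSQL 12 and later an existing cluster can be converted with pg_checksums --enable while it is cleanly shut down.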
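A minimal sketch of the fallback and the follow-up setting change is shown here. The function names rebuild_standby and enable_wal_log_hints, the host name, and the replication user are hypothetical. The sketch assumes the old data directory has already been moved aside, and that -R should write the standby configuration (standby.signal and primary_conninfo on PostgreSQL 12 and later).

```shell
# Rebuild the old primary as a fresh standby with pg_basebackup.
# $1 = data directory to create (must be empty or absent)
# $2 = host name of the new primary
# $3 = replication user
rebuild_standby() {
    pgdata="$1"
    primary_host="$2"
    repl_user="$3"

    # pg_basebackup requires an empty or missing target directory.
    if [ -d "$pgdata" ] && [ -n "$(ls -A "$pgdata")" ]; then
        echo "refusing to overwrite non-empty $pgdata; move it aside first" >&2
        return 1
    fi

    # -X stream copies WAL while the backup runs; -R writes standby settings.
    pg_basebackup -h "$primary_host" -U "$repl_user" -D "$pgdata" \
        -X stream -R -P --checkpoint=fast
}

# Turn on wal_log_hints for the next failover.
# $1 = libpq connection string; the change takes effect after a restart.
enable_wal_log_hints() {
    psql -d "$1" -X -q -c "ALTER SYSTEM SET wal_log_hints = on;"
}
```

Possible usage: `rebuild_standby "$PGDATA" new-primary replicator`, then start the server. Running enable_wal_log_hints against every node, followed by a restart of each, keeps pg_rewind available whichever node is demoted next.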