Cookbook recipe

pg_rewind Failing After Failover

Applies to PostgreSQL 13–17 | Last reviewed May 2026 | Grounded in source
Estimated investigation: 4 min

Investigation Path

Scenario

A failover was triggered and the standby was promoted to new primary. The old primary now needs to be rewound and rejoined as a standby. pg_rewind fails with an error such as "source and target clusters are from different timelines" or "could not find common ancestor". The cluster was deployed with wal_log_hints = off and data checksums disabled; pg_rewind requires at least one of the two to be enabled on the target.

How to Identify

Conditions:

  • pg_rewind exits with error about missing WAL segments or diverged timeline
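  • wal_log_hints = off and data checksums not enabled (hint-bit changes are then not WAL-logged, so pg_rewind cannot reliably find every block that changed after the divergence point)
  • WAL segments needed for the rewind have been recycled, or exist only in the archive, and are no longer available in pg_wal on the old primary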
  • pg_controldata shows different timeline IDs on source vs target
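
The settings condition above can be expressed as a small check. This is a minimal sketch, not part of the recipe: rewind_prereq_ok is a hypothetical helper name, and the three values are assumed to have been collected beforehand (for example with SHOW on the old primary). It also checks full_page_writes, which pg_rewind additionally requires to be on.

```shell
#!/bin/sh
# Hypothetical helper: decide from three setting values whether the
# pg_rewind prerequisites are met on the target (the old primary).
# Arguments: wal_log_hints, data_checksums, full_page_writes ("on"/"off").
rewind_prereq_ok() {
    hints=$1
    checksums=$2
    fpw=$3
    # full_page_writes must be on in every case.
    [ "$fpw" = "on" ] || return 1
    # At least one of wal_log_hints / data checksums must be on.
    [ "$hints" = "on" ] || [ "$checksums" = "on" ]
}

# Example values matching this scenario (assumed, not read from a server).
if rewind_prereq_ok off off on; then
    echo "pg_rewind prerequisites met"
else
    echo "pg_rewind prerequisites NOT met: plan for pg_basebackup"
fi
```

With the scenario's values (both off) the check fails, which is the signal to skip pg_rewind and go straight to the fallback.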

Analysis Steps

-- On OLD PRIMARY (now stale): check control data
-- pg_controldata $PGDATA | grep -E "Timeline|checkpoint"
-- Should show: "Latest checkpoint's TimeLineID: 1" (while new primary is on TimeLineID: 2)

-- Check if wal_log_hints was enabled (required for pg_rewind without checksums)
SHOW wal_log_hints;
-- 'off' = pg_rewind can only run if data checksums are enabled instead

-- Check if data checksums were enabled at initdb time
SELECT setting FROM pg_settings WHERE name = 'data_checksums';
-- 'off' together with wal_log_hints = off means pg_rewind will refuse to run

-- On NEW PRIMARY: check timeline history
SELECT timeline_id, prev_timeline_id, checkpoint_lsn FROM pg_control_checkpoint();
-- Or check $PGDATA/pg_wal for timeline history files, e.g. 00000002.history

-- Check WAL availability on old primary
-- ls $PGDATA/pg_wal/ | grep -c "^[0-9]"
-- Needed: WAL from the last checkpoint before the divergence point onward

-- Verify the new primary has a replication slot or retained WAL so the rewound standby can catch up
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
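
The timeline comparison from the first analysis step can be scripted. The following is a sketch under assumptions: timeline_of is a hypothetical helper, and the sample text only imitates the relevant lines of pg_controldata output so the example runs without a server.

```shell
#!/bin/sh
# Hypothetical helper: read pg_controldata output on stdin and print the
# value of "Latest checkpoint's TimeLineID".
# The "." in the pattern stands in for the apostrophe.
timeline_of() {
    awk -F: '/^Latest checkpoint.s TimeLineID/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Real use would be:  pg_controldata "$PGDATA" | timeline_of
# Here a sample imitating the old primary's output is parsed instead.
sample="Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1"
old_tli=$(printf '%s\n' "$sample" | timeline_of)
echo "old primary timeline: $old_tli"
```

Running the same helper against the new primary's data directory and comparing the two numbers shows whether the clusters have diverged onto different timelines.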

Pitfalls

  • wal_log_hints = on must be set before the failure occurs; you cannot enable it retroactively on an already-diverged primary. Changing it requires a server restart.
  • pg_rewind reads the old primary's WAL starting at the last checkpoint before the divergence point. If those segments have been recycled, pg_rewind fails even with hints enabled, unless it can fetch them from the archive (--restore-target-wal with a working restore_command, available since PostgreSQL 13).
  • pg_basebackup is the safe fallback when pg_rewind fails. It is slower (full copy) but does not depend on the old primary's WAL or hint settings.
  • Never attempt to bring the old primary back by changing primary_conninfo alone; its WAL has diverged from the new timeline, which risks data corruption on the standby.
  • After promotion, always run a CHECKPOINT on the new primary before starting pg_rewind. This makes the new primary's control file reflect the new timeline, which pg_rewind reads to decide whether the target can be rewound.
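
The ordering described in the pitfalls above can be laid out as a reviewable plan. This sketch only prints the commands rather than running them; rewind_plan is a hypothetical helper, and the data directory and connection string are placeholder values. The pg_rewind options shown exist in PostgreSQL 13 and later.

```shell
#!/bin/sh
# Placeholders for this example; override them in the environment.
OLD_PGDATA=${OLD_PGDATA:-/var/lib/postgresql/data}
NEW_PRIMARY_CONNSTR=${NEW_PRIMARY_CONNSTR:-host=new-primary port=5432 user=postgres dbname=postgres}

# Hypothetical helper: print (do not execute) the rewind sequence.
rewind_plan() {
    # 1. Checkpoint the new primary so its control file shows the new timeline.
    printf '%s\n' "psql \"$NEW_PRIMARY_CONNSTR\" -c 'CHECKPOINT;'"
    # 2. Make sure the old primary is stopped before rewinding it.
    printf '%s\n' "pg_ctl -D \"$OLD_PGDATA\" stop -m fast"
    # 3. Dry run: reports what would happen without modifying the target.
    printf '%s\n' "pg_rewind --target-pgdata=\"$OLD_PGDATA\" --source-server=\"$NEW_PRIMARY_CONNSTR\" --dry-run --progress"
    # 4. Real run: fetch missing WAL via restore_command and write standby config.
    printf '%s\n' "pg_rewind --target-pgdata=\"$OLD_PGDATA\" --source-server=\"$NEW_PRIMARY_CONNSTR\" --restore-target-wal --write-recovery-conf --progress"
}

rewind_plan
```

Printing first keeps the destructive step under manual control. Note that --restore-target-wal only helps if restore_command is configured in the old primary's settings.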

Resolution Approach

If pg_rewind fails, fall back to pg_basebackup from the new primary to rebuild the old primary as a fresh standby. For future protection, enable wal_log_hints = on (takes effect after a restart) or initialize new clusters with initdb --data-checksums so pg_rewind can work after the next failover.
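
The fallback can be sketched the same way. basebackup_plan is a hypothetical helper that prints the rebuild commands for review; the data directory, host name, and replication user passed to it are placeholders, not values from this recipe.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the commands that rebuild
# the old primary as a fresh standby from the new primary.
# Arguments: data directory, new primary host, replication user.
basebackup_plan() {
    pgdata=$1
    host=$2
    user=$3
    # Stop the stale server and move its data aside; pg_basebackup needs
    # an empty or non-existent target directory.
    printf '%s\n' "pg_ctl -D \"$pgdata\" stop -m fast"
    printf '%s\n' "mv \"$pgdata\" \"$pgdata.old\""
    # Full copy; -R writes standby.signal and primary_conninfo.
    printf '%s\n' "pg_basebackup -h $host -U $user -D \"$pgdata\" -X stream -R -P --checkpoint=fast"
    printf '%s\n' "pg_ctl -D \"$pgdata\" start"
}

basebackup_plan /var/lib/postgresql/data new-primary replicator
```

Keeping the old directory until the new standby is streaming leaves a way back if the copy fails partway through.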
