Scenario
A monitoring alert fires: the standby’s replay lag keeps increasing. The DBA connects to the standby and finds pg_is_in_recovery() = true and pg_is_wal_replay_paused() = true. The replay timestamp is 2 hours behind the primary. The standby is not catching up. Investigation reveals that a recovery_target_time was set on the standby from a PITR exercise 2 weeks ago and never removed — the standby paused when it replayed past that target time and has been waiting ever since.
How to Identify
Conditions:
pg_is_wal_replay_paused() = true on standby
pg_last_xact_replay_timestamp() is hours or days behind current time
recovery_target_time is set in postgresql.conf or postgresql.auto.conf
pg_stat_replication on primary shows standby with very large replay_lag
- Standby is receiving WAL (received_lsn advancing) but not applying it
Analysis Steps
-- ON STANDBY: check pause state
SELECT
pg_is_in_recovery() AS in_recovery,
pg_is_wal_replay_paused() AS replay_paused,
pg_get_wal_replay_pause_state() AS pause_state,
pg_last_wal_receive_lsn() AS received,
pg_last_wal_replay_lsn() AS replayed,
pg_wal_lsn_diff(
pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn()
) AS unaplied_bytes,
pg_last_xact_replay_timestamp() AS last_applied_txn;
-- received != replayed AND replay_paused=true = paused at target
-- Check recovery settings
SELECT name, setting FROM pg_settings
WHERE name LIKE 'recovery%'
ORDER BY name;
-- ON PRIMARY: check standby's lag
SELECT application_name, received_lsn, replay_lsn, replay_lag
FROM pg_stat_replication
WHERE application_name = 'standby1';
Pitfalls
recovery_target_time left in postgresql.conf on a streaming standby causes it to pause when the primary’s WAL passes that timestamp. This is the most common cause of unexpected standby lag.
pg_wal_replay_resume() resumes replay AND promotes the standby — if the standby should stay as a standby (not promote), use pg_wal_replay_pause()/resume() carefully.
- On a streaming standby (not PITR), there should be NO
recovery_target_* settings. These are for PITR only.
- After calling
pg_wal_replay_resume() on a standby: if there is no more WAL to apply and the recovery target was meant for PITR, the standby will promote. Verify intent before calling.
pg_promote(wait_seconds) explicitly promotes. pg_wal_replay_resume() resumes replay but may or may not promote depending on WAL availability.
Resolution Approach
Remove all recovery_target_* settings from the standby’s configuration. Call pg_wal_replay_resume() to resume replay. The standby will catch up and continue as a streaming standby without promoting (if WAL is still being received from the primary).