The standby is falling behind the primary — replay_lag is growing and your RPO window is at risk.
Diagnose it
-- Lag by standby (run on the primary):
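SELECT application_name,
       client_addr,
       state,
       sync_state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       write_lag,
       flush_lag,
       replay_lag,
       -- Total WAL the standby has yet to replay, measured from the
       -- primary's current write position (includes WAL not yet sent).
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
       ) AS bytes_behind
FROM pg_stat_replication
ORDER BY replay_lag DESC NULLS LAST;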
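write_lag is the time between the primary flushing WAL locally and the
standby confirming it has written that WAL, so it is dominated by network
round-trip time; flush_lag additionally includes fsync time on the standby;
replay_lag additionally includes WAL apply time. All three columns were added
in PostgreSQL 10, and they show NULL once the standby has fully caught up and
no new WAL is being generated.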
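The primary's view can be cross-checked from the standby itself. The query below is a minimal sketch, assuming it is run on a standby that is in recovery; the column aliases are illustrative and not part of the original runbook.

```sql
-- Run on the standby. Column aliases are illustrative.
SELECT pg_is_in_recovery()                     AS in_recovery,
       pg_last_wal_receive_lsn()               AS received_lsn,
       pg_last_wal_replay_lsn()                AS replayed_lsn,
       -- WAL already on the standby but not yet applied.
       pg_size_pretty(
           pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                           pg_last_wal_replay_lsn())
       )                                       AS received_not_replayed,
       -- Overstates lag when the primary is idle, because no new
       -- commits arrive to advance the replay timestamp.
       now() - pg_last_xact_replay_timestamp() AS time_since_last_replayed_commit;
```

A large received_not_replayed value suggests the standby is struggling to apply WAL. If it stays small while bytes_behind on the primary is large, the WAL is probably not reaching the standby fast enough.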
Why it happens
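Lag can accumulate from four main sources:
(1) Network saturation: the primary generates WAL faster than the link can
transfer it, so write_lag is high and sent_lsn trails the primary's current
WAL position.
(2) Standby I/O bottleneck: flush_lag and replay_lag are high while write_lag
stays low.
(3) A long-running query on the standby that conflicts with incoming WAL,
typically cleanup records produced by VACUUM on the primary. Replay waits for
the query for up to max_standby_streaming_delay before cancelling it, and
replay_lag grows while it waits.
(4) Repeated recovery conflicts that end in cancelled queries. Check
pg_stat_database_conflicts on the standby to see how many queries were
cancelled and for which reason.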
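To check the conflict-related causes, something like the following could be run on the standby. It is a sketch only: it assumes a hot standby that accepts read queries, and the choice of settings to inspect is a suggestion, not part of the original runbook.

```sql
-- Run on the standby. Counters are cumulative since the last stats reset.
SELECT datname,
       confl_snapshot,    -- queries cancelled because of old snapshots
       confl_lock,        -- queries cancelled because of lock timeouts
       confl_bufferpin,   -- queries cancelled because of pinned buffers
       confl_tablespace,  -- queries cancelled because of dropped tablespaces
       confl_deadlock     -- queries cancelled because of deadlocks
FROM pg_stat_database_conflicts
ORDER BY confl_snapshot DESC;

-- Settings that govern how long replay waits for conflicting queries.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_standby_streaming_delay',
               'max_standby_archive_delay',
               'hot_standby_feedback');
```

A rising confl_snapshot count usually points at cause (3): cleanup records from the primary removing rows that a standby query still needs.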