Cookbook recipe

Cascading Replication Lag Amplification

Applies to: PostgreSQL 13–17
Last reviewed: May 2026
Grounded in source
Estimated investigation: 4 min

Investigation Path

Scenario

The architecture has three nodes: primary → standby1 (HA) → standby2 (DR/analytics). standby1 shows a 2-second lag behind the primary, and standby2 shows a 12-minute lag behind standby1. The DR team assumed standby2 would mirror standby1’s lag, but cascading lag is additive: standby2’s lag behind the primary is roughly standby1’s lag plus standby2’s own network and apply lag relative to standby1, here about 12 minutes and 2 seconds.

How to Identify

Conditions:

  • Cascading standby lag significantly higher than direct standby lag
  • pg_stat_replication on standby1 shows large replay_lag for standby2
  • Network RTT between tiers adds to lag at each hop
  • recovery_min_apply_delay may be intentionally set on downstream standby
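
To check the last condition, one option is to compare the configured recovery_min_apply_delay with the delay actually observed on the downstream standby. The query below is a minimal sketch meant to run on standby2; the column aliases are illustrative. The observed value is only meaningful while the primary is committing transactions, because pg_last_xact_replay_timestamp() returns the commit time of the last replayed transaction, so the difference keeps growing on an idle system.

```sql
-- On STANDBY2: how much of the observed delay is explained by configuration?
-- pg_settings reports recovery_min_apply_delay in milliseconds.
SELECT
    s.setting::float8 * interval '1 millisecond'          AS configured_delay,
    now() - pg_last_xact_replay_timestamp()               AS observed_delay,
    (now() - pg_last_xact_replay_timestamp())
        - s.setting::float8 * interval '1 millisecond'    AS unexplained_delay
FROM pg_settings AS s
WHERE s.name = 'recovery_min_apply_delay';
```

An unexplained_delay close to zero suggests the lag is intentional. A large positive value points at the network or at replay on standby2.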

Analysis Steps

-- On PRIMARY: view directly connected standbys
-- (standby2 streams from standby1, so it does not appear here)
SELECT
    application_name,
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    write_lag,
    flush_lag,
    replay_lag,
    sync_state
FROM pg_stat_replication
ORDER BY application_name;

-- On STANDBY1: run the same query to see the standbys cascading from it
-- The lag columns show standby2's lag relative to standby1, not the primary

-- On STANDBY2: check its own recovery settings
SHOW recovery_min_apply_delay;
SHOW primary_conninfo;
-- primary_conninfo should point to standby1, not the primary
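
The checks above can be extended with a look at where the WAL is held up on standby2 itself. The following is a sketch under the assumption that standby2 uses streaming replication. It relies only on built-in functions and the pg_stat_wal_receiver view; the column aliases are illustrative. Roles without pg_read_all_stats may see mostly NULLs in pg_stat_wal_receiver.

```sql
-- On STANDBY2: has the WAL arrived but not yet been replayed?
SELECT
    pg_last_wal_receive_lsn()                        AS received_lsn,
    pg_last_wal_replay_lsn()                         AS replayed_lsn,
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                    pg_last_wal_replay_lsn())        AS received_not_replayed_bytes;

-- On STANDBY2: which upstream is the WAL receiver actually streaming from?
SELECT
    status,
    sender_host,
    sender_port,
    flushed_lsn,
    latest_end_lsn,
    last_msg_receipt_time
FROM pg_stat_wal_receiver;
```

A large received_not_replayed_bytes value suggests an apply-side cause such as recovery_min_apply_delay, replay conflicts with long-running queries, or slow storage. A small gap combined with a high lag reported on standby1 suggests the WAL is not arriving fast enough, which points at the network or at standby1.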

Pitfalls

  • Cascading replication lag is additive — standby2 total lag = standby1’s lag + standby2’s own lag.
  • recovery_min_apply_delay adds intentional lag (used for delayed DR). Do not confuse intentional delay with pathological lag.
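  • Promoting a cascading standby switches it to a new timeline. Its downstream standbys keep streaming from it only when recovery_target_timeline is 'latest' (the default); with any other setting they are orphaned and stop receiving WAL until they are redirected.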
  • High-priority read replicas (analytics, reporting) should connect directly to the primary if low lag is critical — not through a cascade.
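
When a downstream standby has to be moved to a different upstream, for example after a promotion or to take a latency-sensitive replica out of the cascade, the change can be sketched as follows. The host name, user, and application_name are hypothetical placeholders. The sketch assumes PostgreSQL 13 or later, where primary_conninfo can be changed with a reload, and assumes that pg_hba.conf on the new upstream already allows the replication connection.

```sql
-- On the standby being redirected (hypothetical connection values)
-- ALTER SYSTEM replaces the whole connection string, so include every
-- keyword the standby needs.
ALTER SYSTEM SET primary_conninfo =
    'host=primary.example.internal port=5432 user=replicator application_name=standby2';

-- Apply the change; the WAL receiver reconnects using the new string
SELECT pg_reload_conf();

-- Verify the new upstream once the receiver is streaming again
SELECT status, sender_host, sender_port
FROM pg_stat_wal_receiver;
```

If the standby uses a replication slot, primary_slot_name has to reference a slot that exists on the new upstream. That step is not shown here.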

Resolution Approach

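For latency-sensitive replicas, connect directly to the primary. Reserve the cascading topology for DR standbys where some lag is acceptable. Monitor each tier separately: track per-hop lag from pg_stat_replication on every upstream node, and track end-to-end lag on the last standby in the chain.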
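
As a starting point for per-tier monitoring, a query along these lines could be scheduled on every node that has downstream standbys (the primary and standby1 in this scenario). It is a sketch, not a complete monitoring setup; the column aliases are illustrative and alert thresholds are left to the monitoring system.

```sql
-- Run on each upstream node: lag of every directly attached standby
SELECT
    pg_is_in_recovery()                        AS upstream_is_standby,
    application_name,
    state,
    replay_lag,
    pg_wal_lsn_diff(sent_lsn, replay_lsn)      AS sent_not_replayed_bytes
FROM pg_stat_replication
ORDER BY replay_lag DESC NULLS LAST;
```

replay_lag can be NULL when a standby is fully caught up and the system has been idle for a while, so a NULL is not by itself a sign of trouble.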
