Learning Pathway
04 — Advanced: Replication & HA Architecture
14 lessons
- 01 Streaming Replication: walsender and walreceiver
  How physical streaming replication works: the walsender/walreceiver processes, the WAL streaming protocol, startup recovery replay, and replication lag.
- 02 Logical Replication and Logical Decoding
  How logical replication works: logical decoding of WAL into row changes, output plugins, publications and subscriptions, replica identity, and its limits.
- 03 Replication Slots: Guaranteeing WAL Retention
  What replication slots do: tracking consumer position, preventing premature WAL removal and row removal, the disk-fill risk, and max_slot_wal_keep_size.
- 04 Synchronous Replication and Quorum Commit
  How synchronous replication works: synchronous_commit levels, synchronous_standby_names, FIRST vs ANY quorum, and the durability/availability trade-off.
- 05 Hot Standby and Recovery Conflicts
  How hot standby serves read queries during recovery, why replay conflicts with queries arise, and how max_standby_streaming_delay and hot_standby_feedback resolve them.
- 06 WAL Archiving and Point-in-Time Recovery
  How PITR works: base backups plus archived WAL, archive_command, recovery_target settings, and restoring to a precise moment before a mistake.
- 07 pg_rewind: Reusing a Diverged Old Primary
  How pg_rewind works: timeline divergence after failover, finding changed blocks since the divergence point, and rejoining an old primary as a standby cheaply.
- 08 Automatic Failover, Fencing, and Split-Brain
  How automated failover works with Patroni-style tools: leader election via a DCS, the split-brain danger, fencing/STONITH, and avoiding two primaries.
- 09 Connection Routing for HA: VIPs, Pooler, and Read Scaling
  How clients reach the right node in an HA cluster: virtual IPs, HAProxy health checks, pooler-based routing, and splitting reads to standbys.
- 10 Cluster Managers: How Patroni and etcd Decide Who Is Primary
  How PostgreSQL HA works under Patroni and etcd: the leader key and TTL, Raft quorum, the watchdog fence, and the pg_promote/timeline/pg_rewind primitives they drive.
- 11 pgBackRest: How Incremental Backups and Point-in-Time Recovery Actually Work
  How pgBackRest works: PostgreSQL's pg_backup_start/stop API, full vs differential vs incremental vs block-incremental backups, the manifest and repository, and PITR.
- 12 Cascading Replication and Delayed Standbys: Shaping the Replica Topology
  How PostgreSQL cascading replication relays WAL through standby tiers, and how recovery_min_apply_delay creates a delayed standby that gives you a window to recover from operator error.
- 13 Failover Slots: Keeping Logical Replication Alive Across a Failover (PostgreSQL 17)
  How PostgreSQL 17 failover slots survive promotion: the failover flag, the slot sync worker, synchronized_standby_slots, and hot_standby_feedback.
- 14 Cross-Region Disaster Recovery: RPO, RTO, and Async Replication Over the WAN
  How to design PostgreSQL cross-region DR: measuring RPO from WAL lag, RTO components, async vs synchronous over the WAN, offsite archives, and switchover drills.