The one thing to understand first
PostgreSQL knows how to be a primary or a standby, but it has no opinion about which node should be primary right now. That decision, and the dangerous job of making it exactly once so that no two nodes ever believe they are primary at the same time, is delegated to an external cluster manager. The most common pairing is Patroni (an agent running beside each PostgreSQL instance) plus a distributed consensus store such as etcd.
This lesson extends Pathway 04’s lessons on streaming replication, synchronous commit, pg_rewind, and the failover/fencing concept: here we wire those PostgreSQL primitives into a working orchestration layer that is automated and split-brain-safe.
The mechanism: a leader key with a heartbeat
Patroni runs one agent per node, and each agent runs its control loop every loop_wait seconds (default 10). The cluster’s truth lives not in any PostgreSQL instance but in the distributed configuration store (DCS): etcd holds a leader key with a TTL (default ttl = 30s). The primary’s Patroni agent renews that key on every loop. If the primary stalls and the key is not renewed before the TTL expires, etcd lets it vanish, and the surviving replicas’ agents race to recreate it with an atomic compare-and-set. Exactly one wins, and only the winner promotes. The correctness of the whole system rests on that single atomic operation in a consistent store.
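To make the leader-key mechanism concrete, here is a minimal in-memory sketch in Python. It is not Patroni’s code and does not use the etcd API: ToyDCS, simulate_failover, race_for_leader, and the node names are hypothetical, and a lock stands in for the atomicity that a real consistent store provides (etcd does this with leases and transactions). Time is passed in explicitly so the run is deterministic. Only the loop_wait and ttl defaults are taken from the text above.

```python
import threading

LOOP_WAIT = 10  # seconds between agent loops (default from the text)
TTL = 30        # leader key time-to-live in seconds (default from the text)


class ToyDCS:
    """Hypothetical in-memory stand-in for a consistent store with key TTLs."""

    def __init__(self):
        self._lock = threading.Lock()  # models the store's atomicity
        self._leader = None            # (owner, expires_at) or None

    def _expire(self, now):
        # The store drops the key once its TTL has elapsed.
        if self._leader is not None and self._leader[1] <= now:
            self._leader = None

    def acquire(self, node, ttl, now):
        """Atomically create the leader key only if it does not exist."""
        with self._lock:
            self._expire(now)
            if self._leader is None:
                self._leader = (node, now + ttl)
                return True
            return False

    def renew(self, node, ttl, now):
        """Extend the TTL only if `node` still owns the key."""
        with self._lock:
            self._expire(now)
            if self._leader is not None and self._leader[0] == node:
                self._leader = (node, now + ttl)
                return True
            return False

    def leader(self, now):
        with self._lock:
            self._expire(now)
            return self._leader[0] if self._leader else None


def simulate_failover():
    """node-a leads, renews twice, then stalls; replicas retry each loop."""
    dcs = ToyDCS()
    dcs.acquire("node-a", TTL, now=0)
    for t in (LOOP_WAIT, 2 * LOOP_WAIT):      # last renewal at t=20
        dcs.renew("node-a", TTL, now=t)
    events = []
    for t in (30, 40, 50):                    # key expires at 20 + 30 = 50
        winners = [n for n in ("node-b", "node-c")
                   if dcs.acquire(n, TTL, now=t)]
        events.append((t, winners))
    return dcs, events


def race_for_leader(n_nodes):
    """Many agents try to create the key at once; return those that won."""
    dcs = ToyDCS()
    winners = []

    def attempt(name):
        if dcs.acquire(name, TTL, now=0):
            winners.append(name)

    threads = [threading.Thread(target=attempt, args=(f"node-{i}",))
               for i in range(n_nodes)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return winners


if __name__ == "__main__":
    dcs, events = simulate_failover()
    print(events)
    print("stalled node-a can still renew:", dcs.renew("node-a", TTL, now=55))
    print("winners in an 8-way race:", len(race_for_leader(8)))
```

In this toy run the replicas fail to take the key while it is still live, and exactly one of them succeeds once it expires. When the stalled former primary wakes up, its renewal is refused, which is the signal that it no longer holds leadership. Which replica wins here is an artifact of the iteration order in the sketch, not a statement about how a real cluster chooses among candidates.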