Lesson 8 of 14

Automatic Failover, Fencing, and Split-Brain

Applies to PostgreSQL 13–17 · Last reviewed May 2026 · Grounded in source

The one thing to understand first

Streaming replication gives you standbys that are ready to take over, but two things remain hard: deciding when to promote one, and ensuring that only one primary exists afterwards. Doing this by hand is slow and error-prone. Tools like Patroni, repmgr, and Stolon automate failover, but automation introduces the cluster's most dangerous failure mode: split-brain.

Automated failover is less about promoting a standby quickly than about guaranteeing that only one primary ever exists. Every mechanism here (leader lock, meaning a leader key held as a TTL lease; self-demotion; fencing; quorum) serves one goal: preventing two nodes from accepting writes at the same time.
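To make the leader lock and self-demotion ideas concrete, here is a minimal in-memory sketch of a leader key with a TTL. It is an illustration only, not how Patroni, repmgr, or Stolon implement it: the `LeaderLease` class, the `may_accept_writes` helper, the node names, and the 30-second TTL are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaderLease:
    """In-memory stand-in for a leader key with a TTL (hypothetical, not a real DCS client)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._lease: Optional[Lease] = None

    def try_acquire(self, node: str, now: float) -> bool:
        """Take the lease if it is free or expired, or renew it if `node` already holds it."""
        lease = self._lease
        if lease is not None and lease.holder != node and now < lease.expires_at:
            return False  # another node holds a live lease
        self._lease = Lease(holder=node, expires_at=now + self.ttl)
        return True

    def holder(self, now: float) -> Optional[str]:
        """Return the current holder, or None if the lease is free or expired."""
        lease = self._lease
        if lease is None or now >= lease.expires_at:
            return None
        return lease.holder


def may_accept_writes(last_successful_renew: float, ttl: float, now: float) -> bool:
    """Self-demotion rule: a node that cannot show its lease is still live stops acting as primary."""
    return now < last_successful_renew + ttl


# Usage: node-a takes the lease at t=0, then is partitioned and never renews.
lease = LeaderLease(ttl=30.0)
print(lease.try_acquire("node-a", now=0.0))    # True: the lease was free
print(lease.try_acquire("node-b", now=10.0))   # False: node-a's lease is still live
print(may_accept_writes(0.0, 30.0, now=30.0))  # False: node-a must demote itself
print(lease.try_acquire("node-b", now=30.0))   # True: the lease has expired
```

The point of the sketch is the ordering: the old primary's local deadline passes no later than the moment a standby can take the key, so the two never overlap. A real deployment also has to budget for clock drift and for the time the demotion itself takes, which this toy model ignores.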

The split-brain nightmare

Split-brain occurs when two nodes both believe they are the primary and both accept writes. Clients write conflicting data to each node, and when the network heals there is no safe way to merge the divergent histories: keeping one history means discarding writes that the other node already acknowledged. Every automated failover design exists primarily to prevent split-brain, not merely to speed up promotion.
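A toy model can show why the histories cannot simply be merged. The sketch below treats each node's write history as a plain list of statements, which is a deliberate simplification (PostgreSQL tracks this with WAL positions and timelines, not lists); the `divergence_point` helper and the sample statements are made up for illustration.

```python
from typing import List, Optional


def divergence_point(a: List[str], b: List[str]) -> Optional[int]:
    """Return the first index where two histories differ, or None if one is a prefix of the other."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


# History both nodes shared before the network partition.
shared = ["INSERT order 1", "INSERT order 2"]

# During the partition, each node accepts a write the other never sees.
node_a = shared + ["UPDATE order 2 SET status = 'shipped'"]
node_b = shared + ["UPDATE order 2 SET status = 'cancelled'"]

print(divergence_point(node_a, node_b))  # 2
print(divergence_point(shared, node_a))  # None: a standby that is merely behind has not diverged
```

Both nodes acknowledged a different third write to their clients, so whichever history survives, one client was told its change was committed when it was not. A standby that is simply behind is a different case: its history is a prefix of the primary's, and it can catch up without losing anything.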

