Split-Brain After Network Partition

Applies to PostgreSQL 13–17 · Last reviewed May 2026 · Grounded in source

Estimated investigation: 4 min

Investigation Path

Scenario

A 30-second network partition between the primary and standby causes the automatic failover system (Patroni/repmgr) to promote the standby. When the network recovers, both nodes are now accepting writes — the original primary continued accepting writes during the partition, and the newly promoted standby also accepted writes. This is a split-brain scenario. Data written to each node is now divergent.

How to Identify

Conditions:

Both servers return pg_is_in_recovery() = false
pg_current_wal_lsn() differs between the two nodes after the partition and advances independently on each
WAL timeline on the new primary is higher than the old primary’s timeline
Application writes went to different nodes simultaneously
Fencing (STONITH) was not in place or failed
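The conditions above can be sketched as a small check over values collected from each node. This is a minimal illustration, not part of any HA tool: NodeStatus, is_split_brain, and promoted_node are hypothetical names, and the sample LSNs and timelines are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    """Values gathered by hand from one node (hypothetical container)."""
    name: str
    in_recovery: bool  # result of pg_is_in_recovery()
    timeline: int      # timeline_id from pg_control_checkpoint()
    current_lsn: str   # pg_current_wal_lsn() as text, e.g. "0/3000160"


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000160' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_split_brain(a: NodeStatus, b: NodeStatus) -> bool:
    """Both nodes out of recovery means both are accepting writes."""
    return not a.in_recovery and not b.in_recovery


def promoted_node(a: NodeStatus, b: NodeStatus) -> NodeStatus:
    """Return the node on the higher timeline, i.e. the promoted standby."""
    if a.timeline == b.timeline:
        raise ValueError("timelines are equal; cannot tell which node was promoted")
    return a if a.timeline > b.timeline else b


# Invented sample values for illustration only.
node1 = NodeStatus("node1", in_recovery=False, timeline=1, current_lsn="0/3000160")
node2 = NodeStatus("node2", in_recovery=False, timeline=2, current_lsn="0/30000D8")

print("split-brain:", is_split_brain(node1, node2))
print("LSNs differ:", lsn_to_int(node1.current_lsn) != lsn_to_int(node2.current_lsn))
print("promoted node:", promoted_node(node1, node2).name)
```

Comparing LSNs as integers rather than as strings avoids false results when the hexadecimal parts have different lengths.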

Analysis Steps

-- On NODE 1 (original primary): check status
SELECT pg_is_in_recovery()      AS in_recovery,
       pg_current_wal_lsn()     AS current_lsn,
       timeline_id               AS timeline
FROM pg_control_checkpoint();

-- On NODE 2 (promoted standby): check status
SELECT pg_is_in_recovery()      AS in_recovery,
       pg_current_wal_lsn()     AS current_lsn,
       timeline_id               AS timeline
FROM pg_control_checkpoint();

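-- If both return in_recovery=false: split-brain confirmed
-- (a node still in recovery raises an error on pg_current_wal_lsn(); use pg_last_wal_replay_lsn() there)
-- The node with the higher timeline_id is the promoted standby, normally treated as the "correct" new primary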

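-- On original primary: check whether WAL was written locally during the partition.
-- This measures WAL bytes since the last checkpoint, which is only a rough signal;
-- for writes made after the failover, compare pg_current_wal_lsn() with the switch LSN
-- recorded in the new primary's timeline history file.
SELECT pg_current_wal_lsn() - (pg_control_checkpoint()).checkpoint_lsn AS wal_bytes_since_checkpoint;

-- On new primary: check timeline history (run in a shell, not in psql)
-- ls $PGDATA/pg_wal/*.history   <- each line records the parent timeline, the switch LSN, and the reason for the switch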
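A timeline history file holds one line per timeline switch: the parent timeline ID, the switch LSN, and a free-text reason, with blank lines and lines starting with # ignored. The sketch below parses that layout and estimates how many WAL bytes the old primary wrote past the switch point. parse_history, divergent_wal_bytes, and the sample content are hypothetical and only for illustration.

```python
from typing import NamedTuple


class TimelineSwitch(NamedTuple):
    parent_timeline: int
    switch_lsn: str
    reason: str


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000160' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_history(text: str) -> list[TimelineSwitch]:
    """Parse the text of a .history file into timeline switch entries."""
    switches = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError(f"unexpected history line: {raw!r}")
        reason = fields[2] if len(fields) > 2 else ""
        switches.append(TimelineSwitch(int(fields[0]), fields[1], reason))
    return switches


def divergent_wal_bytes(old_primary_lsn: str, switch_lsn: str) -> int:
    """WAL bytes the old primary wrote after the switch point (never negative)."""
    return max(0, lsn_to_int(old_primary_lsn) - lsn_to_int(switch_lsn))


# Invented sample: contents of 00000002.history and the old primary's current LSN.
SAMPLE_HISTORY = "1\t0/3000060\tno recovery target specified\n"
OLD_PRIMARY_LSN = "0/3000160"

last_switch = parse_history(SAMPLE_HISTORY)[-1]
print("diverged at:", last_switch.switch_lsn)
print("bytes to reconcile:", divergent_wal_bytes(OLD_PRIMARY_LSN, last_switch.switch_lsn))
```

A result greater than zero suggests the old primary kept writing after the standby was promoted. The byte count includes WAL overhead, so it indicates scale rather than the number of affected rows.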

Pitfalls

Split-brain is caused by lack of fencing: the old primary should be immediately and forcibly shut down (STONITH — “Shoot The Other Node In The Head”) when a failover is triggered.
Automatic failover without fencing is dangerous in any HA system — not just PostgreSQL.
synchronous_commit = on with a synchronous standby prevents data loss during failover, but doesn’t prevent split-brain if fencing fails.
Patroni relies on a distributed configuration store (DCS) such as etcd, Consul, or ZooKeeper, and on a leader lock held in that DCS, to prevent split-brain; this is why a DCS is required.
After a split-brain, data reconciliation is a manual, complex process. Some writes from the old primary will be lost.
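The leader-lock idea from the pitfalls above can be shown with a toy lease model. This is not Patroni's implementation: LeaderLease, may_accept_writes, and may_promote are hypothetical names, and the model assumes both nodes agree on the current time. A node may serve writes only while it holds an unexpired lease, and another node may promote only once that lease has expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaderLease:
    """Toy stand-in for a leader key with a time-to-live in a DCS."""
    holder: str
    expires_at: float  # seconds on a shared clock (an assumption of this model)


def may_accept_writes(node: str, lease: Optional[LeaderLease], now: float) -> bool:
    """A node may act as primary only while it holds an unexpired lease."""
    return lease is not None and lease.holder == node and now < lease.expires_at


def may_promote(node: str, lease: Optional[LeaderLease], now: float) -> bool:
    """Another node may take over only when no unexpired lease exists."""
    if lease is None:
        return True
    return lease.holder != node and now >= lease.expires_at


# node1 last renewed its lease before the partition; it expires at t=30.
lease = LeaderLease(holder="node1", expires_at=30.0)

for now in (10.0, 29.9, 30.0, 45.0):
    print(now, may_accept_writes("node1", lease, now), may_promote("node2", lease, now))
```

In this model there is no instant at which node1 may accept writes and node2 may promote. Split-brain becomes possible when the old primary keeps serving writes without checking its lease, which is the gap that fencing is meant to close.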

Resolution Approach

Immediately shut down the old primary (forcibly if necessary).
Identify the divergence point by comparing WAL positions: the switch LSN in the new primary’s timeline history file marks where the two nodes diverged.
Accept that writes made to the old primary after the divergence point are lost unless they are reconciled manually.
Optionally, extract the diverged writes from the old primary’s WAL with pg_waldump for manual reconciliation.
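The optional extraction step could start from a pg_waldump invocation limited to the diverged range on the old primary's timeline. The sketch below only assembles the command line using pg_waldump's --path, --timeline, --start, --end, and --rmgr options. build_waldump_command is a hypothetical helper, and the directory, timeline, and LSNs are placeholder values.

```python
import shlex
from typing import Optional


def build_waldump_command(
    wal_dir: str,
    timeline: int,
    start_lsn: str,
    end_lsn: Optional[str] = None,
    rmgr: Optional[str] = None,
) -> list[str]:
    """Assemble a pg_waldump command covering WAL written after the divergence point."""
    cmd = [
        "pg_waldump",
        "--path", wal_dir,
        "--timeline", str(timeline),
        "--start", start_lsn,
    ]
    if end_lsn:
        cmd += ["--end", end_lsn]
    if rmgr:
        cmd += ["--rmgr", rmgr]  # e.g. "Heap" to focus on table row changes
    return cmd


# Placeholder values: the old primary's pg_wal directory, its timeline (1),
# the switch LSN from the history file, and its final WAL position.
cmd = build_waldump_command(
    "/var/lib/postgresql/data/pg_wal", 1, "0/3000060", "0/3000160", rmgr="Heap"
)
print(shlex.join(cmd))
```

Run the printed command on the old primary after it has been shut down. The output lists WAL records rather than SQL statements, so turning it into statements to replay on the new primary remains manual work.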

