Scenario
A financial services application uses synchronous replication to guarantee zero data loss. The configuration is synchronous_standby_names = 'standby1'. At 14:22 on a Tuesday, the standby server crashes due to a storage failure. Within seconds, every application thread that tries to write to the primary database hangs indefinitely. Hundreds of COMMIT statements queue up. The application returns HTTP 504 errors. The DBA team can still connect to the primary via psql, but every INSERT or UPDATE hangs at COMMIT. The standby is completely unreachable.
How to Identify
Conditions:
- COMMIT statements hang indefinitely
- pg_stat_activity shows multiple sessions with wait_event = 'SyncRep'
- synchronous_standby_names names a standby that is down or unreachable
- pg_stat_replication shows no connected synchronous standby (the standby's row is missing, or no remaining row has sync_state = 'sync')
- Primary is fully operational; only writes hang
- Read queries complete normally
Analysis Steps
-- 1. Identify sessions waiting for synchronous replication acknowledgement
SELECT
    pid,
    usename,
    application_name,
    state,
    wait_event_type,
    wait_event,
    now() - query_start AS wait_duration,
    left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE wait_event = 'SyncRep'
ORDER BY wait_duration DESC;
-- 2. Check synchronous replication configuration
SHOW synchronous_standby_names;
SHOW synchronous_commit;
-- 3. Check current replication connections — is the synchronous standby connected?
SELECT
    client_addr,
    application_name,
    state,
    sync_state,
    replay_lag
FROM pg_stat_replication
ORDER BY sync_state;
-- 4. Verify: any standby with sync_state = 'sync'?
SELECT count(*) AS sync_standbys_connected
FROM pg_stat_replication
WHERE sync_state = 'sync';
-- If 0 and synchronous_standby_names is set → commits block until a synchronous standby reconnects or the setting is changed
-- 5. Check synchronous commit setting scope (may be set per role)
SELECT rolname, rolconfig
FROM pg_roles
WHERE rolconfig IS NOT NULL;
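Step 5 covers only role-level defaults. As a hedged extension, the sketch below also shows where the active values come from (via pg_settings) and lists database-level and role-in-database overrides from the pg_db_role_setting catalog. It assumes a role that is allowed to read these catalogs; the column aliases are illustrative, and sourcefile may be NULL for roles without sufficient privileges.

```sql
-- Where do the effective values come from (default, configuration file, session, ...)?
-- Note: this reflects the settings as seen by the current session.
SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE name IN ('synchronous_commit', 'synchronous_standby_names');

-- Per-database and per-role overrides of synchronous_commit.
-- setdatabase = 0 means "all databases"; setrole = 0 means "all roles".
SELECT
    coalesce(d.datname, '(all databases)') AS database_name,
    coalesce(r.rolname, '(all roles)')     AS role_name,
    s.setconfig
FROM pg_db_role_setting AS s
LEFT JOIN pg_database AS d ON d.oid = s.setdatabase
LEFT JOIN pg_roles    AS r ON r.oid = s.setrole
WHERE EXISTS (
    SELECT 1
    FROM unnest(s.setconfig) AS c(entry)
    WHERE c.entry LIKE 'synchronous_commit=%'
);
```

An override found here explains why some sessions hang at COMMIT while others, running under a different role or database, do not.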
Pitfalls
- Single point of failure. Using synchronous_standby_names = 'standby1' with a single standby means the primary blocks ALL commits if that standby fails. There is no automatic fallback.
- Do not drop the replica in a panic. If the standby reconnects on its own, the waiting commits are released automatically. Investigate before dropping the standby.
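- A configuration change needs a reload, not a restart. After changing synchronous_standby_names (for example with ALTER SYSTEM), run SELECT pg_reload_conf() for it to take effect. Once synchronous_standby_names is emptied and reloaded, sessions already waiting on SyncRep are released and new transactions commit without waiting. Changing synchronous_commit affects only commits issued after the change; it does not release sessions that are already waiting.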
- Quorum-based sync is safer. Using synchronous_standby_names = 'ANY 1 (standby1, standby2)' means one of two standbys must acknowledge; if one fails, the other still satisfies the quorum.
- Do not set synchronous_standby_names on a single-node system. With no standbys, all commits block permanently from the first write attempt.
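- synchronous_commit = off per session is valid triage. Setting it at the session level while investigating applies to that session's subsequent commits and does not require a reload or a server restart. Setting it to local instead also skips the standby wait, but still waits for the local WAL flush.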
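The mitigations named in the list above can be sketched as follows. This is an illustrative sequence, not a prescribed procedure: standby1 and standby2 are the placeholder names used in this section, and whether to give up synchronous guarantees during an outage is a policy decision the sketch does not make for you.

```sql
-- Session-level triage: affects only this session's subsequent commits.
-- 'local' still waits for the local WAL flush, but not for any standby.
SET synchronous_commit = local;

-- Cluster-wide: temporarily disable synchronous replication.
-- ALTER SYSTEM cannot run inside a transaction block and normally
-- requires superuser privileges.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();   -- reload only, no restart

-- Confirm that no session is still waiting on SyncRep.
SELECT count(*) AS still_waiting
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';

-- Longer term, once a second standby exists (standby2 is an assumption):
-- quorum commit, so either standby can acknowledge.
-- ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)';
-- SELECT pg_reload_conf();
```

While synchronous_standby_names is empty, commits are acknowledged without any standby confirmation, so restore the intended value and reload once a standby is healthy again.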
Resolution Approach
When synchronous replication commits hang because the standby is down: