Serialization Failure (ERROR: could not serialize access) —

Scenario

A financial transaction processing service runs all operations in SERIALIZABLE isolation to guarantee correctness. During peak hours, application logs show a 30% failure rate with SQLSTATE 40001 (serialization_failure). Retry logic exists but the retries themselves also fail, creating a thundering herd: all retried transactions restart simultaneously, conflict with each other again, and retry again — multiplying the load.

How to Identify

Conditions:

Application logs contain ERROR: could not serialize access due to concurrent update or ERROR: could not serialize access due to read/write dependencies among transactions
SQLSTATE: 40001 errors in application error tracking
pg_stat_database.conflicts counter rising (note: this tracks replica conflicts, but application-side retries indicate serialization pressure)
Error rate correlates directly with concurrent transaction count during peak hours
Retry storms visible in application metrics — retry spikes coincide with original failure spikes

Analysis Steps

-- 1. Check serialization failure rate in the database
SELECT
    datname,
    xact_commit,
    xact_rollback,
    (xact_rollback::float / NULLIF(xact_commit + xact_rollback, 0) * 100)
        AS rollback_rate_pct
FROM   pg_stat_database
WHERE  datname = current_database();

-- 2. Check current isolation level for the session
SHOW transaction_isolation;
SHOW default_transaction_isolation;

-- 3. Look for transactions currently running in SERIALIZABLE mode
SELECT
    pid,
    usename,
    state,
    left(query, 80)         AS query,
    now() - xact_start      AS xact_duration
FROM   pg_stat_activity
WHERE  state = 'active'
  AND  now() - xact_start > interval '100ms'
ORDER  BY xact_duration DESC;

-- 4. Examine pg_stat_database for conflict indicators
SELECT
    datname,
    conflicts,
    deadlocks,
    xact_rollback
FROM   pg_stat_database
WHERE  datname = current_database();

-- 5. Check if serializable_failure_on_conflict is a known pattern
-- Look at the rate of rollbacks as a proxy for serialization failures:
SELECT datname,
       xact_rollback,
       deadlocks,
       blk_read_time,
       blk_write_time
FROM   pg_stat_database
WHERE  datname = current_database();

Pitfalls

Serialization failures are normal and expected in SERIALIZABLE isolation — they are a correctness mechanism, not a database bug. The application MUST always retry transactions that fail with SQLSTATE 40001.
Do not use REPEATABLE READ assuming it prevents all anomalies — it prevents non-repeatable reads but allows write skew anomalies that SERIALIZABLE catches and rejects.
Not all 40001 errors need DBA investigation — a small, steady failure rate with successful retries is acceptable. Only high rates or thundering herds need architectural intervention.
Naive retry logic without exponential backoff causes thundering herd — all retrying transactions restart simultaneously and conflict again. Exponential backoff with jitter breaks the synchronization.
Minimizing transaction duration is the most effective mitigation — shorter transactions have shorter conflict windows and lower probability of overlapping with concurrent transactions.

Resolution Approach

Serialization failures require application-layer handling: implement exponential backoff with random jitter so retries do not thunder together. Minimize transaction duration by moving non-database work outside the transaction boundary. For operations where exact serializable isolation is not strictly required, consider READ COMMITTED with explicit SELECT FOR UPDATE locking to avoid the retry cost while maintaining correctness for specific rows.

Serialization Failure (ERROR: could not serialize access)

What This Is

Why It Matters

Next Step

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Career Impact

Related & next steps

More like this

Concepts on this page

Don't get paged twice for the same bug.

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Related Knowledge

Related Interview Questions

Career Impact

Related & next steps

lan More like this

menu_book Concepts on this page

Don't get paged twice for the same bug.

More like this

Concepts on this page