Scenario
An e-commerce application processes orders and updates inventory simultaneously. Starting at peak hours, the application logs show bursts of ERROR: deadlock detected with transaction rollbacks. Customers see failed checkout errors. The engineering team has retry logic but the retries are also failing, creating a storm.
How to Identify
Conditions:
- PostgreSQL logs contain
ERROR: deadlock detected with DETAIL: Process X waits for ShareLock on transaction Y; blocked by process Z
- Application receiving
SQLSTATE: 40P01 errors
pg_stat_activity shows processes in Lock wait state
pg_locks shows two processes each waiting for a lock held by the other
- Retry storms in application metrics correlating with peak load
Analysis Steps
-- 1. Check PostgreSQL logs for deadlock events
-- grep "deadlock detected" $PGDATA/log/postgresql*.log
-- The log shows both transactions involved and the lock cycle
-- 2. Find currently blocked processes
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
blocked.wait_event_type,
blocked.wait_event
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
-- 3. Inspect the full lock graph
SELECT
l.pid,
l.locktype,
l.relation::regclass,
l.mode,
l.granted,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
ORDER BY l.relation, l.pid;
-- 4. Check deadlock_timeout setting
SHOW deadlock_timeout;
-- Default: 1s. PostgreSQL detects deadlock after this delay.
Pitfalls
- PostgreSQL automatically resolves deadlocks by aborting one of the transactions — do not manually kill processes while deadlock detection is happening.
- The aborted transaction receives
ERROR 40P01 — the application must retry. If it doesn’t, the user sees a failed operation.
- Do not set
deadlock_timeout too low (e.g., 10ms) — it causes excessive deadlock checking overhead on every lock wait.
- The root cause is almost always inconsistent lock acquisition order in application code — multiple transactions locking the same rows in different orders.
- Log analysis is essential — PostgreSQL logs show which queries were involved.
log_lock_waits = on helps capture these events.
Resolution Approach
Deadlocks are an application design problem, not a database problem. The fix is in the application code: ensure all transactions acquire locks on multiple rows in a consistent order (e.g., always by id ASC). Use SELECT FOR UPDATE to pre-lock rows in the correct order before modifying them.