Scenario
A job queue table processes tasks using multiple workers, each selecting the next available row with SELECT FOR UPDATE. As the team scales from 4 to 40 workers, throughput drops instead of scaling linearly. pg_stat_activity shows the majority of workers in a Lock wait state, all trying to lock the same small set of “hot” rows at the head of the queue. Workers are spending more time waiting for each other than doing actual work.
How to Identify
Conditions:
pg_stat_activity shows many sessions with wait_event_type = 'Lock' and wait_event = 'relation' or 'tuple'
pg_locks shows numerous ForUpdate row-level lock waits on the same table
- Throughput is flat or declining as worker count increases (inverse scaling)
- Lock wait times visible in
pg_stat_activity are longer than job processing times
- High
Heap Fetches in EXPLAIN (ANALYZE, BUFFERS) output for the SELECT FOR UPDATE query
Analysis Steps
-- 1. Find sessions waiting for row-level locks
SELECT
pid,
usename,
wait_event_type,
wait_event,
now() - query_start AS wait_duration,
left(query, 100) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
AND state = 'active'
ORDER BY wait_duration DESC;
-- 2. Inspect row-level lock types in pg_locks
SELECT
l.pid,
l.locktype,
l.relation::regclass AS table_name,
l.mode,
l.granted,
l.transactionid,
l.tuple,
left(a.query, 80) AS query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype IN ('tuple', 'relation', 'transactionid')
ORDER BY l.granted, l.locktype;
-- 3. Count workers blocked vs. active on the queue table
SELECT
granted,
count(*) AS lock_count,
mode
FROM pg_locks
WHERE relation = 'task_queue_p020'::regclass
GROUP BY granted, mode
ORDER BY granted DESC;
-- 4. Identify the queue head (hot rows being contended)
SELECT id, status, created_at, locked_by
FROM task_queue_p020
WHERE status = 'pending'
ORDER BY id
LIMIT 10;
-- 5. Check wait ratio (high = severe contention)
SELECT
count(*) FILTER (WHERE wait_event_type = 'Lock') AS waiting,
count(*) FILTER (WHERE state = 'active') AS active,
count(*) AS total
FROM pg_stat_activity
WHERE usename NOT IN ('postgres');
Pitfalls
- Row-level locks cause heap page writes (row marking visible as heap fetches) — they are more expensive than they appear. Under high contention, this write amplification is significant.
- High concurrency on the same rows cannot be solved with indexes — more indexes do not help; the contention is on the row itself, not the access path.
SELECT FOR UPDATE SKIP LOCKED fundamentally changes semantics — workers get different rows rather than the single “next” row. This is intentional for queue patterns but changes fairness and ordering guarantees.
- Do not assume more workers = more throughput — for queue-style workloads with hot rows, there is a point beyond which adding workers actively reduces throughput due to lock overhead.
SELECT FOR UPDATE on large result sets with subsequent filtering is dangerous — it locks all matched rows even if only a subset is actually processed.
Resolution Approach
Replace SELECT FOR UPDATE with SELECT FOR UPDATE SKIP LOCKED so each worker immediately claims a different available row instead of queuing behind the same hot row. This eliminates inter-worker lock contention entirely and allows throughput to scale with worker count. For cases where strict ordering is required, partition the queue by a shard key so workers operate on non-overlapping row sets.