Scenario
The analytics team runs long reports on a read-only standby replica. Queries that take 15+ minutes are being cancelled with ERROR: canceling statement due to conflict with recovery. Investigation reveals the primary is running VACUUM on tables the standby query is reading — the standby must cancel the query to apply the WAL records.
How to Identify
Conditions:
- Queries on standby cancelled with
ERROR: canceling statement due to conflict with recovery
pg_stat_database_conflicts on standby shows increasing counts
hot_standby_feedback = off (default) — standby doesn’t tell primary about active queries
- Long-running queries on standby conflict with VACUUM on primary
Analysis Steps
-- Run on STANDBY:
-- 1. Check conflict statistics
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();
-- 2. Check current max_standby_streaming_delay setting
SHOW max_standby_streaming_delay;
-- 3. Check hot_standby_feedback setting
SHOW hot_standby_feedback;
-- 4. View active long-running queries on standby
SELECT pid, now() - query_start AS duration, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > INTERVAL '1 minute'
ORDER BY duration DESC;
Pitfalls
- Enabling
hot_standby_feedback = on prevents conflicts by telling the primary which XIDs the standby needs — but this can cause bloat on the primary if standby queries run for a long time, because VACUUM cannot remove rows still needed by the standby.
- Increasing
max_standby_streaming_delay trades data freshness for query completion. A very high value means the standby can fall far behind the primary.
- The correct fix depends on workload: for short OLTP queries on standby, keep defaults. For long analytics queries, either enable feedback or use a dedicated analytics replica with longer delay tolerance.
Resolution Approach
Two levers: