Cookbook recipe

Autovacuum Killing Performance (I/O Spikes)

Applies to PostgreSQL 13–17 Last reviewed May 2026 Grounded in source
Estimated investigation5 min

Scenario

Scenario A production PostgreSQL cluster handling an e-commerce platform experiences periodic I/O spikes every 2–3 hours. During these spikes, query P95 latency rises from 5ms to 200ms for about 10 minutes. No new deployments correspond with…

Investigation Path

Scenario

A production PostgreSQL cluster handling an e-commerce platform experiences periodic I/O spikes every 2–3 hours. During these spikes, query P95 latency rises from 5ms to 200ms for about 10 minutes. No new deployments correspond with the pattern. A DBA checks pg_stat_activity during a spike and sees several autovacuum worker processes actively vacuuming large tables.

How to Identify

Conditions:

  • Periodic I/O spikes correlating with autovacuum worker activity in pg_stat_activity
  • Query latency spikes during autovacuum runs visible in monitoring dashboards
  • pg_stat_bgwriter shows high buffers_clean activity during the same window
  • pg_stat_progress_vacuum shows autovacuum workers processing large tables
  • autovacuum_vacuum_cost_limit is set to a high value or default; autovacuum_vacuum_cost_delay is very low

Analysis Steps

-- 1. Identify autovacuum workers currently running and their targets
SELECT pid,
       left(query, 80)               AS query,
       now() - query_start           AS duration,
       wait_event_type,
       wait_event
FROM   pg_stat_activity
WHERE  backend_type = 'autovacuum worker'
ORDER  BY query_start;

-- 2. Monitor vacuum progress to see how much I/O is being done
SELECT p.relid::regclass             AS table_name,
       p.phase,
       p.heap_blks_total,
       p.heap_blks_scanned,
       p.heap_blks_vacuumed,
       round(p.heap_blks_vacuumed::numeric /
             nullif(p.heap_blks_total, 0) * 100, 1) AS pct_done,
       p.num_dead_tuples
FROM   pg_stat_progress_vacuum p;

-- 3. Check current cost throttling configuration
SHOW autovacuum_vacuum_cost_delay;
SHOW autovacuum_vacuum_cost_limit;
SHOW vacuum_cost_page_hit;
SHOW vacuum_cost_page_miss;
SHOW vacuum_cost_page_dirty;

-- 4. Check bgwriter I/O statistics (reset to baseline then re-check after spike)
SELECT buffers_clean,
       maxwritten_clean,
       buffers_backend,
       buffers_backend_fsync,
       buffers_alloc,
       now() - stats_reset AS stats_age
FROM   pg_stat_bgwriter;

-- 5. Identify tables that trigger large autovacuum runs
SELECT relname,
       n_dead_tup,
       last_autovacuum,
       pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM   pg_stat_user_tables
WHERE  n_dead_tup > 10000
  OR   pg_relation_size(relid) > 1073741824    -- tables > 1 GB
ORDER  BY pg_relation_size(relid) DESC
LIMIT  20;

Pitfalls

  • Throttling autovacuum too aggressively (raising cost_delay, lowering cost_limit) reduces I/O during vacuum but allows dead tuples and bloat to accumulate. This leads to even larger, more disruptive vacuum runs later — or worse, transaction ID wraparound.
  • Disabling autovacuum on busy tables to avoid I/O spikes is dangerous. Dead tuples accumulate and bloat grows unchecked.
  • The real fix is reducing vacuum work per session, not eliminating vacuum. More frequent small vacuum runs are less impactful than infrequent large ones.
  • autovacuum_vacuum_cost_delay in PostgreSQL 14 defaults to 2ms. In older versions (PG13 and earlier) it defaulted to 20ms. If you upgraded and the default changed, autovacuum may now be much more aggressive.
  • Scheduling manual VACUUM during off-hours while keeping autovacuum enabled is a valid strategy, but autovacuum should still remain active to handle unexpected churn.

Resolution Approach

The goal is to spread vacuum I/O over time rather than allowing large bursts:

This is a Pro lesson

Get every Learning Pathway and cookbook recipe — grounded in PostgreSQL source code, with diagnostics, fixes, and prevention for each topic.

Continue this lesson to learn:

  • Mitigation Actions
  • All 36 Learning Pathway lessons
  • 170+ cookbook recipes
  • Source-grounded diagnostics & fixes

Secure checkout Cancel anytime Source-grounded

Career Impact

This scenario builds production judgment and operational confidence under pressure.

Open Career Dashboard →

Keep going

Related & next steps

Was this helpful?

← All cookbook recipes