Scenario
A write-heavy OLTP system experiences periodic write throughput drops every 5 minutes. The PostgreSQL log shows LOG: checkpoint complete: wrote N buffers with warnings like LOG: checkpoints are occurring too frequently (N seconds apart). pg_stat_bgwriter shows a high checkpoints_req count (requested checkpoints) relative to checkpoints_timed (scheduled checkpoints). The DBA suspects max_wal_size is too small.
How to Identify
Conditions:
- PostgreSQL log shows
checkpoints are occurring too frequently
pg_stat_bgwriter.checkpoints_req >> checkpoints_timed (requested >> timed)
max_wal_size is too small for the WAL generation rate between checkpoints
checkpoint_completion_target is too low — checkpoints complete too aggressively
- Write throughput drops periodically due to checkpoint I/O pressure
Analysis Steps
-- Check checkpoint statistics
SELECT
checkpoints_timed,
checkpoints_req,
checkpoint_write_time / 1000.0 AS write_time_seconds,
checkpoint_sync_time / 1000.0 AS sync_time_seconds,
buffers_checkpoint,
buffers_clean,
buffers_backend,
maxwritten_clean,
buffers_alloc
FROM pg_stat_bgwriter;
-- checkpoints_req >> checkpoints_timed = too frequent forced checkpoints
-- Check current checkpoint settings
SHOW max_wal_size;
SHOW checkpoint_completion_target;
SHOW checkpoint_timeout;
SHOW min_wal_size;
-- Check WAL generation rate
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'::pg_lsn)) AS total_wal_generated;
-- Run this query twice 1 minute apart and compute the difference for MB/s rate
-- Check if bgwriter is keeping up
SELECT maxwritten_clean, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;
-- buffers_backend > 0: backends are writing dirty pages directly (bgwriter overwhelmed)
-- maxwritten_clean > 0: bgwriter hit bgwriter_lru_maxpages limit (increase it)
Pitfalls
- Increasing
max_wal_size delays checkpoints but increases crash recovery time. Balance is key: 2-5 minutes of WAL between checkpoints for most OLTP workloads.
checkpoint_completion_target = 0.9 spreads checkpoint I/O over 90% of checkpoint_timeout. Lower values cause checkpoint I/O spikes that hurt query latency.
checkpoints_req counts WAL-triggered checkpoints (WAL filled up before checkpoint_timeout). This is the problem indicator.
buffers_backend (direct writes by backends, bypassing bgwriter) indicates the bgwriter cannot keep up — increase bgwriter_lru_maxpages and bgwriter_delay.
min_wal_size should be 10-25% of max_wal_size — too small means PostgreSQL recycles WAL files constantly.
Resolution Approach
Increase max_wal_size until checkpoints_req drops close to zero. Set checkpoint_completion_target = 0.9 to spread I/O. Monitor buffers_backend — if still high, tune bgwriter_lru_maxpages and bgwriter_delay as well.