IOPS Saturation During Checkpoint —

Scenario

A production OLTP database on an HDD-based storage array shows a perfectly periodic pattern: every 5 minutes, query latency spikes from 3ms to 400-500ms for exactly 20-30 seconds. The DBA correlates the spikes with PostgreSQL checkpoint activity. During the spike, pg_stat_activity shows backends waiting in wait_event = 'BufferIO' and wait_event = 'DataFileWrite'. The checkpoint is writing 8 GB of dirty data in 30 seconds, saturating the storage IOPS.

How to Identify

Conditions:

Periodic latency spikes correlating with checkpoint timing
pg_stat_bgwriter.buffers_checkpoint very high (large checkpoint writes)
pg_stat_bgwriter.buffers_clean very low (bgwriter not writing ahead of checkpoints)
Backends waiting in BufferIO or DataFileWrite wait events during spikes
checkpoint_completion_target low (checkpoints complete too quickly)
Storage IOPS maxed out during checkpoint (visible in iostat)

Analysis Steps

-- Check checkpoint statistics
SELECT
    checkpoints_timed,
    checkpoints_req,
    checkpoint_write_time / 1000.0      AS write_time_sec,
    checkpoint_sync_time  / 1000.0      AS sync_time_sec,
    buffers_checkpoint,
    buffers_clean,
    buffers_backend,
    maxwritten_clean
FROM pg_stat_bgwriter;

-- Check checkpoint timing settings
SHOW checkpoint_timeout;           -- how often checkpoints happen
SHOW checkpoint_completion_target; -- how fast checkpoint writes run
SHOW max_wal_size;

-- Calculate checkpoint write rate:
-- If checkpoint_timeout=5min and checkpoint_completion_target=0.5 (50%):
-- Checkpoint must write all dirty pages in 2.5 minutes → I/O burst

-- With checkpoint_completion_target=0.9:
-- Checkpoint has 4.5 minutes to write → I/O spread out

-- Check bgwriter efficiency
SELECT
    CASE WHEN buffers_checkpoint + buffers_clean > 0
         THEN round(100.0 * buffers_clean / (buffers_checkpoint + buffers_clean), 1)
         ELSE 0
    END AS pct_written_by_bgwriter
FROM pg_stat_bgwriter;
-- Low % = bgwriter is not writing pages ahead of checkpoint → checkpoint does all the work

Pitfalls

checkpoint_completion_target = 0.5 (old PostgreSQL default) causes checkpoints to complete in 50% of checkpoint_timeout — creating 30-second I/O bursts every 5 minutes.
Increasing shared_buffers without increasing checkpoint_completion_target or bgwriter_lru_maxpages makes the problem worse: more dirty pages accumulate and are written in a shorter burst.
Moving from HDD to SSD completely changes this problem — SSDs handle checkpoint bursts without latency spikes. The fix on HDD is to spread I/O; the fix on SSD is usually to increase shared_buffers and accept checkpoint bursts.
effective_io_concurrency affects how aggressively PostgreSQL does read-ahead I/O, but checkpoints are about write throughput, not read throughput.

Resolution Approach

Set checkpoint_completion_target = 0.9 to spread checkpoint writes over 90% of the checkpoint interval. Tune bgwriter to write dirty pages proactively before the checkpoint, reducing the burst. On HDD: also consider increasing checkpoint_timeout to allow more time for I/O spreading.

IOPS Saturation During Checkpoint

What This Is

Why It Matters

Next Step

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Career Impact

Related & next steps

More like this

Concepts on this page

Don't get paged twice for the same bug.

Scenario

Investigation Path

Scenario

How to Identify

Analysis Steps

Pitfalls

Resolution Approach

Related Knowledge

Related Interview Questions

Career Impact

Related & next steps

lan More like this

menu_book Concepts on this page

Don't get paged twice for the same bug.

More like this

Concepts on this page