WAL Disk Full

Applies to PostgreSQL 13–17 · Last reviewed May 2026 · Grounded in source

Estimated investigation: 4 min

Investigation Path

Scenario

At 3 AM, all write operations on the database start hanging. The on-call engineer checks disk usage and finds the filesystem holding /var/lib/postgresql/14/main/pg_wal at 100%. PostgreSQL is still alive (pg_stat_activity can be queried), but every INSERT and UPDATE is blocked with wait_event = 'WALWrite'. The pg_wal directory has grown to 50 GB. wal_keep_size = 4096 (the unit is MB, so 4 GB), and there is a stale replication slot with active = false.

How to Identify

Conditions:

df -h shows pg_wal filesystem at 100%
All write backends stuck in wait_event = 'WALWrite'
pg_replication_slots shows an inactive slot (active = false) whose restart_lsn lags far behind pg_current_wal_lsn()
wal_keep_size is too large for available disk
WAL archiving is failing (pg_stat_archiver.failed_count increasing)
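Most of the conditions above can be checked in one pass from SQL. The following is a minimal sketch of a one-row triage query. It assumes PostgreSQL 13 or later (for wal_keep_size and max_slot_wal_keep_size) and a role that is a superuser or a member of pg_monitor, which pg_ls_waldir() requires. SQL cannot see free disk space, so pair it with df -h.

```sql
-- One-row triage snapshot for a suspected "WAL disk full" incident.
-- Assumes PostgreSQL 13+ and superuser or pg_monitor membership.
SELECT
  (SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir())             AS pg_wal_size,
  current_setting('wal_keep_size')                                    AS wal_keep_size,
  current_setting('max_slot_wal_keep_size')                           AS max_slot_wal_keep_size,  -- '-1' = unlimited
  (SELECT count(*) FROM pg_replication_slots WHERE NOT active)        AS inactive_slots,
  (SELECT count(*) FROM pg_stat_activity WHERE wait_event = 'WALWrite') AS walwrite_waiters,
  a.failed_count                                                      AS archive_failures,
  a.last_failed_wal,
  a.last_failed_time
FROM pg_stat_archiver AS a;  -- pg_stat_archiver always has exactly one row
```

Re-running it every few minutes shows whether pg_wal_size and archive_failures are still climbing.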

Analysis Steps

-- Check which backends are stuck
SELECT pid, usename, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE wait_event = 'WALWrite'
ORDER BY query_start;

-- Check replication slots — inactive slots hold WAL indefinitely
SELECT slot_name, active, restart_lsn, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS held_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;  -- slots with a NULL restart_lsn hold no WAL
-- Large held_wal + active=false = this slot is causing WAL accumulation

-- Check wal_keep_size
SHOW wal_keep_size;
-- PostgreSQL retains this much past WAL in pg_wal even when no replication slot needs it

-- Check WAL archiver
SELECT failed_count, last_failed_wal, last_failed_time FROM pg_stat_archiver;
-- If failed_count keeps rising: segments that have not been archived cannot be removed or recycled, so pg_wal keeps growing

-- Check current WAL size
SELECT pg_size_pretty(sum(size)) AS pg_wal_size
FROM pg_ls_waldir();
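When pg_stat_archiver reports failures, the size of the archive backlog shows how much WAL is pinned by archiving rather than by slots. Every completed segment that has not yet been archived has a .ready marker in pg_wal/archive_status. The sketch below assumes PostgreSQL 12 or later for pg_ls_archive_statusdir() and the same superuser or pg_monitor privileges. The byte figure is an approximation, because history and backup files can also carry .ready markers.

```sql
-- Segments finished but not yet archived carry a .ready marker
-- and cannot be removed or recycled until archiving succeeds.
SELECT count(*) FILTER (WHERE name LIKE '%.ready') AS segments_waiting,
       pg_size_pretty(
         count(*) FILTER (WHERE name LIKE '%.ready')
         * pg_size_bytes(current_setting('wal_segment_size'))
       ) AS approx_backlog
FROM pg_ls_archive_statusdir();

-- Check how archiving is configured (archive_library exists only on PostgreSQL 15+).
SELECT name, setting
FROM pg_settings
WHERE name IN ('archive_mode', 'archive_command', 'archive_library');
```

If segments_waiting is large and still growing, fixing the archiver takes priority over dropping slots.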

Pitfalls

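Dropping an inactive replication slot removes its hold on WAL, but the segments are only removed or recycled at the next checkpoint. Verify first that no standby or logical subscriber still depends on the slot. Once the slot is dropped, that consumer cannot resume where it left off and usually has to be rebuilt or re-synchronized.
Reducing wal_keep_size alone does not reclaim space immediately. The new value applies after a configuration reload, and space is reclaimed at the next checkpoint.
Do NOT delete files from pg_wal/ manually. PostgreSQL manages this directory itself, and removing segments it still needs can leave the cluster unable to recover after a crash, causing data loss or corruption.
Keeping pg_wal on the same filesystem as the data directory is a common architectural mistake. WAL growth should not be able to starve the data files of space, and vice versa.
max_slot_wal_keep_size (PostgreSQL 13+, default -1, meaning unlimited) caps how much WAL a slot can retain. Once a slot exceeds the cap, a checkpoint invalidates it (wal_status = 'lost') and releases its WAL. Because the setting only needs a reload, it can also serve as an emergency fix, at the cost of breaking that slot's consumer.
Do not wait for the disk to fill completely. If a WAL write fails with "No space left on device", the server panics and shuts down, and it may not come back up until space is freed on that filesystem.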
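Before dropping anything, confirm what each slot is tied to. The following query is a sketch of one way to do this. The LEFT JOIN to pg_stat_replication only finds a consumer while it is connected, so an inactive slot will show empty connection columns. Its owner then has to be confirmed elsewhere, for example in a standby's primary_slot_name or in pg_subscription on a logical subscriber.

```sql
-- What is each slot for, and is anything connected to it right now?
-- Physical slots usually belong to a streaming standby (primary_slot_name);
-- logical slots belong to a subscription or decoding client ("plugin" names
-- the output plugin, e.g. pgoutput for built-in logical replication).
SELECT s.slot_name, s.slot_type, s.plugin, s.database,
       s.active, s.active_pid, s.wal_status,
       r.application_name, r.client_addr, r.state
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
ORDER BY s.active, s.slot_name;  -- inactive slots (false) sort first

-- On a logical subscriber (not on this server), this shows which
-- publisher slot each subscription uses:
--   SELECT subname, subslotname FROM pg_subscription;
```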

Resolution Approach

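First confirm that no standby or subscriber still needs the inactive replication slot that is causing the WAL bloat. Then drop it and run CHECKPOINT so the WAL it was holding is removed. Next, address the root cause: reduce wal_keep_size, fix WAL archiving, and set max_slot_wal_keep_size to prevent recurrence. Long term, put pg_wal on a separate filesystem.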
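The resolution steps can be sketched as a short SQL sequence. In this sketch, the slot name 'stale_slot' and the sizes '1GB' and '10GB' are hypothetical examples; substitute the slot you identified and values sized to your disk. The sequence assumes PostgreSQL 13+ and a superuser session outside an explicit transaction block, because ALTER SYSTEM cannot run inside one.

```sql
-- 1. Drop the stale slot, but only if it exists and is not active.
--    'stale_slot' is a hypothetical name. Filtering through pg_replication_slots
--    skips the call when the slot is gone or currently in use.
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'stale_slot' AND NOT active;

-- 2. Lower the unconditional retention and cap per-slot retention
--    (example values). Both settings take effect on reload, no restart needed.
ALTER SYSTEM SET wal_keep_size = '1GB';
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();

-- 3. Force a checkpoint so WAL that is no longer needed is removed or recycled.
--    Requires superuser, or the pg_checkpoint role on PostgreSQL 15+.
--    The reload is asynchronous, so if pg_wal does not shrink, run it again.
CHECKPOINT;
```

Segments still marked .ready for a failing archiver survive this checkpoint, so the archiver must be fixed before that space comes back.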
