Cookbook recipe

WAL Archive Failure

Applies to: PostgreSQL 13–17
Last reviewed: May 2026
Grounded in source
Estimated investigation: 4 min

Scenario

An alert fires at 3am: disk usage on the WAL partition is at 85% and rising. The DBA investigates and finds pg_stat_archiver.failed_count has been climbing for 6 hours. The archive destination ran out of space, causing every WAL segment to fail archiving. PostgreSQL retains all un-archived WAL in pg_wal/, which is now filling the disk.

How to Identify

Conditions:

  • pg_stat_archiver.failed_count increasing over time
  • pg_wal/ directory filling with .ready files in archive_status/
  • archive_command returning non-zero exit code
  • PostgreSQL logs contain "archive command failed with exit code N" messages
  • last_failed_wal and last_failed_time populated in pg_stat_archiver

Analysis Steps

-- 1. Check archiver state
SELECT
    archived_count,
    last_archived_wal,
    last_archived_time,
    failed_count,
    last_failed_wal,
    last_failed_time,
    now() - last_archived_time AS time_since_last_success
FROM pg_stat_archiver;

-- 2. Check current archive settings
SHOW archive_command;
SHOW archive_mode;
SHOW wal_level;

-- 3. Count WAL segments waiting to be archived (from OS)
-- ls $PGDATA/pg_wal/archive_status/*.ready | wc -l

-- 4. Check pg_wal directory size
SELECT pg_size_pretty(sum(size)) AS wal_dir_size
FROM   pg_ls_waldir();
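
Step 3 can be wrapped in a small shell helper. The following is a minimal sketch, not part of the original recipe: the function names are hypothetical, the data directory is passed as the first argument, and the size estimate assumes the default 16 MiB wal_segment_size unless a second argument overrides it. It uses find rather than a shell glob so that it keeps working when thousands of .ready files have piled up.

```shell
#!/bin/sh
# Count status files that mark WAL still waiting for the archiver.
# Assumption: $1 is the PostgreSQL data directory ($PGDATA).
count_ready_segments() {
    status_dir="$1/pg_wal/archive_status"
    # find avoids "argument list too long" errors that a glob can hit.
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
}

# Rough backlog size in MiB.
# Assumption: $2 is the segment size in MiB (defaults to 16).
# The count also includes any .history or .backup status files,
# so treat the result as an upper-bound estimate.
backlog_mib() {
    segment_mib="${2:-16}"
    ready_count=$(count_ready_segments "$1")
    echo $(( ready_count * segment_mib ))
}
```

For example, `backlog_mib "$PGDATA"` prints the approximate number of MiB that cannot be recycled until archiving resumes, which can be compared with the free space left on the WAL partition.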

Pitfalls

  • Do not delete .ready files manually to “hide” the problem — PostgreSQL uses these to track what needs archiving. Deleting them means those WAL segments are never archived, breaking PITR capability silently.
  • The archiver processes WAL segments sequentially — it will not skip a failed segment to archive later ones. One stuck segment blocks all subsequent archiving.
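  • After fixing the archive destination, no status files need to be touched. PostgreSQL does not create .err files in archive_status/ (only .ready and .done); the archiver retries the failed segment on its own, normally within about a minute.
  • Restarting PostgreSQL is NOT required to resume archiving. If archive_command itself was changed, a configuration reload (SELECT pg_reload_conf();) is enough.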

Resolution Approach

  1. Fix the root cause (archive destination full, permission issue, network problem)
  2. Test archive_command manually from the OS shell, as the OS user that runs PostgreSQL
  3. Leave archive_status/ alone; the archiver retries the failed segment automatically
  4. Monitor archived_count to confirm archiving resumes
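
The manual test in step 2 can be done without leaving a stray segment in the archive. The sketch below is an illustration under stated assumptions, not the recipe's own method: the function name and the ".probe" suffix are hypothetical, and it assumes the archive destination is a directory reachable with plain cp. It copies the segment under a temporary name, reports the exit code, and removes the probe file again, so an archive_command that refuses to overwrite existing files is not tripped up later.

```shell
#!/bin/sh
# Probe whether a WAL segment can be copied to the archive destination.
# Assumptions: $1 is the full path of a segment in pg_wal/,
#              $2 is the archive destination directory.
# Run this as the OS user that runs PostgreSQL, so permissions match.
probe_archive_destination() {
    wal_path="$1"
    dest_dir="$2"
    probe="$dest_dir/$(basename "$wal_path").probe.$$"
    status=0
    # Capture the exit code instead of aborting, so it can be reported.
    cp "$wal_path" "$probe" || status=$?
    # Always clean up, including partial files left by a failed copy.
    rm -f "$probe"
    if [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed with exit code $status"
    fi
    return "$status"
}
```

A call such as `probe_archive_destination "$PGDATA/pg_wal/<last_failed_wal>" /archive/destination` prints "ok" once the destination accepts writes again; any error text from cp (no space left, permission denied) appears on stderr.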

Mitigation Actions

-- 1. Identify the failing WAL and root cause
SELECT last_failed_wal, last_failed_time, failed_count FROM pg_stat_archiver;

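-- 2. Test archive_command manually from OS shell (as the postgres OS user):
-- cp $PGDATA/pg_wal/<last_failed_wal> /archive/destination/<last_failed_wal>.test
-- If this fails → reveals the actual error (disk full, permission, network)
-- Remove the .test copy afterwards so it does not linger in the archive

-- 3. After fixing root cause, no manual cleanup is needed:
-- archive_status/ holds only .ready and .done files (there are no .err files)
-- The archiver retries the failed segment automatically, normally within about a minute
-- (archive_timeout only forces segment switches; it does not control retries)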

-- 4. Monitor recovery
SELECT archived_count, failed_count, last_archived_time
FROM   pg_stat_archiver;
-- archived_count should start increasing, failed_count should stop growing

-- 5. Alert configuration (add to monitoring):
-- Alert when: failed_count increases between checks
-- Alert when: now() - last_archived_time > 10 minutes

-- 6. Preventive: set archive destination monitoring
-- Alert archive destination disk at 70% usage threshold
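
The two alert rules from step 5 can be expressed as a small check that a monitoring job runs on each poll. This is a minimal sketch with a hypothetical function name; it assumes the caller has already read failed_count and the age of last_archived_time (in seconds) from pg_stat_archiver and has stored the previous failed_count from the last poll. The 600-second default mirrors the 10-minute threshold above.

```shell
#!/bin/sh
# Evaluate the archiver alert rules.
# Arguments (all assumptions of this sketch):
#   $1 previous failed_count, $2 current failed_count,
#   $3 seconds since last_archived_time, $4 optional max age (default 600).
# Prints OK or one ALERT line per violated rule; returns 1 on any alert.
check_archiver_alerts() {
    prev_failed="$1"
    cur_failed="$2"
    age_s="$3"
    max_age_s="${4:-600}"
    rc=0
    if [ "$cur_failed" -gt "$prev_failed" ]; then
        echo "ALERT: failed_count rose from $prev_failed to $cur_failed"
        rc=1
    fi
    if [ "$age_s" -gt "$max_age_s" ]; then
        echo "ALERT: last successful archive was ${age_s}s ago"
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "OK"
    fi
    return "$rc"
}
```

The inputs could come from something like `psql -At` running a query on pg_stat_archiver. On a mostly idle server the age rule can fire even though nothing is wrong, because no new segment is completed unless archive_timeout forces a switch, so the threshold may need to be tuned to the workload.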
