Cookbook recipe

pg_basebackup Failure

Applies to PostgreSQL 13–17. Last reviewed May 2026. Grounded in source.
Estimated investigation: 4 min

Scenario

A nightly pg_basebackup job started failing with FATAL: could not connect to server or ERROR: replication slot "backup_slot" does not exist. On other nights it fails with ERROR: requested WAL segment has already been removed. The team has no WAL archiving configured and relies solely on pg_basebackup for backups.

How to Identify

Conditions:

  • pg_basebackup exits non-zero with connection or replication errors
  • No WAL archiving means WAL needed during long backup may be recycled
  • max_wal_senders limit hit by concurrent standbys + backup connections
  • Backup replication slot referenced in backup script was dropped manually
  • wal_keep_size too small for long-running backups

Analysis Steps

-- Check max_wal_senders vs current usage
SHOW max_wal_senders;
SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'walsender';
SELECT count(*) FROM pg_stat_replication;

-- Check if backup slot exists
SELECT slot_name, active, restart_lsn, wal_status
FROM pg_replication_slots
WHERE slot_name = 'backup_slot';
-- wal_status = 'lost' → WAL the slot needs has been removed (slot is no longer usable)

-- Check wal_keep_size
SHOW wal_keep_size;
-- 0 (the default) = no extra WAL is retained beyond what checkpoints,
-- archiving, and replication slots require (risky for long backups without a slot)

-- Check WAL archiving status
SHOW archive_mode;
SHOW archive_command;
-- If archive_mode=off and wal_keep_size=0: long backups can fail
-- if WAL from backup start is recycled before backup completes

-- Check pg_stat_archiver for archiving health
SELECT archived_count, last_archived_wal, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
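
The first check above can be folded into a single row that shows how much walsender headroom remains. This is a minimal sketch; the column alias free_senders is chosen here for illustration and is not a PostgreSQL name.

```sql
-- Compare walsender usage with the configured limit in one row.
-- free_senders is an illustrative alias, not a built-in column.
SELECT current_setting('max_wal_senders')::int            AS max_wal_senders,
       count(*)                                           AS walsenders_in_use,
       current_setting('max_wal_senders')::int - count(*) AS free_senders
FROM pg_stat_activity
WHERE backend_type = 'walsender';
```

If free_senders is 0 or 1 while standbys are connected, a pg_basebackup run that streams WAL over a second connection may be refused.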

Pitfalls

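  • With -X stream (the default), pg_basebackup streams WAL over a second connection while the backup runs; with -X fetch it collects WAL only at the end. If WAL generated since the start of the backup is removed before pg_basebackup has received it, the backup fails and cannot be restored. With -X stream and no -S, pg_basebackup uses a temporary replication slot by default, so this mainly affects -X fetch or --no-slot runs.
  • Using a replication slot with pg_basebackup -S slot_name prevents WAL removal, but the slot keeps retaining WAL if backups fail or stop running, which can fill the disk. On PostgreSQL 13+, max_slot_wal_keep_size caps the retained WAL, at the cost of the slot becoming unusable (wal_status = 'lost').
  • max_wal_senders must cover standbys plus pg_basebackup connections; with -X stream each pg_basebackup uses two replication connections. Replication slots do not consume walsenders and are limited separately by max_replication_slots.
  • Without WAL archiving, point-in-time recovery (PITR) is impossible: the cluster can only be restored to the consistent point at the end of each backup.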
  • Be cautious about taking pg_basebackup from a replica that has recovery_min_apply_delay set: the backup reflects the replica's delayed replay position, so it is older than the primary by at least that delay.
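
The slot-related pitfall above can be checked with a query along these lines. It is a sketch that assumes it runs on the primary, since pg_current_wal_lsn() raises an error while a server is in recovery; the alias retained_wal is illustrative.

```sql
-- How much WAL each replication slot is holding back (run on the primary).
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal   -- illustrative alias
FROM pg_replication_slots
ORDER BY restart_lsn;

-- If a backup slot is confirmed abandoned (active = false and no job will
-- reuse it), dropping it releases the retained WAL. Verify before running.
-- SELECT pg_drop_replication_slot('backup_slot');
```

An inactive slot with a large retained_wal value is the usual sign of a backup job that failed and left its slot behind.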

Resolution Approach

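Set wal_keep_size large enough to cover the WAL generated during the longest expected backup window. Use a dedicated replication slot for backups (or WAL archiving) to guarantee WAL availability, and monitor that slot so a failed backup does not leave it retaining WAL indefinitely. Increase max_wal_senders to accommodate concurrent backups plus standbys; changing it requires a server restart.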
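
As a rough sketch, the resolution could look like the statements below. Every numeric value is a placeholder assumed for illustration and should be sized from the observed WAL volume and backup duration; the slot name is the one from the scenario.

```sql
-- Placeholder values only; size them for your own workload.
ALTER SYSTEM SET wal_keep_size = '4GB';            -- assumed value; takes effect on reload
ALTER SYSTEM SET max_slot_wal_keep_size = '20GB';  -- assumed value; PostgreSQL 13+, caps WAL held by slots
ALTER SYSTEM SET max_wal_senders = 12;             -- assumed value; needs a server restart
SELECT pg_reload_conf();                           -- applies the reloadable settings only

-- Recreate the slot the backup script expects, if it was dropped.
SELECT pg_create_physical_replication_slot('backup_slot');
```

Alternatively, pg_basebackup can create the slot itself with -C -S backup_slot (PostgreSQL 11 and later), but that fails if the slot already exists, so the script has to handle both cases.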
