pg_basebackup Failure

Applies to PostgreSQL 13–17 · Last reviewed May 2026 · Grounded in source

Estimated investigation: 4 min

Investigation Path

Scenario

A nightly pg_basebackup job started failing. On some nights it reports FATAL: could not connect to server or ERROR: replication slot "backup_slot" does not exist. On other nights it fails with ERROR: requested WAL segment has already been removed (the real message also names the segment). The team has no WAL archiving configured and relies solely on pg_basebackup for backups.

How to Identify

Conditions:

pg_basebackup exits non-zero with connection or replication errors
No WAL archiving, so WAL needed during a long backup may be recycled before it is copied
max_wal_senders limit hit by concurrent standbys + backup connections
Backup replication slot referenced in backup script was dropped manually
wal_keep_size too small for long-running backups
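
The conditions above map fairly directly onto the error text the job prints. As a minimal sketch, a backup wrapper could sort a failure into a likely cause before anyone opens psql. The function name, the labels, and the substring patterns are all illustrative assumptions. The patterns are taken from the messages quoted in the scenario, plus the `max_wal_senders` wording, and exact message text varies between PostgreSQL versions.

```python
from typing import List, Tuple

# Each entry: (substrings that must all appear, label for the likely cause).
# More specific patterns come first. Labels are hypothetical, not PostgreSQL terms.
PATTERNS: List[Tuple[Tuple[str, ...], str]] = [
    (("replication slot", "does not exist"), "missing_slot"),
    (("requested wal segment",), "wal_removed"),
    (("max_wal_senders",), "wal_senders_exhausted"),
    (("could not connect",), "connection_failed"),
]


def classify_backup_error(stderr_text: str) -> str:
    """Return a coarse label for a pg_basebackup failure message."""
    text = stderr_text.lower()
    for needles, label in PATTERNS:
        if all(needle in text for needle in needles):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify_backup_error('ERROR: replication slot "backup_slot" does not exist'))
```

The label only narrows down where to look first. The queries under Analysis Steps are still what confirm the cause.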

Analysis Steps

-- Check max_wal_senders vs current usage
SHOW max_wal_senders;
SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'walsender';
SELECT count(*) FROM pg_stat_replication;

-- Check if backup slot exists
SELECT slot_name, active, restart_lsn, wal_status
FROM pg_replication_slots
WHERE slot_name = 'backup_slot';
-- wal_status = 'lost' → slot WAL has been recycled (slot is stale)

-- Check wal_keep_size
SHOW wal_keep_size;
-- '0' = no extra WAL segments are retained for standbys or backups (risky for long backups)

-- Check WAL archiving status
SHOW archive_mode;
SHOW archive_command;
-- If archive_mode=off and wal_keep_size=0: long backups can fail
-- if WAL from backup start is recycled before backup completes

-- Check pg_stat_archiver for archiving health
SELECT archived_count, last_archived_wal, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
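
Once the queries above have been run, their results can be combined into a short list of findings. The sketch below is one way to encode that reasoning. The function name and the finding codes are hypothetical, and the inputs are assumed to be copied by hand from `SHOW archive_mode`, `SHOW wal_keep_size` (converted to MB), and the `wal_status` column of `pg_replication_slots` (`None` when the slot row is absent).

```python
from typing import List, Optional


def wal_retention_findings(
    archive_mode: str,
    wal_keep_size_mb: int,
    slot_wal_status: Optional[str],
) -> List[str]:
    """Summarise WAL-retention risks for a long-running base backup."""
    findings: List[str] = []

    if slot_wal_status is None:
        # The slot named in the backup script is not in pg_replication_slots.
        findings.append("slot_missing")
    elif slot_wal_status == "lost":
        # Required WAL is already gone; the slot is stale.
        findings.append("slot_lost")

    slot_ok = slot_wal_status in ("reserved", "extended")
    archiving = archive_mode in ("on", "always")
    if not slot_ok and not archiving and wal_keep_size_mb == 0:
        # Nothing holds WAL back while the backup runs.
        findings.append("no_wal_protection")

    return findings


if __name__ == "__main__":
    # The scenario: no archiving, no extra WAL kept, slot dropped.
    print(wal_retention_findings("off", 0, None))
```

For the scenario in this lesson the sketch reports both a missing slot and no WAL protection, which matches the mix of errors seen on different nights.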

Pitfalls

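With the default --wal-method=stream, pg_basebackup streams WAL over a second connection while the backup runs; with --wal-method=fetch it collects WAL only at the end. Either way, if WAL from the start of the backup is recycled before it has been copied, the backup is incomplete and unusable.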
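Using a replication slot with pg_basebackup -S slot_name prevents WAL recycling, but it causes WAL bloat if backups fail and the slot is left behind: an unused slot keeps holding WAL until it is dropped or consumed again. max_slot_wal_keep_size can cap how much WAL a slot retains.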
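max_wal_senders must account for standbys plus pg_basebackup connections (two per backup when WAL is streamed) plus any other streaming clients. Replication slots are limited separately by max_replication_slots.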
Without WAL archiving, point-in-time recovery (PITR) is impossible: you can only restore to the point at which a base backup completed.
Avoid taking pg_basebackup from a replica that has recovery_min_apply_delay set: the backup reflects the replica's delayed replay position and may not contain the state you expect.
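
The max_wal_senders pitfall comes down to simple counting. The sketch below assumes each backup uses two connections when WAL is streamed alongside the data and one otherwise. The `other_senders` and `headroom` parameters are illustrative assumptions for things like logical replication or an ad hoc backup, not PostgreSQL settings.

```python
def required_wal_senders(
    standbys: int,
    concurrent_backups: int,
    stream_wal: bool = True,
    other_senders: int = 0,
    headroom: int = 1,
) -> int:
    """Estimate a max_wal_senders value that avoids connection refusals."""
    # A backup that streams WAL opens a second replication connection.
    per_backup = 2 if stream_wal else 1
    return standbys + concurrent_backups * per_backup + other_senders + headroom


if __name__ == "__main__":
    # Two standbys plus one nightly backup with streamed WAL.
    print(required_wal_senders(standbys=2, concurrent_backups=1))
```

Compare the result with `SHOW max_wal_senders` and the walsender count from pg_stat_activity. If the configured value is lower, a backup that starts while all standbys are connected can be refused.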

Resolution Approach

Set wal_keep_size large enough to survive the longest expected backup window. Use a dedicated replication slot for backups (or WAL archiving) to guarantee WAL availability. Increase max_wal_senders to accommodate concurrent backups + standbys.
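
"Large enough" can be turned into a rough number by multiplying the WAL generation rate by the longest backup duration and adding a margin. The sketch below shows that arithmetic only. The function name, the safety factor of 2, and the 16 MB segment size (the PostgreSQL default, which can be changed at initdb time) are assumptions, and the WAL rate has to be measured on your own system, for example by sampling pg_current_wal_lsn() twice and taking the difference with pg_wal_lsn_diff().

```python
import math


def suggest_wal_keep_size_mb(
    wal_rate_mb_per_min: float,
    backup_minutes: float,
    safety_factor: float = 2.0,
    segment_mb: int = 16,
) -> int:
    """Rough wal_keep_size estimate in MB, rounded up to whole WAL segments."""
    raw_mb = wal_rate_mb_per_min * backup_minutes * safety_factor
    return math.ceil(raw_mb / segment_mb) * segment_mb


if __name__ == "__main__":
    # Example inputs only: 10 MB/min of WAL during a 90-minute backup window.
    print(suggest_wal_keep_size_mb(10, 90))
```

An estimate like this only matters when no slot or archive protects the backup. A dedicated slot or WAL archiving guarantees availability, while wal_keep_size is a best-effort buffer.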
