Lesson 5 of 7

Batched Backfills: Updating Millions of Rows Safely

Applies to PostgreSQL 13–17 Last reviewed May 2026 Grounded in source

The one thing to understand first

Running UPDATE huge_table SET col = ... in a single statement looks simple but is operationally hostile:

  • It holds row locks on every updated row until commit — blocking concurrent writers for the whole duration.
  • It generates one enormous transaction’s worth of WAL, which can stall replication and fill disk.
  • It creates millions of dead tuples at once, causing a bloat spike that vacuum must chase.
  • If it fails near the end, the entire change rolls back — hours of work lost.
  • It holds back the <a class="sev1-termlink" href="https://thesev1database.com/glossary/xmin-horizon-oldestxmin/" title="xmin horizon (OldestXmin)">vacuum horizon for its entire (long) lifetime.

A backfill should be many small committed batches, not one giant transaction — each batch releases its locks, bounds its WAL, and lets vacuum and replicas catch up. The fix is to do the same work as many short, idempotent, resumable chunks.

Keyset batching (the right way)

Drive batches by the primary key so each chunk is an indexed range — never OFFSET, which re-scans more rows every iteration:

This is a Pro lesson

Get every Learning Pathway and cookbook recipe — grounded in PostgreSQL source code, with diagnostics, fixes, and prevention for each topic.

Continue this lesson to learn:

  • Make it idempotent and resumable
  • Layer 3 — Watch it happen on your own database
  • Layer 4 — The levers this hands you
  • Layer 5 — What an Oracle DBA should expect vs what they get
  • Key takeaway
  • How this article was written
  • All 36 Learning Pathway lessons
  • 170+ cookbook recipes
  • Source-grounded diagnostics & fixes

Secure checkout Cancel anytime Source-grounded

Was this helpful?

← Back to 03 — Operations: Zero-Downtime Deployments