Batched Backfills: Updating Millions of Rows Safely

The one thing to understand first

Running UPDATE huge_table SET col = ... in a single statement looks simple but is operationally hostile:

It holds row locks on every updated row until commit — blocking concurrent writers for the whole duration.
It generates one enormous transaction’s worth of WAL, which can stall replication and fill disk.
It creates millions of dead tuples at once, causing a bloat spike that vacuum must chase.
If it fails near the end, the entire change rolls back — hours of work lost.
It holds back the <a class="sev1-termlink" href="https://thesev1database.com/glossary/xmin-horizon-oldestxmin/" title="xmin horizon (OldestXmin)">vacuum horizon for its entire (long) lifetime.

A backfill should be many small committed batches, not one giant transaction — each batch releases its locks, bounds its WAL, and lets vacuum and replicas catch up. The fix is to do the same work as many short, idempotent, resumable chunks.

Keyset batching (the right way)

Drive batches by the primary key so each chunk is an indexed range — never OFFSET, which re-scans more rows every iteration: