Autovacuum Internals: How Dead Tuples Are Reclaimed

The one thing to understand first

Because MVCC leaves dead tuples behind (see the MVCC article), something must reclaim that space and, just as importantly, advance the cluster’s frozen-XID horizon to prevent transaction-ID wraparound. Autovacuum is the background subsystem that does both. Its code spans src/backend/postmaster/autovacuum.c (scheduling) and src/backend/access/heap/vacuumlazy.c (the actual work).

Autovacuum is not optional cleanup — it is the process that keeps your database able to accept writes. It does two jobs that look unrelated but share one engine: reclaiming dead-tuple space, and freezing old rows before the 32-bit XID counter wraps. Neglect either and you get runaway bloat or a cluster that refuses writes.

Launcher and workers

Autovacuum has two process types:

The launcher wakes every autovacuum_naptime seconds and decides which databases need attention, prioritising those closest to wraparound.
A worker is forked per database (up to autovacuum_max_workers) to process the tables that exceed their thresholds.

The threshold formula

A table becomes eligible for autovacuum when its estimated dead tuples exceed:

threshold = autovacuum_vacuum_threshold
          + autovacuum_vacuum_scale_factor * reltuples

With defaults (threshold = 50, scale_factor = 0.2), a 1,000,000-row table is vacuumed after ~200,050 dead tuples. The dead-tuple estimate comes from the cumulative statistics system, updated by backends as they modify rows. The same shape of formula with analyze_* parameters governs autoanalyze.

The classic mistake on large tables: the 20% scale factor means a billion-row table waits for 200 million dead tuples before vacuuming — far too late. The fix is per-table tuning:

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 10000);

What a vacuum pass actually does

lazy_scan_heap() in vacuumlazy.c runs in phases:

Scan heap. Walk pages (skipping all-visible pages via the visibility map), collecting the line pointers of dead tuples whose xmax is older than the vacuum’s cutoff XID.
Vacuum indexes. For each index, remove entries pointing at the dead tuples. This is why many indexes make vacuum slower.
Vacuum heap. Turn the dead line pointers into reusable space and defragment the page.
Update the free space map and visibility map.

Crucially, ordinary (lazy) vacuum does not return space to the operating system — it makes space reusable within the table. Only VACUUM FULL (which rewrites the table and takes an exclusive lock) shrinks the file.

The visibility map

Each table has a visibility map (visibilitymap.c): two bits per heap page. The all-visible bit lets index-only scans skip heap fetches and lets the next vacuum skip the page entirely. The all-frozen bit lets aggressive freeze passes skip pages. Keeping these bits set is a major reason vacuum gets cheaper over time on append-mostly tables.

Freezing and wraparound

XIDs are 32-bit and wrap around after ~4 billion. To stay safe, vacuum freezes old tuples — marking them as unconditionally visible — before their age reaches autovacuum_freeze_max_age. When any table crosses that age, an anti-wraparound autovacuum is forced even if the table is otherwise quiet, and it cannot be skipped. Ignoring the warnings leads PostgreSQL to eventually refuse new writes to protect data — a production-stopping event.

-- Tables closest to wraparound trouble
SELECT relname, age(relfrozenxid) AS xid_age
FROM   pg_class
WHERE  relkind = 'r'
ORDER  BY xid_age DESC LIMIT 10;

Cost-based throttling

To avoid I/O storms, vacuum accumulates a “cost” as it reads and dirties pages and sleeps when it exceeds autovacuum_vacuum_cost_limit. On modern SSD systems the defaults are often too conservative; raising the cost limit lets autovacuum keep up with high-churn tables.

Layer 3 — Watch it happen on your own database

-- Tables closest to wraparound trouble
SELECT relname, age(relfrozenxid) AS xid_age
FROM   pg_class
WHERE  relkind = 'r'
ORDER  BY xid_age DESC LIMIT 10;

-- Dead-tuple counts and last autovacuum time
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC LIMIT 10;

-- A vacuum in flight: which phase and how far along
SELECT * FROM pg_stat_progress_vacuum;

The first query is your wraparound early-warning system — anything whose xid_age approaches autovacuum_freeze_max_age (default 200M) is overdue. The second shows which tables are accumulating dead tuples faster than autovacuum clears them. The third lets you watch a live vacuum move through its scan, index, and cleanup phases.

Layer 4 — The levers this hands you

Per-table scale factors. The 20% default is wrong for large tables; set autovacuum_vacuum_scale_factor low (e.g. 0.02) with a fixed autovacuum_vacuum_threshold on big, churny tables.
Raise the cost limit on SSDs. Increase autovacuum_vacuum_cost_limit (or lower autovacuum_vacuum_cost_delay) so vacuum keeps pace with write volume.
More workers for many hot tables. autovacuum_max_workers bounds concurrency; raise it if many tables qualify at once.
Long transactions throttle vacuum globally by holding back the cutoff XID — fix the transaction, not vacuum.
More indexes = slower vacuum because of the per-index cleanup phase; drop unused indexes.

Layer 5 — What an Oracle DBA should expect vs what they get

Oracle has no direct equivalent because it never created the problem autovacuum solves:

No background “undo cleanup” to compare to. Oracle reclaims old row images from UNDO automatically and overwrites blocks in place, so there is no table-level vacuum at all. In PostgreSQL, vacuum is a first-class operational concern you must monitor and tune.
Transaction-ID wraparound is uniquely Postgres. Oracle’s SCN is 64-bit and never wraps in practice; PostgreSQL’s 32-bit XID does, which is why freezing exists and why an ignored wraparound warning can stop writes entirely. There is no Oracle drill for this.
“My table doesn’t shrink after a big delete.” Correct — lazy vacuum only makes space reusable inside the table, like Oracle’s high-water-mark behaviour. Reclaiming OS space needs VACUUM FULL (an exclusive-lock rewrite) or pg_repack, analogous to an Oracle segment shrink/move.
The visibility map is the index-only-scan enabler. Oracle index-only access just needs the index; in Postgres an index-only scan also checks the all-visible bit, so keeping vacuum current directly affects whether index-only scans are allowed.

Key takeaway

Autovacuum schedules per-database workers when a table’s dead-tuple estimate crosses threshold + scale_factor × reltuples, then vacuumlazy.c scans the heap (skipping all-visible pages), cleans indexes, defragments pages, and freezes old XIDs. It reclaims space for reuse and defends against wraparound — it does not shrink files. Tune scale factors and cost limits per table, hunt long transactions, and watch age(relfrozenxid): a healthy autovacuum is the difference between a database that quietly keeps up and one that seizes.