Scenario
A connection pool bug causes one connection to open a transaction and then become idle — the application never commits or rolls back. After 6 hours, the DBA notices table bloat accumulating across every table in the database, not just tables the idle transaction touches. Autovacuum runs constantly but cannot remove any dead tuples. The problem is that the idle-in-transaction connection holds the oldest active snapshot in the system; PostgreSQL cannot reclaim any row version older than that snapshot, even in completely unrelated tables.
How to Identify
Conditions:
pg_stat_activity shows a session with state = 'idle in transaction' and a xact_start many hours or days in the past
pg_stat_user_tables.n_dead_tup is rising across many tables, not just one
pg_stat_user_tables.last_autovacuum timestamps are recent but n_dead_tup is not decreasing
pg_stat_activity.backend_xmin for the idle session is very old (lagging far behind the current transaction ID)
- Table and index bloat visible in
pg_relation_size() growing over time
Analysis Steps
-- 1. Find the oldest active transaction — the xmin horizon holder
SELECT
pid,
usename,
application_name,
client_addr,
state,
backend_xmin,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
left(query, 80) AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ASC NULLS LAST
LIMIT 5;
-- 2. Check age of the oldest xmin horizon globally
SELECT
min(age(backend_xmin)) AS oldest_xmin_age,
max(now() - xact_start) AS longest_xact_duration
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL;
-- 3. Observe dead tuple accumulation (bloat indicator)
SELECT
schemaname,
relname AS table_name,
n_dead_tup,
n_live_tup,
round(n_dead_tup::numeric /
NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)
AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 20;
-- 4. Check autovacuum activity — running but unable to clean
SELECT
schemaname,
relname,
last_autovacuum,
last_autovacuum_count,
n_dead_tup
FROM pg_stat_user_tables
WHERE last_autovacuum IS NOT NULL
ORDER BY last_autovacuum DESC
LIMIT 10;
-- 5. Verify current transaction ID horizon
SELECT age(datfrozenxid) AS db_xid_age,
datfrozenxid
FROM pg_database
WHERE datname = current_database();
Pitfalls
- Long-running transactions prevent vacuum from removing old row versions needed for any active snapshot — this affects ALL tables, not just tables the transaction touches. The xmin horizon is global across the entire database instance.
- Idle-in-transaction sessions are the most dangerous class — they hold the xmin horizon frozen without doing any work, and no query-level timeout fires because there is no active query.
- Autovacuum will keep running and mark tables as vacuumed but will report
n_dead_tup not decreasing — this is a common diagnostic trap that looks like a vacuum configuration problem when the real cause is a long transaction.
- The xmin horizon is global — a single idle-in-transaction session on one database affects vacuum behavior for every table in that database.
- Transaction ID wraparound is the catastrophic end state if this problem persists long enough — the database will refuse all new write transactions to prevent data corruption.
Resolution Approach
Immediately terminate the idle-in-transaction session to release the xmin horizon. Autovacuum will then be able to clean up dead tuples in subsequent runs. Going forward, set idle_in_transaction_session_timeout to automatically close sessions that hold transactions open without activity. Monitor backend_xmin age in pg_stat_activity as a proactive alerting metric.