Investigating SLRU cache misses with pg_stat

Scenario

PostgreSQL uses Simple LRU (SLRU) caches for internal state like commit log, subtransaction status, and multixact data. pg_stat_slru (PostgreSQL 13+) exposes hit rates that indicate whether these caches are under-sized. Diagnose it -- Available since PostgreSQL…

Investigation Path

PostgreSQL uses Simple LRU (SLRU) caches for internal state like commit log, subtransaction status, and multixact data. pg_stat_slru (PostgreSQL 13+) exposes hit rates that indicate whether these caches are under-sized.

Diagnose it

-- Available since PostgreSQL 13:
SELECT name,
       blks_zeroed,
       blks_hit,
       blks_read,
       blks_written,
       blks_exists,
       round(
           100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0),
       1) AS hit_pct,
       flushes,
       truncates,
       stats_reset
FROM pg_stat_slru
ORDER BY blks_read DESC;

SLRU names correspond to internal subsystems: Async (LISTEN/NOTIFY state),
CommitTs (commit timestamp data), MultiXactMember/MultiXactOffset
(multixact tracking), Notify, Serial (serializable transaction tracking),
Subtrans (subtransaction status), Xact (transaction commit/abort status).

Why it happens

SLRU buffers are sized by fixed internal constants (not GUCs in most cases) and are usually
small relative to their access patterns. A low hit_pct on Subtrans
can indicate heavy use of savepoints or PL/pgSQL exception blocks (each creates a subtransaction).
Low MultiXact hit rates correspond to heavy row-level locking patterns. High
flushes can contribute to I/O pressure.

How to fix it

For Subtrans misses: reduce the use of savepoints or PL/pgSQL exception blocks
that create subtransactions. For MultiXact misses: reduce concurrent row-level
locking (e.g., many concurrent SELECT FOR SHARE on the same rows). Both can be
partially addressed by a PostgreSQL build-time or GUC setting in newer releases — check the
documentation for your specific version.

Prevent it next time

Monitor SLRU stats regularly and correlate LWLock wait events related to SLRU
with these counters. A correlation between high blks_read in a specific SLRU
and wait_event values like SLRURead confirms a cache miss bottleneck.
Reset stats with pg_stat_reset_shared('slru') to establish a fresh baseline.

Related & next steps

Career Impact

This scenario builds production judgment and operational confidence under pressure.

Open Career Dashboard →

Investigating SLRU cache misses with pg_stat_slru

What This Is

Why It Matters

Next Step

Scenario

Investigation Path

Diagnose it

Why it happens

How to fix it

Prevent it next time

Related & next steps

Career Impact

Related & next steps

More like this

Concepts on this page

Don't get paged twice for the same bug.

Scenario

Investigation Path

Diagnose it

Why it happens

How to fix it

Prevent it next time

Related & next steps

Related Knowledge

Related Interview Questions

Career Impact

Related & next steps

lan More like this

menu_book Concepts on this page

Don't get paged twice for the same bug.

More like this

Concepts on this page