Cookbook recipe

Investigating SLRU cache misses with pg_stat_slru

Applies to PostgreSQL 13–17 Last reviewed Nov 2025 Grounded in source
Estimated investigation4 min

Scenario

PostgreSQL uses Simple LRU (SLRU) caches for internal state like commit log, subtransaction status, and multixact data. pg_stat_slru (PostgreSQL 13+) exposes hit rates that indicate whether these caches are under-sized. Diagnose it -- Available since PostgreSQL…

Investigation Path

PostgreSQL uses Simple LRU (SLRU) caches for internal state like commit log, subtransaction status, and multixact data. pg_stat_slru (PostgreSQL 13+) exposes hit rates that indicate whether these caches are under-sized.

Diagnose it

-- Available since PostgreSQL 13:
SELECT name,
       blks_zeroed,
       blks_hit,
       blks_read,
       blks_written,
       blks_exists,
       round(
           100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0),
       1) AS hit_pct,
       flushes,
       truncates,
       stats_reset
FROM pg_stat_slru
ORDER BY blks_read DESC;

SLRU names correspond to internal subsystems: Async (LISTEN/NOTIFY state),
CommitTs (commit timestamp data), MultiXactMember/MultiXactOffset
(multixact tracking), Notify, Serial (serializable transaction tracking),
Subtrans (subtransaction status), Xact (transaction commit/abort status).

Why it happens

SLRU buffers are sized by fixed internal constants (not GUCs in most cases) and are usually
small relative to their access patterns. A low hit_pct on Subtrans
can indicate heavy use of savepoints or PL/pgSQL exception blocks (each creates a subtransaction).
Low MultiXact hit rates correspond to heavy row-level locking patterns. High
flushes can contribute to I/O pressure.

How to fix it

For Subtrans misses: reduce the use of savepoints or PL/pgSQL exception blocks
that create subtransactions. For MultiXact misses: reduce concurrent row-level
locking (e.g., many concurrent SELECT FOR SHARE on the same rows). Both can be
partially addressed by a PostgreSQL build-time or GUC setting in newer releases — check the
documentation for your specific version.

Prevent it next time

Monitor SLRU stats regularly and correlate LWLock wait events related to SLRU
with these counters. A correlation between high blks_read in a specific SLRU
and wait_event values like SLRURead confirms a cache miss bottleneck.
Reset stats with pg_stat_reset_shared('slru') to establish a fresh baseline.

Related & next steps

Career Impact

This scenario builds production judgment and operational confidence under pressure.

Open Career Dashboard →

Keep going

Related & next steps

Was this helpful?

← All cookbook recipes