The one thing to understand first
PostgreSQL storage is organised into fixed-size blocks, 8KB by default (BLCKSZ). A table's heap data (the “main fork”) is stored as an array of these blocks, split into one or more 1GB segment files under the database directory. Every block, heap or index, shares the same general layout, defined in src/include/storage/bufpage.h.
A page is a tiny 8KB filesystem: a header at the top, a directory of line pointers growing down from it, and tuples growing up from the bottom. The line-pointer indirection is the quiet hero: it lets rows move within a page and makes HOT updates possible. The fixed 8KB budget, in turn, explains why a row wider than ~2KB has to be pushed out to TOAST.
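The block and segment arithmetic above can be sketched in a few lines of Python. This is an illustration only, assuming the default 8KB block size and 1GB segments; the helper names and the relfilenode value 16384 are hypothetical. The naming convention (first segment is the bare filenode, later ones get a numeric suffix) is the one described in the PostgreSQL storage documentation.

```python
BLCKSZ = 8192                                  # default block size in bytes
SEGMENT_BYTES = 1024 ** 3                      # 1GB segment files
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ   # 131072 blocks per segment


def locate_block(block_number):
    """Return (segment_number, byte_offset_within_segment) for a block."""
    segment = block_number // BLOCKS_PER_SEGMENT
    offset = (block_number % BLOCKS_PER_SEGMENT) * BLCKSZ
    return segment, offset


def segment_file_name(relfilenode, segment):
    """First segment is named after the filenode; later ones add .1, .2, ..."""
    return str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"


# Hypothetical relfilenode, used only to show the naming pattern.
seg, off = locate_block(131073)
print(segment_file_name(16384, seg), off)
```

Block 131073 is the second block of the second segment, so the lookup lands 8192 bytes into the file with the .1 suffix.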
Four regions of a page
A heap page has four regions, growing toward each other from both ends:
- Page header (PageHeaderData, 24 bytes): the LSN of the last change (pd_lsn), checksum, flags, and three offsets: pd_lower, pd_upper, pd_special.
- Line pointer array (ItemIdData, 4 bytes each): grows downward from just after the header. Each entry points to a tuple and stores its offset, length, and a state (used / dead / redirect).
- Free space: the gap between pd_lower (end of line pointers) and pd_upper (start of tuples).
- Tuples: stored from the end of the page growing upward.
The “special space” at the very end is empty for heap pages but used by index access methods (e.g. btree stores its opaque data there).
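To make the header layout concrete, here is a minimal Python sketch that builds a synthetic empty heap page and reads its header back. It assumes a little-endian build and the 24-byte PageHeaderData field order from bufpage.h (pd_lsn as two 32-bit halves, six 16-bit fields, then a 4-byte pd_prune_xid). The function names are hypothetical, and the page is fabricated in memory rather than read from a real relation.

```python
import struct

BLCKSZ = 8192
# Assumed little-endian layout: pd_lsn (2 x uint32), pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# pd_prune_xid (uint32). Total: 24 bytes.
PAGE_HEADER = struct.Struct("<IIHHHHHHI")


def parse_page_header(page):
    """Decode the fixed header and derive the free-space gap."""
    (lsn_hi, lsn_lo, checksum, flags, lower, upper, special,
     pagesize_version, prune_xid) = PAGE_HEADER.unpack_from(page, 0)
    return {
        "lsn": (lsn_hi << 32) | lsn_lo,
        "checksum": checksum,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": pagesize_version & 0xFF00,  # high byte: page size
        "version": pagesize_version & 0x00FF,   # low byte: layout version
        "prune_xid": prune_xid,
        "free_bytes": upper - lower,
    }


def make_empty_heap_page():
    """Synthetic page: no line pointers, no tuples, no special space."""
    page = bytearray(BLCKSZ)
    PAGE_HEADER.pack_into(page, 0, 0, 0, 0, 0,
                          PAGE_HEADER.size,  # pd_lower: right after header
                          BLCKSZ,            # pd_upper: end of page
                          BLCKSZ,            # pd_special: empty for heap
                          BLCKSZ | 4,        # page size plus layout version 4
                          0)
    return bytes(page)


header = parse_page_header(make_empty_heap_page())
print(header["lower"], header["upper"], header["free_bytes"])
```

On an empty page pd_lower sits at byte 24 and pd_upper at 8192, so the whole gap between them is free space.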
Why the indirection of line pointers?
A tuple is addressed by a TID (ctid, the tuple identifier): (block number, line pointer index), the ItemPointerData in itemptr.h. Indexes store TIDs. Because callers reference the line pointer rather than a raw offset, the tuple can be moved within the page during defragmentation without invalidating index entries: the line pointer is updated, the TID stays the same. This indirection is what makes intra-page compaction possible.
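The effect of that indirection is easy to see in a toy model. The sketch below is not PostgreSQL's page code: ToyPage is a hypothetical class that ignores tuple headers, alignment, and visibility, and keeps only the part that matters here, namely a slot array that maps a stable index to a movable byte offset.

```python
class ToyPage:
    """Toy page: slot i + 1 of line_pointers maps to (offset, length)."""

    def __init__(self, size=8192):
        self.size = size
        self.data = bytearray(size)
        self.upper = size            # tuples are placed from the end upward
        self.line_pointers = []

    def insert(self, payload):
        self.upper -= len(payload)
        self.data[self.upper:self.upper + len(payload)] = payload
        self.line_pointers.append((self.upper, len(payload)))
        return len(self.line_pointers)   # 1-based slot number

    def delete(self, slot):
        self.line_pointers[slot - 1] = None

    def fetch(self, slot):
        entry = self.line_pointers[slot - 1]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def compact(self):
        """Repack live tuples against the page end; slot numbers never change."""
        live = [(i, self.fetch(i + 1))
                for i, entry in enumerate(self.line_pointers)
                if entry is not None]
        self.upper = self.size
        for i, payload in live:
            self.upper -= len(payload)
            self.data[self.upper:self.upper + len(payload)] = payload
            self.line_pointers[i] = (self.upper, len(payload))


page = ToyPage()
tid_alice = (0, page.insert(b"alice"))
tid_bob = (0, page.insert(b"bob"))
tid_carol = (0, page.insert(b"carol"))

page.delete(tid_bob[1])
offset_before = page.line_pointers[tid_carol[1] - 1][0]
page.compact()
offset_after = page.line_pointers[tid_carol[1] - 1][0]

print(tid_carol, offset_before, offset_after, page.fetch(tid_carol[1]))
```

The byte offset of the third tuple changes after compaction, but the (block, slot) pair an index would hold still resolves to the same row.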
Line pointer states and HOT
An ItemId can be:
- LP_NORMAL — points to a live or recently-dead tuple.
- LP_DEAD — the tuple is dead; space reclaimable by vacuum (or page pruning).
- LP_REDIRECT — points to another line pointer, used by HOT chains so an index entry to the original TID still resolves to the current version.
- LP_UNUSED — free for reuse.
This is the storage-level reason HOT updates avoid index writes: the index keeps pointing at the root line pointer, which redirects to the newest heap-only tuple within the same page.
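The four states can be decoded from the 4-byte ItemIdData word, as the following hedged sketch shows. It assumes the common little-endian bit-field layout (lp_off in the low 15 bits, lp_flags in the next 2, lp_len in the top 15), which is compiler-dependent. The flag values follow itemid.h, and for a redirect the lp_off field holds the target slot number. The resolve helper is hypothetical and follows only the redirect hop; real HOT chain traversal continues through tuple t_ctid links and visibility checks.

```python
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3


def encode_item_id(lp_off, lp_flags, lp_len):
    """Pack the three fields into one 32-bit word (assumed bit layout)."""
    return (lp_off & 0x7FFF) | ((lp_flags & 0x3) << 15) | ((lp_len & 0x7FFF) << 17)


def decode_item_id(raw):
    """Unpack a 32-bit word into (lp_off, lp_flags, lp_len)."""
    return raw & 0x7FFF, (raw >> 15) & 0x3, (raw >> 17) & 0x7FFF


def resolve(line_pointers, slot):
    """Return the slot that holds tuple storage for a 1-based slot, or None.

    Simplification: follows a single LP_REDIRECT hop and nothing else.
    """
    lp_off, lp_flags, _ = decode_item_id(line_pointers[slot - 1])
    if lp_flags == LP_REDIRECT:
        slot = lp_off                      # a redirect stores the target slot
        _, lp_flags, _ = decode_item_id(line_pointers[slot - 1])
    return slot if lp_flags == LP_NORMAL else None


# Hypothetical page state after a HOT update and pruning:
# slot 1 redirects to slot 2, slot 3 is dead, slot 4 is unused.
line_pointers = [
    encode_item_id(2, LP_REDIRECT, 0),
    encode_item_id(8100, LP_NORMAL, 40),
    encode_item_id(0, LP_DEAD, 0),
    encode_item_id(0, LP_UNUSED, 0),
]

print(resolve(line_pointers, 1), resolve(line_pointers, 3))
```

An index entry that still points at slot 1 reaches the current tuple in slot 2 without the index ever being touched.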
Layer 3 — Watch it happen on your own database
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- Page header: see pd_lower / pd_upper / free space
SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('accounts', 0));
-- Line pointers, their state flags, and tuple lengths
SELECT lp, lp_off, lp_flags, lp_len, t_ctid
FROM heap_page_items(get_raw_page('accounts', 0));
Free space on the page is roughly upper - lower. Update a row and look again: a new lp appears, and once the page has been pruned the old line pointer may become LP_REDIRECT (a HOT chain) or LP_DEAD. Each relation's Free Space Map (a separate _fsm fork) tracks the free space of every page so inserts can quickly find a page with room.
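The upper - lower arithmetic can be turned into a rough capacity estimate. The sketch below assumes 8-byte maximum alignment (typical on 64-bit builds) and charges each new tuple its aligned length plus one 4-byte line pointer. The function names are hypothetical, and the result is an estimate rather than what the server would report.

```python
ITEM_ID_SIZE = 4        # one line pointer per tuple
MAXIMUM_ALIGNOF = 8     # assumed alignment on a typical 64-bit build


def maxalign(length):
    """Round a length up to the next multiple of MAXIMUM_ALIGNOF."""
    return (length + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def free_bytes(lower, upper):
    """Size of the gap between the line pointer array and the tuples."""
    return upper - lower


def tuples_that_fit(lower, upper, tuple_len):
    """Estimate how many more tuples of tuple_len bytes (lp_len) fit."""
    per_tuple = ITEM_ID_SIZE + maxalign(tuple_len)
    return free_bytes(lower, upper) // per_tuple


# Empty page (lower=24, upper=8192) and a 28-byte tuple, for example a
# 24-byte aligned tuple header plus a single 4-byte integer column.
print(free_bytes(24, 8192), tuples_that_fit(24, 8192, 28))
```

Feeding in the lower and upper values from page_header, and an lp_len from heap_page_items, gives a quick sense of how much room a page really has left.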
Layer 4 — The levers this hands you
The page budget is something you can tune. Leaving slack on each page lets future updates place the new tuple version on the same page, enabling HOT and reducing index churn. For update-heavy tables, lowering fillfactor trades some space for far less bloat:
ALTER TABLE hot_table SET (fillfactor = 80);
-- Newly written pages keep 20% free for in-page updates.
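As a back-of-the-envelope check on what a given fillfactor buys, the following sketch computes the bytes held back per page and roughly how many new row versions of a given size that slack can absorb. It assumes the default 8KB block and 8-byte alignment; the helper names are hypothetical. The versions figure ignores space that pruning frees along the way, so treat it as a conservative estimate.

```python
BLCKSZ = 8192
ITEM_ID_SIZE = 4


def maxalign(length):
    """Round up to a multiple of 8 (assumed maximum alignment)."""
    return (length + 7) & ~7


def reserved_free_bytes(fillfactor, block_size=BLCKSZ):
    """Bytes per page that inserts leave unused at this fillfactor."""
    if not 10 <= fillfactor <= 100:
        raise ValueError("table fillfactor must be between 10 and 100")
    return block_size * (100 - fillfactor) // 100


def in_page_versions(fillfactor, tuple_len):
    """Rough count of extra row versions the reserved slack can hold."""
    return reserved_free_bytes(fillfactor) // (ITEM_ID_SIZE + maxalign(tuple_len))


print(reserved_free_bytes(80), in_page_versions(80, 100))
```

At fillfactor 80 each page holds back about 1.6KB, which is room for roughly fifteen new versions of a 100-byte row before an update has to go to another page.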
- Row width matters. A row that exceeds ~2KB (a quarter page) triggers TOAST; understanding the page budget explains why and lets you design narrower hot rows.
- Bloat is measurable. Comparing live tuple bytes to relation size (via pgstattuple) quantifies wasted space so you know when a rewrite is justified.
- Checksums live in the header. Initialise the cluster with data checksums enabled (initdb --data-checksums; the read-only data_checksums setting reports whether they are on) to detect silent page-level corruption.
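The "quarter page" figure in the row-width point can be derived rather than memorised. The sketch below reproduces the arithmetic under stated assumptions: an 8KB block, a 24-byte page header, 4-byte line pointers, 8-byte alignment, and a target of four tuples per page. It is modelled on how the TOAST threshold is defined in the PostgreSQL headers as I understand them, and the Python function names are hypothetical.

```python
BLCKSZ = 8192
PAGE_HEADER_SIZE = 24
ITEM_ID_SIZE = 4
TOAST_TUPLES_PER_PAGE = 4   # target: four tuples per page


def maxalign(length):
    """Round up to a multiple of 8 (assumed maximum alignment)."""
    return (length + 7) & ~7


def maxalign_down(length):
    """Round down to a multiple of 8."""
    return length & ~7


def max_bytes_per_tuple(tuples_per_page, block_size=BLCKSZ):
    """Largest tuple size that still lets tuples_per_page fit on one page."""
    overhead = maxalign(PAGE_HEADER_SIZE + tuples_per_page * ITEM_ID_SIZE)
    return maxalign_down((block_size - overhead) // tuples_per_page)


toast_threshold = max_bytes_per_tuple(TOAST_TUPLES_PER_PAGE)
print(toast_threshold)
```

Under these assumptions the threshold comes out just under 2KB, which is where the ~2KB rule of thumb comes from.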
Layer 5 — What an Oracle DBA should expect vs what they get
If you know Oracle blocks, the shapes rhyme but the names and defaults differ:
- 8KB fixed, not a tablespace choice. Oracle lets you pick 2K–32K block sizes per tablespace; PostgreSQL fixes BLCKSZ at compile time (8KB almost everywhere). You design around 8KB rather than choosing it.
- fillfactor is PCTFREE. Oracle’s PCTFREE (together with PCTUSED) governs the in-block space kept for updates; fillfactor is the direct analogue, expressed as the percentage to fill rather than the percentage to keep free (fillfactor 80 is roughly PCTFREE 20). It is the main knob for enabling HOT. Oracle has no HOT concept; it manages row migration/chaining instead.
- Line pointers vs row directory. Oracle’s block row directory plus ROWID maps closely to the ItemId array plus TID. The big difference: Postgres uses the indirection to relocate tuples during in-page pruning without invalidating index TIDs.
- TOAST instead of chained/migrated rows. When a row will not fit, Oracle chains or migrates it across blocks; PostgreSQL compresses and/or stores oversized attributes out-of-line in a TOAST table. Same problem, very different mechanism.
Key takeaway
Every heap page is a 24-byte header, a downward-growing array of 4-byte line pointers, free space, and tuples filling upward from the end. Indexes store TIDs that reference line pointers, not raw offsets — which is what makes intra-page compaction, HOT, and pruning possible without rewriting indexes. Knowing the 8KB budget explains TOAST, fillfactor, and bloat measurement, turning “why is my table bigger than its data” from a mystery into arithmetic.