Lesson 1 of 12

ANALYZE Internals: How PostgreSQL Samples Your Data to Guess Plans

Applies to PostgreSQL 13–17 | Last reviewed Jun 2026 | Grounded in source

The one thing to understand first

On a table of any real size, ANALYZE does not read the whole table. It reads a bounded random sample, builds a statistical portrait of each analyzed column, and stores it. From then on, the planner bases its estimates on that portrait rather than on the data itself. If the portrait is wrong, the estimates are wrong, and the plans built on them suffer. This is one of the highest-leverage subsystems in PostgreSQL, and most of it lives in one file: src/backend/commands/analyze.c.

How big is the sample, and why exactly that big

The work is driven by do_analyze_rel(), which analyze_rel() calls for each table. The number of rows it targets comes from the per-type “typanalyze” routine. For ordinary scalar types, std_typanalyze() sets:

stats->minrows = 300 * attstattarget;

With the default default_statistics_target = 100, that is 30,000 rows, whether the table has one million rows or one billion. do_analyze_rel() uses the largest minrows among the analyzed columns as its sample target. That is why ANALYZE cost on a large table is roughly constant instead of proportional to table size.
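
The arithmetic above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the code in analyze.c: column_minrows(), sample_target_rows(), sampled_fraction(), and report() are hypothetical helper names, and the 100-row floor and the "negative target means use the default" rule are this sketch's assumptions about the surrounding logic.

```c
#include <stdio.h>

/* Assumed default for default_statistics_target, as stated in the text. */
#define DEFAULT_STATISTICS_TARGET 100

/*
 * Hypothetical helper (not a PostgreSQL function): rows one column asks for.
 * A negative target is treated here as "use the default".
 */
int column_minrows(int attstattarget)
{
    if (attstattarget < 0)
        attstattarget = DEFAULT_STATISTICS_TARGET;
    return 300 * attstattarget;
}

/*
 * Hypothetical helper: the table-level target is the largest per-column
 * request. The floor of 100 rows is an assumption of this sketch.
 */
int sample_target_rows(const int *targets, int ncols)
{
    int targrows = 100;

    for (int i = 0; i < ncols; i++) {
        int minrows = column_minrows(targets[i]);

        if (minrows > targrows)
            targrows = minrows;
    }
    return targrows;
}

/* Fraction of the table the sample covers, capped at 1.0 for small tables. */
double sampled_fraction(double table_rows, int targrows)
{
    if (table_rows <= (double) targrows)
        return 1.0;
    return (double) targrows / table_rows;
}

/* Print the sample size and coverage for one table size. */
void report(double table_rows, const int *targets, int ncols)
{
    int targrows = sample_target_rows(targets, ncols);

    printf("%.0f rows -> sample %d (%.4f%% of table)\n",
           table_rows, targrows,
           100.0 * sampled_fraction(table_rows, targrows));
}
```

Calling report() with three default-target columns for a table of one million rows and again for one billion rows yields the same 30,000-row sample both times; only the coverage changes, from 3% to 0.003%. Raising a single column's target to 500 lifts the whole table's sample to 150,000 rows, because the largest request wins.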


