Azure HorizonDB: Microsoft Just Put Your Entire AI Stack Inside Postgres

At Build 2026, Microsoft announced Azure HorizonDB a new fully managed PostgreSQL database service currently in public preview. The marketing copy says “3× faster than self-managed Postgres” and “AI-ready.” Both are true in a narrow sense. But the more interesting story is architectural: HorizonDB is not just another hosted Postgres. It is a deliberate attempt to move your entire AI application stack embeddings, LLM calls, vector search, full-text search, RAG pipelines, graph traversal, and agent memory into the database engine itself.

This post is for engineers who want to understand what was actually built, not the product page summary. Every SQL example is sourced directly from the official Microsoft Learn documentation.

1. What It Actually Is Architecture

Azure HorizonDB is built on two foundational principles that Aurora PostgreSQL users will recognise immediately: disaggregated compute and storage, and a database-as-a-log design.

Your Application

standard Postgres wire protocol

Read-Write endpoint

Read-Only endpoint (load-balanced)

Compute Layer stateless · scales independently of storage

Primary Replica

reads & writes
8 GB RAM/vCore · NVMe SSD hot cache

Standby Replica(s)

reads + failover candidates
8 GB RAM/vCore · NVMe SSD hot cache

WAL only no dirty pages, no checkpoints from compute

Storage Layer zone-resilient by default · shared across all replicas

WAL Service

purpose-built for log writes
low-latency commit ack

Data Storage Fleet

sharded · multi-copy per shard
replays WAL into pages

WAL archival + page snapshots

Durability Layer

Azure Blob Storage (ZRS)

WAL archive · page data · backups = blob snapshots
zone-redundant · auto-scales to 128 TB

Processes eliminated from compute (offloaded to storage layer)

walsenderarchiverbgwritercheckpointerpg_basebackupfull page writesWAL recoveryreplica redo

Result: compute CPU and NVMe are freed for your application. Failover needs no log rewind. Read replicas provision in seconds shared storage, no data copy.

Scale ceiling

Up to 192 vCores per replica, up to 16 replicas (3,072 total vCores), auto-scaling storage to 128 TB, zone-resilient by default. Each replica: 8 GB RAM/vCore + local NVMe SSD cache.

2. AI Functions: SQL That Calls LLMs

The azure_ai extension introduces five SQL functions that call generative AI models directly from queries. The model lives in a model registry inside the database, addressed by alias no API key management per query, no external service call from the application layer.

Function	Purpose	Returns
`azure_openai.create_embeddings()`	Generate vector embeddings	`vector`
`azure_ai.generate()`	LLM text or structured output	`text` or `JSONB`
`azure_ai.extract()`	Named entity / feature extraction	`JSONB`
`azure_ai.is_true()`	Boolean LLM evaluation	`boolean`
`azure_ai.rank()`	Rerank documents by query relevance	`table(id, rank, score)`

Entity extraction over a reviews table

-- Extract product name and sentiment score from every review row
SELECT review_id,
       azure_ai.extract(
           review_text,
           ARRAY[
               'product: string',
               'sentiment: number - sentiment score; 1 (lowest) to 5 (highest)'
           ],
           'my-gpt'
       ) AS analysis
FROM product_reviews;

Output: {"product": "headphones", "sentiment": 3} a JSONB column you can index and filter on.

Structured JSON generation with enforced schema

SELECT review,
       azure_ai.generate(
           prompt      => 'Rewrite politely and count products mentioned: ' || review,
           json_schema => '{
             "name": "generate_response", "strict": true,
             "schema": {
               "type": "object",
               "properties": {
                 "comment":      { "type": "string"  },
                 "num_products": { "type": "integer" }
               },
               "required": ["comment", "num_products"],
               "additionalProperties": false
             }
           }',
           model => 'my-gpt'
       ) AS result
FROM reviews;

Two-stage retrieval: vector search → semantic reranking

WITH candidates AS (
    SELECT id, title, description
    FROM products
    ORDER BY embedding <=> azure_openai.create_embeddings(input => 'Alternatives to LEGO')::vector
    LIMIT 20
),
reranked AS (
    SELECT id AS row_id, rank
    FROM azure_ai.rank(
        'Alternatives to LEGO',
        ARRAY(SELECT description FROM candidates),
        ARRAY(SELECT id FROM candidates),
        'my-reranker'
    )
)
SELECT c.id, c.title
FROM candidates c
LEFT JOIN reranked r ON r.row_id = c.id
ORDER BY r.rank ASC;

Model registry per-model RBAC in SQL

-- Register a BYOM Foundry deployment
SELECT model_registry.model_add(
    'my-gpt',
    'https://your-resource.openai.azure.com/',
    'gpt-5-deployment',
    'gpt-5.4',
    NULL,
    'subscription-key',
    '<your-endpoint-key>'
);

Supported models

generate/extract/is_true/rank: all GPT and o-series (except gpt-5.4-pro) ·
create_embeddings: text-embedding-3-small/large, ada-002 ·
rank: also Cohere-rerank-v4.0-pro and Cohere-rerank-v4.0-fast

3. AI Model Management One Toggle, Three Models

AI Model Management (AIMM) is an optional feature currently a limited preview within the public preview, requiring a separate approval form. When enabled, it auto-provisions three managed models and wires the entire azure_ai extension with no endpoint or key configuration:

Default alias	Model	Purpose
`default-chat`	`gpt-5.4`	Generation, extraction, boolean evaluation
`default-embedding`	`text-embedding-3-small`	RAG embeddings, semantic search
`default-reranker`	`Cohere-rerank-v4.0-fast`	Semantic reranking of retrieval candidates

With AIMM enabled, the model parameter becomes optional functions fall back to their default:

-- AIMM uses default-chat automatically when no alias is given
SELECT azure_ai.generate('Is this review positive? ' || review_text) FROM reviews;

-- Uses default-embedding
SELECT azure_openai.create_embeddings(input => 'What is WAL in PostgreSQL?') AS vec;

-- Uses default-reranker
SELECT * FROM azure_ai.rank(
    'Best headphones for travel',
    ARRAY['Lightweight and foldable.', 'Bad battery life.', 'Great noise isolation.']
);

Important limited preview within preview

AIMM requires a separate approval form and is not available by default. Model usage is billed at Microsoft Foundry rates with no markup charges appear on your HorizonDB invoice.

4. AI Pipelines Durable RAG in the Database

Every engineer who has built a RAG pipeline has hit the same failure modes: the embedding API returns a 429 at row 50,000 and there is no shared checkpoint. A worker crashes after writing chunks but before committing the “processed” flag. The embedding model changes and there is no clean way to re-embed exactly the rows that need it.

AI Pipelines solve this by moving the logic into HorizonDB via pg_durable a durable-execution engine baked into the service. Each step becomes a durable node. A failure in ai.embed() does not rerun ai.chunk(). If the database restarts mid-backfill, execution resumes from the last checkpoint not row zero.

Step types

Step	What it does
`ai.chunk()`	Split a text column into overlapping chunks
`ai.embed()`	Generate vector embeddings for a column
`ai.extract()`	Extract structured fields from text using an LLM
`ai.generate()`	Generate text from a prompt template using an LLM
`ai.rank()`	Score or rank documents against a query

Required: sink table schema

The sink table must declare exactly these five columns: doc_id, chunk_index, chunk_text, embedding, metadata.

-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS pg_durable;
CREATE EXTENSION IF NOT EXISTS azure_ai;
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_diskann;

-- Sink table (five required columns)
CREATE TABLE documents_output (
    doc_id      INT, chunk_index INT, chunk_text TEXT,
    embedding   vector(1536), metadata JSONB
);
CREATE INDEX idx_docs_diskann ON documents_output
    USING diskann (embedding vector_cosine_ops) WITH (spherical_quantized = true);

-- Declare the pipeline
SELECT ai.create_pipeline(
    name    => 'docs_rag',
    source  => ai.table_source(table_name => 'documents', incremental_column => 'updated_at'),
    steps   => ARRAY[
        ai.chunk(input => 'content', chunk_size => 512, overlap => 64),
        ai.embed(model => 'default-embedding', input => 'chunk_text', dimensions => 1536)
    ],
    trigger => 'on_change',
    sink    => ai.table_sink('documents_output')
);

-- Run, monitor, backfill
SELECT ai.run('docs_rag');
SELECT * FROM ai.status('docs_rag');

-- After upgrading embedding model durable re-embed from checkpoint
TRUNCATE documents_output;
SELECT ai.backfill('docs_rag');

-- Pause cost during tuning
SELECT ai.pause('docs_rag');
SELECT ai.resume('docs_rag');

Limitations (Preview)

Pipelines run on the primary only. Sources must be HorizonDB tables. Pipeline state is not portable across major pg_durable versions drain or pause before upgrading.

5. BM25 Full-Text Search `pg_textsearch`

HorizonDB ships pg_textsearch — BM25 as a native Postgres index access method. BM25 is the same relevance algorithm used by Elasticsearch, Solr, and Azure AI Search. Postgres’s built-in ts_rank does not implement it.

	Built-in `tsvector + ts_rank`	`pg_textsearch` BM25
Ranking algorithm	Custom — no term saturation	BM25 — industry standard
Term saturation	No — keyword stuffing works	Yes — diminishing returns
Length normalization	No	Yes
IDF weighting	No	Yes — rare terms up-weighted
Top-k pattern	`WHERE @@` then sort	`ORDER BY … LIMIT n` (index-aware)
Index on plain TEXT	Requires `tsvector` column	Yes — index plain text directly

CREATE EXTENSION IF NOT EXISTS pg_textsearch;

CREATE INDEX idx_products_bm25
    ON products USING bm25 (description) WITH (text_config = 'english');

-- BM25 ranked retrieval lower score = better match
SELECT id, name,
       description <@> to_bm25query('wireless noise cancelling', 'idx_products_bm25') AS score
FROM products
ORDER BY score LIMIT 10;

-- Combined with a structured WHERE filter
SELECT id, name, category,
       description <@> to_bm25query('noise cancelling', 'idx_products_bm25') AS score
FROM products
WHERE category = 'audio'
ORDER BY score LIMIT 10;

BM25 Limitations (Preview)

No native phrase queries. No built-in fuzzy/typo matching (use pg_trgm). Inside PL/pgSQL use explicit to_bm25query(query, index_name).

6. Hybrid Search BM25 + Vector via Reciprocal Rank Fusion

HorizonDB runs both BM25 and vector search in the same query engine, combined with Reciprocal Rank Fusion (RRF):

WITH bm25_results AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')) AS bm25_rank
    FROM documents
    ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')
    LIMIT 20
),
vector_results AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector) AS vec_rank
    FROM documents
    ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector
    LIMIT 20
)
SELECT COALESCE(b.id, v.id) AS id,
       (1.0 / (60 + COALESCE(b.bm25_rank, 999))) +
       (1.0 / (60 + COALESCE(v.vec_rank, 999))) AS rrf_score
FROM bm25_results b FULL OUTER JOIN vector_results v ON b.id = v.id
ORDER BY rrf_score DESC LIMIT 10;

Three vector index types available: IVFFlat, HNSW (via pgvector), and DiskANN — Microsoft Research’s graph-based ANN index with spherical quantisation. For production-scale RAG, DiskANN is the recommended choice.

7. Graph Database Apache AGE + openCypher

Apache AGE is included as a first-class extension nodes, edges, and the openCypher query language directly inside Postgres. For AI workloads this enables Graph-RAG: retrieval augmented by structured knowledge graphs, not just vector proximity.

CREATE EXTENSION IF NOT EXISTS AGE CASCADE;
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('product_graph');

-- Query relationships with Cypher
SELECT * FROM cypher('product_graph', $
    MATCH (a:Product)-[r:BOUGHT_TOGETHER]->(b:Product)
    WHERE a.category = 'audio'
    SET a.disp_label = a.title
    SET b.disp_label = b.title
    RETURN a, r, b
$) AS (a agtype, r agtype, b agtype);

-- Multi-hop traversal
SELECT * FROM cypher('product_graph', $
    MATCH (a:Product)-[r1:BOUGHT_TOGETHER]->(mid:Product)-[r2:BOUGHT_TOGETHER]->(b:Product)
    RETURN a, r1, mid, r2, b
$) AS (a agtype, r1 agtype, mid agtype, r2 agtype, b agtype);

VS Code graph visualisation

The PostgreSQL extension for VS Code automatically detects AGE query results and renders them as interactive node-edge graphs. Return full objects (not scalar properties) and set disp_label for readable node labels.

Use cases: knowledge graphs for Graph-RAG, fraud detection, recommendation engines, entity relationship mapping via azure_ai.extract() feeding into AGE.

8. AI Agents HorizonDB as the Backend

HorizonDB is explicitly positioned as the unified data layer for AI agents covering three pillars:

Agent requirement	HorizonDB provides
Memory conversation history, task state	LangGraph PostgreSQL checkpointer · Semantic Kernel PostgreSQL connector
Knowledge retrieval grounded answers from business data	pgvector, DiskANN, BM25, Apache AGE, `azure_ai` functions
Multimodal storage JSONB, geospatial, arrays, vectors	All native PostgreSQL types in one engine no separate stores

Protocol	Purpose	Governance
MCP Model Context Protocol	Agent-to-tool connectivity	Linux Foundation · OAuth 2.1 + managed identity
A2A Agent-to-Agent Protocol	Peer-to-peer agent delegation	Google (open standard)

9. Preview Reality Check What’s Missing

Feature	Status	Impact
Configurable backup retention	Fixed at 7 days	Compliance blocker for regulated workloads
Long-term retention (LTR)	Not available	Compliance blocker
Cross-region read replicas	Not available	No geographic DR
Customer-managed keys (CMK)	Not available	Service-managed keys only enterprise security blocker
PgBouncer / connection pooling	Not available	External pooler required for high-connection apps
VNet injection	Not available	Private Endpoints, no VNet integration
Configurable maintenance windows	Not available	System-managed upgrade windows
AI Model Management (AIMM)	Limited preview-within-preview	Separate approval form required
Available regions	5 only	Central US, West US 2, West US 3, Sweden Central, Australia East

Production Blockers

If your workload requires any of: 30-day backup retention, CMK encryption, cross-region DR, or VNet injection — HorizonDB cannot meet those requirements today. This is a 2027 production service at the current pace.

10. Verdict — Should You Try Azure HorizonDB?

HorizonDB is genuinely novel. The WAL-as-log architecture is not marketing. The AI integration is not a thin wrapper: AI pipelines backed by pg_durable, BM25 as a first-class index access method, Apache AGE with Cypher, and model-registry RBAC in SQL are real engineering decisions.

Try Azure HorizonDB if…

You are building a new AI-native application on Azure and want to avoid managing separate embedding pipeline services, vector search services, and graph databases.
You are prototyping a RAG or agentic system and want everything in one Postgres endpoint.
You want to evaluate the disaggregated architecture before GA the $200 Azure free trial is enough to run the AI pipelines and BM25 examples above on a 2vCore cluster.
You are Azure-first and want to benchmark the WAL offload architecture against a checkpoint-heavy workload on Flexible Server.

Do not run Azure HorizonDB in production today if…

You need configurable backup retention, LTR, or CMK encryption.
Your compliance posture requires cross-region DR.
Your application uses a high number of short-lived connections (no PgBouncer yet).
You need your data outside the five available regions.

The bigger picture

Microsoft is betting that the right place for AI workloads is inside the database, not in a separate service tier. If that bet lands, HorizonDB is the product. The WAL-as-log design removes the checkpoint I/O bottlenecks. The AI function layer removes the complexity of coordinating your database with an external LLM service. The durable pipelines solve the exact failure modes anyone who has built a batch embedding job at scale has encountered.

Whether it justifies a 3× compute premium over Flexible Server depends on how much of that AI stack you are currently building and operating yourself. If the answer is “a lot,” the math changes.

Resources

Public preview no waitlist for the core service.
AIMM requires a separate approval form. ·
Pricing page ·
Documentation

All SQL examples sourced from official Microsoft Learn docs, last retrieved June 8, 2026. Pricing valid as of the same date.

Azure HorizonDB: Microsoft Just Put Your Entire AI Stack Inside Postgres

1. What It Actually Is Architecture

2. AI Functions: SQL That Calls LLMs

Entity extraction over a reviews table

Structured JSON generation with enforced schema

Two-stage retrieval: vector search → semantic reranking

Model registry per-model RBAC in SQL

3. AI Model Management One Toggle, Three Models

4. AI Pipelines Durable RAG in the Database

Step types

5. BM25 Full-Text Search `pg_textsearch`

6. Hybrid Search BM25 + Vector via Reciprocal Rank Fusion

7. Graph Database Apache AGE + openCypher

8. AI Agents HorizonDB as the Backend

9. Preview Reality Check What’s Missing

10. Verdict — Should You Try Azure HorizonDB?

Try Azure HorizonDB if…

Do not run Azure HorizonDB in production today if…

The bigger picture

Leave a Reply Cancel reply

Don't get paged twice for the same bug.

1. What It Actually Is Architecture

2. AI Functions: SQL That Calls LLMs

Entity extraction over a reviews table

Structured JSON generation with enforced schema

Two-stage retrieval: vector search → semantic reranking

Model registry per-model RBAC in SQL

3. AI Model Management One Toggle, Three Models

4. AI Pipelines Durable RAG in the Database

Step types

5. BM25 Full-Text Search pg_textsearch

6. Hybrid Search BM25 + Vector via Reciprocal Rank Fusion

7. Graph Database Apache AGE + openCypher

8. AI Agents HorizonDB as the Backend

9. Preview Reality Check What’s Missing

10. Verdict — Should You Try Azure HorizonDB?

Try Azure HorizonDB if…

Do not run Azure HorizonDB in production today if…

The bigger picture

Leave a Reply Cancel reply

Don't get paged twice for the same bug.

5. BM25 Full-Text Search `pg_textsearch`