Azure HorizonDB: Microsoft Just Put Your Entire AI Stack Inside Postgres

A DBA deep dive into Azure HorizonDB: WAL-as-log architecture, AI SQL functions, BM25 full-text search, durable RAG pipelines with pg_durable, Apache AGE graph database, and a real cost breakdown vs Flexible Server.


At Build 2026, Microsoft announced Azure HorizonDB a new fully managed PostgreSQL database service currently in public preview. The marketing copy says “3× faster than self-managed Postgres” and “AI-ready.” Both are true in a narrow sense. But the more interesting story is architectural: HorizonDB is not just another hosted Postgres. It is a deliberate attempt to move your entire AI application stack embeddings, LLM calls, vector search, full-text search, RAG pipelines, graph traversal, and agent memory into the database engine itself.

This post is for engineers who want to understand what was actually built, not the product page summary. Every SQL example is sourced directly from the official Microsoft Learn documentation.

1. What It Actually Is Architecture

Azure HorizonDB is built on two foundational principles that Aurora PostgreSQL users will recognise immediately: disaggregated compute and storage, and a database-as-a-log design.

Your Application
standard Postgres wire protocol
Read-Write endpoint
 
 
Read-Only endpoint (load-balanced)
 
 
Primary Replica
reads & writes
8 GB RAM/vCore · NVMe SSD hot cache
Standby Replica(s)
reads + failover candidates
8 GB RAM/vCore · NVMe SSD hot cache
WAL only no dirty pages, no checkpoints from compute
 
 
WAL Service
purpose-built for log writes
low-latency commit ack
Data Storage Fleet
sharded · multi-copy per shard
replays WAL into pages
WAL archival + page snapshots
 
 
Azure Blob Storage (ZRS)
WAL archive · page data · backups = blob snapshots
zone-redundant · auto-scales to 128 TB
Processes eliminated from compute (offloaded to storage layer)
walsenderarchiverbgwritercheckpointerpg_basebackupfull page writesWAL recoveryreplica redo
Result: compute CPU and NVMe are freed for your application. Failover needs no log rewind. Read replicas provision in seconds shared storage, no data copy.
Scale ceiling

Up to 192 vCores per replica, up to 16 replicas (3,072 total vCores), auto-scaling storage to 128 TB, zone-resilient by default. Each replica: 8 GB RAM/vCore + local NVMe SSD cache.


2. AI Functions: SQL That Calls LLMs

The azure_ai extension introduces five SQL functions that call generative AI models directly from queries. The model lives in a model registry inside the database, addressed by alias no API key management per query, no external service call from the application layer.

Function Purpose Returns
azure_openai.create_embeddings() Generate vector embeddings vector
azure_ai.generate() LLM text or structured output text or JSONB
azure_ai.extract() Named entity / feature extraction JSONB
azure_ai.is_true() Boolean LLM evaluation boolean
azure_ai.rank() Rerank documents by query relevance table(id, rank, score)

Entity extraction over a reviews table

-- Extract product name and sentiment score from every review row
SELECT review_id,
       azure_ai.extract(
           review_text,
           ARRAY[
               'product: string',
               'sentiment: number - sentiment score; 1 (lowest) to 5 (highest)'
           ],
           'my-gpt'
       ) AS analysis
FROM product_reviews;

Output: {"product": "headphones", "sentiment": 3} a JSONB column you can index and filter on.

Structured JSON generation with enforced schema

SELECT review,
       azure_ai.generate(
           prompt      => 'Rewrite politely and count products mentioned: ' || review,
           json_schema => '{
             "name": "generate_response", "strict": true,
             "schema": {
               "type": "object",
               "properties": {
                 "comment":      { "type": "string"  },
                 "num_products": { "type": "integer" }
               },
               "required": ["comment", "num_products"],
               "additionalProperties": false
             }
           }',
           model => 'my-gpt'
       ) AS result
FROM reviews;

Two-stage retrieval: vector search → semantic reranking

WITH candidates AS (
    SELECT id, title, description
    FROM products
    ORDER BY embedding <=> azure_openai.create_embeddings(input => 'Alternatives to LEGO')::vector
    LIMIT 20
),
reranked AS (
    SELECT id AS row_id, rank
    FROM azure_ai.rank(
        'Alternatives to LEGO',
        ARRAY(SELECT description FROM candidates),
        ARRAY(SELECT id FROM candidates),
        'my-reranker'
    )
)
SELECT c.id, c.title
FROM candidates c
LEFT JOIN reranked r ON r.row_id = c.id
ORDER BY r.rank ASC;

Model registry per-model RBAC in SQL

-- Register a BYOM Foundry deployment
SELECT model_registry.model_add(
    'my-gpt',
    'https://your-resource.openai.azure.com/',
    'gpt-5-deployment',
    'gpt-5.4',
    NULL,
    'subscription-key',
    '<your-endpoint-key>'
);
Supported models

generate/extract/is_true/rank: all GPT and o-series (except gpt-5.4-pro) ·
create_embeddings: text-embedding-3-small/large, ada-002 ·
rank: also Cohere-rerank-v4.0-pro and Cohere-rerank-v4.0-fast


3. AI Model Management One Toggle, Three Models

AI Model Management (AIMM) is an optional feature currently a limited preview within the public preview, requiring a separate approval form. When enabled, it auto-provisions three managed models and wires the entire azure_ai extension with no endpoint or key configuration:

Default alias Model Purpose
default-chat gpt-5.4 Generation, extraction, boolean evaluation
default-embedding text-embedding-3-small RAG embeddings, semantic search
default-reranker Cohere-rerank-v4.0-fast Semantic reranking of retrieval candidates

With AIMM enabled, the model parameter becomes optional functions fall back to their default:

-- AIMM uses default-chat automatically when no alias is given
SELECT azure_ai.generate('Is this review positive? ' || review_text) FROM reviews;

-- Uses default-embedding
SELECT azure_openai.create_embeddings(input => 'What is WAL in PostgreSQL?') AS vec;

-- Uses default-reranker
SELECT * FROM azure_ai.rank(
    'Best headphones for travel',
    ARRAY['Lightweight and foldable.', 'Bad battery life.', 'Great noise isolation.']
);
Important limited preview within preview

AIMM requires a separate approval form and is not available by default. Model usage is billed at Microsoft Foundry rates with no markup charges appear on your HorizonDB invoice.


4. AI Pipelines Durable RAG in the Database

Every engineer who has built a RAG pipeline has hit the same failure modes: the embedding API returns a 429 at row 50,000 and there is no shared checkpoint. A worker crashes after writing chunks but before committing the “processed” flag. The embedding model changes and there is no clean way to re-embed exactly the rows that need it.

AI Pipelines solve this by moving the logic into HorizonDB via pg_durable a durable-execution engine baked into the service. Each step becomes a durable node. A failure in ai.embed() does not rerun ai.chunk(). If the database restarts mid-backfill, execution resumes from the last checkpoint not row zero.

Step types

Step What it does
ai.chunk() Split a text column into overlapping chunks
ai.embed() Generate vector embeddings for a column
ai.extract() Extract structured fields from text using an LLM
ai.generate() Generate text from a prompt template using an LLM
ai.rank() Score or rank documents against a query
Required: sink table schema

The sink table must declare exactly these five columns: doc_id, chunk_index, chunk_text, embedding, metadata.

-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS pg_durable;
CREATE EXTENSION IF NOT EXISTS azure_ai;
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_diskann;

-- Sink table (five required columns)
CREATE TABLE documents_output (
    doc_id      INT, chunk_index INT, chunk_text TEXT,
    embedding   vector(1536), metadata JSONB
);
CREATE INDEX idx_docs_diskann ON documents_output
    USING diskann (embedding vector_cosine_ops) WITH (spherical_quantized = true);

-- Declare the pipeline
SELECT ai.create_pipeline(
    name    => 'docs_rag',
    source  => ai.table_source(table_name => 'documents', incremental_column => 'updated_at'),
    steps   => ARRAY[
        ai.chunk(input => 'content', chunk_size => 512, overlap => 64),
        ai.embed(model => 'default-embedding', input => 'chunk_text', dimensions => 1536)
    ],
    trigger => 'on_change',
    sink    => ai.table_sink('documents_output')
);

-- Run, monitor, backfill
SELECT ai.run('docs_rag');
SELECT * FROM ai.status('docs_rag');

-- After upgrading embedding model durable re-embed from checkpoint
TRUNCATE documents_output;
SELECT ai.backfill('docs_rag');

-- Pause cost during tuning
SELECT ai.pause('docs_rag');
SELECT ai.resume('docs_rag');
Limitations (Preview)

Pipelines run on the primary only. Sources must be HorizonDB tables. Pipeline state is not portable across major pg_durable versions drain or pause before upgrading.


5. BM25 Full-Text Search pg_textsearch

HorizonDB ships pg_textsearch — BM25 as a native Postgres index access method. BM25 is the same relevance algorithm used by Elasticsearch, Solr, and Azure AI Search. Postgres’s built-in ts_rank does not implement it.

  Built-in tsvector + ts_rank pg_textsearch BM25
Ranking algorithm Custom — no term saturation BM25 — industry standard
Term saturation No — keyword stuffing works Yes — diminishing returns
Length normalization No Yes
IDF weighting No Yes — rare terms up-weighted
Top-k pattern WHERE @@ then sort ORDER BY … LIMIT n (index-aware)
Index on plain TEXT Requires tsvector column Yes — index plain text directly
CREATE EXTENSION IF NOT EXISTS pg_textsearch;

CREATE INDEX idx_products_bm25
    ON products USING bm25 (description) WITH (text_config = 'english');

-- BM25 ranked retrieval lower score = better match
SELECT id, name,
       description <@> to_bm25query('wireless noise cancelling', 'idx_products_bm25') AS score
FROM products
ORDER BY score LIMIT 10;

-- Combined with a structured WHERE filter
SELECT id, name, category,
       description <@> to_bm25query('noise cancelling', 'idx_products_bm25') AS score
FROM products
WHERE category = 'audio'
ORDER BY score LIMIT 10;
BM25 Limitations (Preview)

No native phrase queries. No built-in fuzzy/typo matching (use pg_trgm). Inside PL/pgSQL use explicit to_bm25query(query, index_name).


6. Hybrid Search BM25 + Vector via Reciprocal Rank Fusion

HorizonDB runs both BM25 and vector search in the same query engine, combined with Reciprocal Rank Fusion (RRF):

WITH bm25_results AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')) AS bm25_rank
    FROM documents
    ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')
    LIMIT 20
),
vector_results AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector) AS vec_rank
    FROM documents
    ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector
    LIMIT 20
)
SELECT COALESCE(b.id, v.id) AS id,
       (1.0 / (60 + COALESCE(b.bm25_rank, 999))) +
       (1.0 / (60 + COALESCE(v.vec_rank, 999))) AS rrf_score
FROM bm25_results b FULL OUTER JOIN vector_results v ON b.id = v.id
ORDER BY rrf_score DESC LIMIT 10;

Three vector index types available: IVFFlat, HNSW (via pgvector), and DiskANN — Microsoft Research’s graph-based ANN index with spherical quantisation. For production-scale RAG, DiskANN is the recommended choice.


7. Graph Database Apache AGE + openCypher

Apache AGE is included as a first-class extension nodes, edges, and the openCypher query language directly inside Postgres. For AI workloads this enables Graph-RAG: retrieval augmented by structured knowledge graphs, not just vector proximity.

CREATE EXTENSION IF NOT EXISTS AGE CASCADE;
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('product_graph');

-- Query relationships with Cypher
SELECT * FROM cypher('product_graph', $
    MATCH (a:Product)-[r:BOUGHT_TOGETHER]->(b:Product)
    WHERE a.category = 'audio'
    SET a.disp_label = a.title
    SET b.disp_label = b.title
    RETURN a, r, b
$) AS (a agtype, r agtype, b agtype);

-- Multi-hop traversal
SELECT * FROM cypher('product_graph', $
    MATCH (a:Product)-[r1:BOUGHT_TOGETHER]->(mid:Product)-[r2:BOUGHT_TOGETHER]->(b:Product)
    RETURN a, r1, mid, r2, b
$) AS (a agtype, r1 agtype, mid agtype, r2 agtype, b agtype);
VS Code graph visualisation

The PostgreSQL extension for VS Code automatically detects AGE query results and renders them as interactive node-edge graphs. Return full objects (not scalar properties) and set disp_label for readable node labels.

Use cases: knowledge graphs for Graph-RAG, fraud detection, recommendation engines, entity relationship mapping via azure_ai.extract() feeding into AGE.


8. AI Agents HorizonDB as the Backend

HorizonDB is explicitly positioned as the unified data layer for AI agents covering three pillars:

Agent requirement HorizonDB provides
Memory conversation history, task state LangGraph PostgreSQL checkpointer · Semantic Kernel PostgreSQL connector
Knowledge retrieval grounded answers from business data pgvector, DiskANN, BM25, Apache AGE, azure_ai functions
Multimodal storage JSONB, geospatial, arrays, vectors All native PostgreSQL types in one engine no separate stores
Protocol Purpose Governance
MCP Model Context Protocol Agent-to-tool connectivity Linux Foundation · OAuth 2.1 + managed identity
A2A Agent-to-Agent Protocol Peer-to-peer agent delegation Google (open standard)

9. Preview Reality Check What’s Missing

Feature Status Impact
Configurable backup retention Fixed at 7 days Compliance blocker for regulated workloads
Long-term retention (LTR) Not available Compliance blocker
Cross-region read replicas Not available No geographic DR
Customer-managed keys (CMK) Not available Service-managed keys only enterprise security blocker
PgBouncer / connection pooling Not available External pooler required for high-connection apps
VNet injection Not available Private Endpoints, no VNet integration
Configurable maintenance windows Not available System-managed upgrade windows
AI Model Management (AIMM) Limited preview-within-preview Separate approval form required
Available regions 5 only Central US, West US 2, West US 3, Sweden Central, Australia East
Production Blockers

If your workload requires any of: 30-day backup retention, CMK encryption, cross-region DR, or VNet injection — HorizonDB cannot meet those requirements today. This is a 2027 production service at the current pace.


10. Verdict — Should You Try Azure HorizonDB?

HorizonDB is genuinely novel. The WAL-as-log architecture is not marketing. The AI integration is not a thin wrapper: AI pipelines backed by pg_durable, BM25 as a first-class index access method, Apache AGE with Cypher, and model-registry RBAC in SQL are real engineering decisions.

Try Azure HorizonDB if…

  • You are building a new AI-native application on Azure and want to avoid managing separate embedding pipeline services, vector search services, and graph databases.
  • You are prototyping a RAG or agentic system and want everything in one Postgres endpoint.
  • You want to evaluate the disaggregated architecture before GA the $200 Azure free trial is enough to run the AI pipelines and BM25 examples above on a 2vCore cluster.
  • You are Azure-first and want to benchmark the WAL offload architecture against a checkpoint-heavy workload on Flexible Server.

Do not run Azure HorizonDB in production today if…

  • You need configurable backup retention, LTR, or CMK encryption.
  • Your compliance posture requires cross-region DR.
  • Your application uses a high number of short-lived connections (no PgBouncer yet).
  • You need your data outside the five available regions.

The bigger picture

Microsoft is betting that the right place for AI workloads is inside the database, not in a separate service tier. If that bet lands, HorizonDB is the product. The WAL-as-log design removes the checkpoint I/O bottlenecks. The AI function layer removes the complexity of coordinating your database with an external LLM service. The durable pipelines solve the exact failure modes anyone who has built a batch embedding job at scale has encountered.

Whether it justifies a 3× compute premium over Flexible Server depends on how much of that AI stack you are currently building and operating yourself. If the answer is “a lot,” the math changes.

Resources

Public preview no waitlist for the core service.
AIMM requires a separate approval form. ·
Pricing page ·
Documentation

All SQL examples sourced from official Microsoft Learn docs, last retrieved June 8, 2026. Pricing valid as of the same date.

Leave a Reply

Your email address will not be published. Required fields are marked *