At Build 2026, Microsoft announced Azure HorizonDB a new fully managed PostgreSQL database service currently in public preview. The marketing copy says “3× faster than self-managed Postgres” and “AI-ready.” Both are true in a narrow sense. But the more interesting story is architectural: HorizonDB is not just another hosted Postgres. It is a deliberate attempt to move your entire AI application stack embeddings, LLM calls, vector search, full-text search, RAG pipelines, graph traversal, and agent memory into the database engine itself.
This post is for engineers who want to understand what was actually built, not the product page summary. Every SQL example is sourced directly from the official Microsoft Learn documentation.
1. What It Actually Is Architecture
Azure HorizonDB is built on two foundational principles that Aurora PostgreSQL users will recognise immediately: disaggregated compute and storage, and a database-as-a-log design.
8 GB RAM/vCore · NVMe SSD hot cache
8 GB RAM/vCore · NVMe SSD hot cache
low-latency commit ack
replays WAL into pages
zone-redundant · auto-scales to 128 TB
Up to 192 vCores per replica, up to 16 replicas (3,072 total vCores), auto-scaling storage to 128 TB, zone-resilient by default. Each replica: 8 GB RAM/vCore + local NVMe SSD cache.
2. AI Functions: SQL That Calls LLMs
The azure_ai extension introduces five SQL functions that call generative AI models directly from queries. The model lives in a model registry inside the database, addressed by alias no API key management per query, no external service call from the application layer.
| Function | Purpose | Returns |
|---|---|---|
azure_openai.create_embeddings() |
Generate vector embeddings | vector |
azure_ai.generate() |
LLM text or structured output | text or JSONB |
azure_ai.extract() |
Named entity / feature extraction | JSONB |
azure_ai.is_true() |
Boolean LLM evaluation | boolean |
azure_ai.rank() |
Rerank documents by query relevance | table(id, rank, score) |
Entity extraction over a reviews table
-- Extract product name and sentiment score from every review row
SELECT review_id,
azure_ai.extract(
review_text,
ARRAY[
'product: string',
'sentiment: number - sentiment score; 1 (lowest) to 5 (highest)'
],
'my-gpt'
) AS analysis
FROM product_reviews;
Output: {"product": "headphones", "sentiment": 3} a JSONB column you can index and filter on.
Structured JSON generation with enforced schema
SELECT review,
azure_ai.generate(
prompt => 'Rewrite politely and count products mentioned: ' || review,
json_schema => '{
"name": "generate_response", "strict": true,
"schema": {
"type": "object",
"properties": {
"comment": { "type": "string" },
"num_products": { "type": "integer" }
},
"required": ["comment", "num_products"],
"additionalProperties": false
}
}',
model => 'my-gpt'
) AS result
FROM reviews;
Two-stage retrieval: vector search → semantic reranking
WITH candidates AS (
SELECT id, title, description
FROM products
ORDER BY embedding <=> azure_openai.create_embeddings(input => 'Alternatives to LEGO')::vector
LIMIT 20
),
reranked AS (
SELECT id AS row_id, rank
FROM azure_ai.rank(
'Alternatives to LEGO',
ARRAY(SELECT description FROM candidates),
ARRAY(SELECT id FROM candidates),
'my-reranker'
)
)
SELECT c.id, c.title
FROM candidates c
LEFT JOIN reranked r ON r.row_id = c.id
ORDER BY r.rank ASC;
Model registry per-model RBAC in SQL
-- Register a BYOM Foundry deployment
SELECT model_registry.model_add(
'my-gpt',
'https://your-resource.openai.azure.com/',
'gpt-5-deployment',
'gpt-5.4',
NULL,
'subscription-key',
'<your-endpoint-key>'
);
generate/extract/is_true/rank: all GPT and o-series (except gpt-5.4-pro) ·create_embeddings: text-embedding-3-small/large, ada-002 ·rank: also Cohere-rerank-v4.0-pro and Cohere-rerank-v4.0-fast
3. AI Model Management One Toggle, Three Models
AI Model Management (AIMM) is an optional feature currently a limited preview within the public preview, requiring a separate approval form. When enabled, it auto-provisions three managed models and wires the entire azure_ai extension with no endpoint or key configuration:
| Default alias | Model | Purpose |
|---|---|---|
default-chat |
gpt-5.4 |
Generation, extraction, boolean evaluation |
default-embedding |
text-embedding-3-small |
RAG embeddings, semantic search |
default-reranker |
Cohere-rerank-v4.0-fast |
Semantic reranking of retrieval candidates |
With AIMM enabled, the model parameter becomes optional functions fall back to their default:
-- AIMM uses default-chat automatically when no alias is given
SELECT azure_ai.generate('Is this review positive? ' || review_text) FROM reviews;
-- Uses default-embedding
SELECT azure_openai.create_embeddings(input => 'What is WAL in PostgreSQL?') AS vec;
-- Uses default-reranker
SELECT * FROM azure_ai.rank(
'Best headphones for travel',
ARRAY['Lightweight and foldable.', 'Bad battery life.', 'Great noise isolation.']
);
AIMM requires a separate approval form and is not available by default. Model usage is billed at Microsoft Foundry rates with no markup charges appear on your HorizonDB invoice.
4. AI Pipelines Durable RAG in the Database
Every engineer who has built a RAG pipeline has hit the same failure modes: the embedding API returns a 429 at row 50,000 and there is no shared checkpoint. A worker crashes after writing chunks but before committing the “processed” flag. The embedding model changes and there is no clean way to re-embed exactly the rows that need it.
AI Pipelines solve this by moving the logic into HorizonDB via pg_durable a durable-execution engine baked into the service. Each step becomes a durable node. A failure in ai.embed() does not rerun ai.chunk(). If the database restarts mid-backfill, execution resumes from the last checkpoint not row zero.
Step types
| Step | What it does |
|---|---|
ai.chunk() |
Split a text column into overlapping chunks |
ai.embed() |
Generate vector embeddings for a column |
ai.extract() |
Extract structured fields from text using an LLM |
ai.generate() |
Generate text from a prompt template using an LLM |
ai.rank() |
Score or rank documents against a query |
The sink table must declare exactly these five columns: doc_id, chunk_index, chunk_text, embedding, metadata.
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS pg_durable;
CREATE EXTENSION IF NOT EXISTS azure_ai;
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_diskann;
-- Sink table (five required columns)
CREATE TABLE documents_output (
doc_id INT, chunk_index INT, chunk_text TEXT,
embedding vector(1536), metadata JSONB
);
CREATE INDEX idx_docs_diskann ON documents_output
USING diskann (embedding vector_cosine_ops) WITH (spherical_quantized = true);
-- Declare the pipeline
SELECT ai.create_pipeline(
name => 'docs_rag',
source => ai.table_source(table_name => 'documents', incremental_column => 'updated_at'),
steps => ARRAY[
ai.chunk(input => 'content', chunk_size => 512, overlap => 64),
ai.embed(model => 'default-embedding', input => 'chunk_text', dimensions => 1536)
],
trigger => 'on_change',
sink => ai.table_sink('documents_output')
);
-- Run, monitor, backfill
SELECT ai.run('docs_rag');
SELECT * FROM ai.status('docs_rag');
-- After upgrading embedding model durable re-embed from checkpoint
TRUNCATE documents_output;
SELECT ai.backfill('docs_rag');
-- Pause cost during tuning
SELECT ai.pause('docs_rag');
SELECT ai.resume('docs_rag');
Pipelines run on the primary only. Sources must be HorizonDB tables. Pipeline state is not portable across major pg_durable versions drain or pause before upgrading.
5. BM25 Full-Text Search pg_textsearch
HorizonDB ships pg_textsearch — BM25 as a native Postgres index access method. BM25 is the same relevance algorithm used by Elasticsearch, Solr, and Azure AI Search. Postgres’s built-in ts_rank does not implement it.
Built-in tsvector + ts_rank |
pg_textsearch BM25 |
|
|---|---|---|
| Ranking algorithm | Custom — no term saturation | BM25 — industry standard |
| Term saturation | No — keyword stuffing works | Yes — diminishing returns |
| Length normalization | No | Yes |
| IDF weighting | No | Yes — rare terms up-weighted |
| Top-k pattern | WHERE @@ then sort |
ORDER BY … LIMIT n (index-aware) |
| Index on plain TEXT | Requires tsvector column |
Yes — index plain text directly |
CREATE EXTENSION IF NOT EXISTS pg_textsearch;
CREATE INDEX idx_products_bm25
ON products USING bm25 (description) WITH (text_config = 'english');
-- BM25 ranked retrieval lower score = better match
SELECT id, name,
description <@> to_bm25query('wireless noise cancelling', 'idx_products_bm25') AS score
FROM products
ORDER BY score LIMIT 10;
-- Combined with a structured WHERE filter
SELECT id, name, category,
description <@> to_bm25query('noise cancelling', 'idx_products_bm25') AS score
FROM products
WHERE category = 'audio'
ORDER BY score LIMIT 10;
No native phrase queries. No built-in fuzzy/typo matching (use pg_trgm). Inside PL/pgSQL use explicit to_bm25query(query, index_name).
6. Hybrid Search BM25 + Vector via Reciprocal Rank Fusion
HorizonDB runs both BM25 and vector search in the same query engine, combined with Reciprocal Rank Fusion (RRF):
WITH bm25_results AS (
SELECT id,
ROW_NUMBER() OVER (ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')) AS bm25_rank
FROM documents
ORDER BY description <@> to_bm25query('replication lag', 'idx_docs_bm25')
LIMIT 20
),
vector_results AS (
SELECT id,
ROW_NUMBER() OVER (ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector) AS vec_rank
FROM documents
ORDER BY embedding <=> azure_openai.create_embeddings(input => 'replication lag')::vector
LIMIT 20
)
SELECT COALESCE(b.id, v.id) AS id,
(1.0 / (60 + COALESCE(b.bm25_rank, 999))) +
(1.0 / (60 + COALESCE(v.vec_rank, 999))) AS rrf_score
FROM bm25_results b FULL OUTER JOIN vector_results v ON b.id = v.id
ORDER BY rrf_score DESC LIMIT 10;
Three vector index types available: IVFFlat, HNSW (via pgvector), and DiskANN — Microsoft Research’s graph-based ANN index with spherical quantisation. For production-scale RAG, DiskANN is the recommended choice.
7. Graph Database Apache AGE + openCypher
Apache AGE is included as a first-class extension nodes, edges, and the openCypher query language directly inside Postgres. For AI workloads this enables Graph-RAG: retrieval augmented by structured knowledge graphs, not just vector proximity.
CREATE EXTENSION IF NOT EXISTS AGE CASCADE;
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('product_graph');
-- Query relationships with Cypher
SELECT * FROM cypher('product_graph', $
MATCH (a:Product)-[r:BOUGHT_TOGETHER]->(b:Product)
WHERE a.category = 'audio'
SET a.disp_label = a.title
SET b.disp_label = b.title
RETURN a, r, b
$) AS (a agtype, r agtype, b agtype);
-- Multi-hop traversal
SELECT * FROM cypher('product_graph', $
MATCH (a:Product)-[r1:BOUGHT_TOGETHER]->(mid:Product)-[r2:BOUGHT_TOGETHER]->(b:Product)
RETURN a, r1, mid, r2, b
$) AS (a agtype, r1 agtype, mid agtype, r2 agtype, b agtype);
The PostgreSQL extension for VS Code automatically detects AGE query results and renders them as interactive node-edge graphs. Return full objects (not scalar properties) and set disp_label for readable node labels.
Use cases: knowledge graphs for Graph-RAG, fraud detection, recommendation engines, entity relationship mapping via azure_ai.extract() feeding into AGE.
8. AI Agents HorizonDB as the Backend
HorizonDB is explicitly positioned as the unified data layer for AI agents covering three pillars:
| Agent requirement | HorizonDB provides |
|---|---|
| Memory conversation history, task state | LangGraph PostgreSQL checkpointer · Semantic Kernel PostgreSQL connector |
| Knowledge retrieval grounded answers from business data | pgvector, DiskANN, BM25, Apache AGE, azure_ai functions |
| Multimodal storage JSONB, geospatial, arrays, vectors | All native PostgreSQL types in one engine no separate stores |
| Protocol | Purpose | Governance |
|---|---|---|
| MCP Model Context Protocol | Agent-to-tool connectivity | Linux Foundation · OAuth 2.1 + managed identity |
| A2A Agent-to-Agent Protocol | Peer-to-peer agent delegation | Google (open standard) |
9. Preview Reality Check What’s Missing
| Feature | Status | Impact |
|---|---|---|
| Configurable backup retention | Fixed at 7 days | Compliance blocker for regulated workloads |
| Long-term retention (LTR) | Not available | Compliance blocker |
| Cross-region read replicas | Not available | No geographic DR |
| Customer-managed keys (CMK) | Not available | Service-managed keys only enterprise security blocker |
| PgBouncer / connection pooling | Not available | External pooler required for high-connection apps |
| VNet injection | Not available | Private Endpoints, no VNet integration |
| Configurable maintenance windows | Not available | System-managed upgrade windows |
| AI Model Management (AIMM) | Limited preview-within-preview | Separate approval form required |
| Available regions | 5 only | Central US, West US 2, West US 3, Sweden Central, Australia East |
If your workload requires any of: 30-day backup retention, CMK encryption, cross-region DR, or VNet injection — HorizonDB cannot meet those requirements today. This is a 2027 production service at the current pace.
10. Verdict — Should You Try Azure HorizonDB?
HorizonDB is genuinely novel. The WAL-as-log architecture is not marketing. The AI integration is not a thin wrapper: AI pipelines backed by pg_durable, BM25 as a first-class index access method, Apache AGE with Cypher, and model-registry RBAC in SQL are real engineering decisions.
Try Azure HorizonDB if…
- You are building a new AI-native application on Azure and want to avoid managing separate embedding pipeline services, vector search services, and graph databases.
- You are prototyping a RAG or agentic system and want everything in one Postgres endpoint.
- You want to evaluate the disaggregated architecture before GA the $200 Azure free trial is enough to run the AI pipelines and BM25 examples above on a 2vCore cluster.
- You are Azure-first and want to benchmark the WAL offload architecture against a checkpoint-heavy workload on Flexible Server.
Do not run Azure HorizonDB in production today if…
- You need configurable backup retention, LTR, or CMK encryption.
- Your compliance posture requires cross-region DR.
- Your application uses a high number of short-lived connections (no PgBouncer yet).
- You need your data outside the five available regions.
The bigger picture
Microsoft is betting that the right place for AI workloads is inside the database, not in a separate service tier. If that bet lands, HorizonDB is the product. The WAL-as-log design removes the checkpoint I/O bottlenecks. The AI function layer removes the complexity of coordinating your database with an external LLM service. The durable pipelines solve the exact failure modes anyone who has built a batch embedding job at scale has encountered.
Whether it justifies a 3× compute premium over Flexible Server depends on how much of that AI stack you are currently building and operating yourself. If the answer is “a lot,” the math changes.
Public preview no waitlist for the core service.
AIMM requires a separate approval form. ·
Pricing page ·
Documentation
All SQL examples sourced from official Microsoft Learn docs, last retrieved June 8, 2026. Pricing valid as of the same date.
Leave a Reply