Store: Azure SQL / Postgres pgvector
Key Points
- Postgres + pgvector: open-source extension; mature; widely deployed.
- Azure SQL (2025+): native vector type; HNSW indexing.
- When: existing relational DB; want vectors collocated with app data; transactional consistency.
- Trade-off: less specialized than Azure AI Search / Qdrant; missing some hybrid features.
Postgres + pgvector
Setup
CREATE EXTENSION vector;
CREATE TABLE chunks (
id UUID PRIMARY KEY,
document_id UUID NOT NULL,
text TEXT,
embedding vector(1536)
);
CREATE INDEX chunks_embedding_idx ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX chunks_document_idx ON chunks (document_id);
.NET integration
// Microsoft.Extensions.VectorData.PostgresPgVector
var collection = new PostgresCollection<Guid, Chunk>(connStr, "chunks");
Query
Direct SQL alternative
await using var conn = new NpgsqlConnection(connStr);
await conn.OpenAsync();
await using var cmd = new NpgsqlCommand(
"SELECT id, text FROM chunks ORDER BY embedding <=> @query LIMIT 10",
conn);
cmd.Parameters.AddWithValue("query", queryEmb.Vector.ToArray());
await using var rd = await cmd.ExecuteReaderAsync();
while (await rd.ReadAsync()) { /* ... */ }
<=> is cosine distance operator. <-> is L2; <#> is dot product.
EF Core
EF Core 8+ supports JSON columns; vector via raw SQL or Pgvector.EntityFrameworkCore:
Tuning HNSW
For huge datasets: increase ef_construct (build slower; better quality).
Azure SQL native vector
CREATE TABLE Chunks (
Id UNIQUEIDENTIFIER PRIMARY KEY,
Text NVARCHAR(MAX),
Embedding VECTOR(1536)
);
CREATE INDEX IX_Embedding ON Chunks(Embedding) USING DISKANN; -- or HNSW
Microsoft.Extensions.VectorData has Azure SQL provider.
Trade-offs
| Aspect | Pros | Cons |
|---|---|---|
| pgvector | Same DB as app; transactional; mature | Slower at scale than dedicated |
| Azure SQL vector | Same as above + Azure-native | Newer; ecosystem developing |
| Azure AI Search | Best hybrid retrieval | Separate service |
| Qdrant | Specialized fast | Operational |
For up to ~10M vectors with simple query patterns: relational + vector works well.
Hybrid
-- Postgres: combine FTS + vector
SELECT *,
(1.0 - (embedding <=> @q)) * 0.7 +
ts_rank(to_tsvector('english', text), to_tsquery('english', @q_tokens)) * 0.3 AS score
FROM chunks
ORDER BY score DESC
LIMIT 10;
Manual fusion. Less polished than dedicated systems.
Multi-tenant
Add tenant_id column; filter on every query; index on (tenant_id, embedding).
Pros of relational
- Single DB for app + vectors.
- ACID transactions.
- Familiar SQL operations.
- Existing backup / DR.
- Cost: incremental on existing DB.
Cons
- Slower at very high QPS.
- Less specialized index features.
- Hybrid manual.
When use
- Existing Postgres / SQL Server app.
- Modest vector count (<10M).
- Want transactional consistency (vectors + relational data atomic).
- Cost-conscious; one DB.
When NOT
- Specialized RAG at scale.
- Need best-in-class hybrid.
- Need vector-specific features (quantization, etc.).
Senior considerations
- Index strategy: HNSW vs IVF; tune
ef. - Storage: vector(1536) ~6 KB; multiply by row count.
- Backup: standard DB backups handle vectors.
- Migration: DB-level operations work normally.