Skip to content

Store: Azure SQL / Postgres pgvector

Key Points

  • Postgres + pgvector: open-source extension; mature; widely deployed.
  • Azure SQL (2025+): native vector type; HNSW indexing.
  • When: existing relational DB; want vectors collocated with app data; transactional consistency.
  • Trade-off: less specialized than Azure AI Search / Qdrant; missing some hybrid features.

Postgres + pgvector

Setup

CREATE EXTENSION vector;

CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID NOT NULL,
    text TEXT,
    embedding vector(1536)
);

CREATE INDEX chunks_embedding_idx ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX chunks_document_idx ON chunks (document_id);

.NET integration

// Microsoft.Extensions.VectorData.PostgresPgVector
var collection = new PostgresCollection<Guid, Chunk>(connStr, "chunks");

Query

await foreach (var r in collection.SearchAsync(queryEmb, top: 10))
{ /* ... */ }

Direct SQL alternative

await using var conn = new NpgsqlConnection(connStr);
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand(
    "SELECT id, text FROM chunks ORDER BY embedding <=> @query LIMIT 10",
    conn);
cmd.Parameters.AddWithValue("query", queryEmb.Vector.ToArray());

await using var rd = await cmd.ExecuteReaderAsync();
while (await rd.ReadAsync()) { /* ... */ }

<=> is cosine distance operator. <-> is L2; <#> is dot product.

EF Core

EF Core 8+ supports JSON columns; vector via raw SQL or Pgvector.EntityFrameworkCore:

modelBuilder.Entity<Chunk>().Property(c => c.Embedding).HasColumnType("vector(1536)");

Tuning HNSW

SET hnsw.ef_search = 100;   -- balance recall vs latency

For huge datasets: increase ef_construct (build slower; better quality).

Azure SQL native vector

CREATE TABLE Chunks (
    Id UNIQUEIDENTIFIER PRIMARY KEY,
    Text NVARCHAR(MAX),
    Embedding VECTOR(1536)
);

CREATE INDEX IX_Embedding ON Chunks(Embedding) USING DISKANN;   -- or HNSW
SELECT TOP 10 Id, Text
FROM Chunks
ORDER BY VECTOR_DISTANCE('cosine', Embedding, @queryEmb);
// Azure SQL with Microsoft.Data.SqlClient

Microsoft.Extensions.VectorData has Azure SQL provider.

Trade-offs

Aspect Pros Cons
pgvector Same DB as app; transactional; mature Slower at scale than dedicated
Azure SQL vector Same as above + Azure-native Newer; ecosystem developing
Azure AI Search Best hybrid retrieval Separate service
Qdrant Specialized fast Operational

For up to ~10M vectors with simple query patterns: relational + vector works well.

Hybrid

-- Postgres: combine FTS + vector
SELECT *,
       (1.0 - (embedding <=> @q)) * 0.7 +
       ts_rank(to_tsvector('english', text), to_tsquery('english', @q_tokens)) * 0.3 AS score
FROM chunks
ORDER BY score DESC
LIMIT 10;

Manual fusion. Less polished than dedicated systems.

Multi-tenant

Add tenant_id column; filter on every query; index on (tenant_id, embedding).

Pros of relational

  • Single DB for app + vectors.
  • ACID transactions.
  • Familiar SQL operations.
  • Existing backup / DR.
  • Cost: incremental on existing DB.

Cons

  • Slower at very high QPS.
  • Less specialized index features.
  • Hybrid manual.

When use

  • Existing Postgres / SQL Server app.
  • Modest vector count (<10M).
  • Want transactional consistency (vectors + relational data atomic).
  • Cost-conscious; one DB.

When NOT

  • Specialized RAG at scale.
  • Need best-in-class hybrid.
  • Need vector-specific features (quantization, etc.).

Senior considerations

  • Index strategy: HNSW vs IVF; tune ef.
  • Storage: vector(1536) ~6 KB; multiply by row count.
  • Backup: standard DB backups handle vectors.
  • Migration: DB-level operations work normally.

Cross-references