Store: Azure AI Search
Key Points
- Azure AI Search (formerly Cognitive Search) — managed search engine on Azure.
- Hybrid search: vector + keyword + filtering. Best-in-class retrieval quality.
- Semantic ranker: Microsoft's reranker baked in.
- Tight integration with Microsoft.Extensions.VectorData.
- Cost: tiered (Basic, Standard, Storage Optimized). Storage + compute units.
Setup
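using Azure.Identity;
using Azure.Search.Documents.Indexes;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;
// Authenticate with Entra ID via DefaultAzureCredential (no API key in code).
var searchClient = new SearchIndexClient(
    new Uri("https://my-search.search.windows.net"),
    new DefaultAzureCredential());
var collection = new AzureAISearchCollection<string, MyDocument>(searchClient, "my-index");
await collection.EnsureCollectionExistsAsync();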
Record
using Microsoft.Extensions.VectorData;

public class MyDocument
{
    [VectorStoreKey]
    public string Id { get; set; } = "";

    // Filterable and full-text searchable.
    [VectorStoreData(IsIndexed = true, IsFullTextIndexed = true)]
    public string Title { get; set; } = "";

    // Full-text searchable only.
    [VectorStoreData(IsFullTextIndexed = true)]
    public string Content { get; set; } = "";

    // Filterable only.
    [VectorStoreData(IsIndexed = true)]
    public string Category { get; set; } = "";

    // The dimension count is a positional constructor argument.
    [VectorStoreVector(1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? Embedding { get; set; }
}
IsFullTextIndexed = true enables BM25 keyword search alongside vector.
Hybrid search
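// Sketch against IKeywordHybridSearchable<TRecord>; verify the signature for your package version.
var hybridResults = collection.HybridSearchAsync(
    queryEmbedding,                                              // vector arm
    userQuery.Split(' ', StringSplitOptions.RemoveEmptyEntries), // keyword (BM25) arm
    10,                                                          // top
    new HybridSearchOptions<MyDocument>
    {
        Filter = d => d.Category == "Engineering"
    });
await foreach (var r in hybridResults)
    Console.WriteLine($"{r.Record.Title}: {r.Score}");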
Vector + BM25 fused via reciprocal rank fusion. Best general-purpose retrieval.
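To make the fusion step concrete, here is a minimal, self-contained sketch of reciprocal rank fusion over two ranked ID lists. It illustrates the formula only and is not the service's internal implementation. The Rrf.Fuse helper and the document IDs are hypothetical, and k = 60 is assumed as the smoothing constant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ranked results from each arm (best match first).
var keywordHits = new[] { "doc-a", "doc-b", "doc-c" };
var vectorHits = new[] { "doc-b", "doc-d", "doc-a" };

foreach (var (id, score) in Rrf.Fuse(new[] { keywordHits, vectorHits }))
    Console.WriteLine($"{id}: {score:F4}");

public static class Rrf
{
    // Each list contributes 1 / (k + rank) for every document it contains,
    // where rank is 1-based. Contributions are summed per document.
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (var i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out var current);
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties broken by ID for a stable order.
        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

With these inputs doc-b ranks first (1/62 + 1/61), ahead of doc-a (1/61 + 1/63), because it sits near the top of both lists. Every fused score stays around 0.03 or lower, which is expected for two arms with k = 60.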
Semantic ranker
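Azure AI Search's built-in semantic ranker reranks the top results with Microsoft's language-understanding models:
// Uses the Azure.Search.Documents SDK directly (pass to SearchClient.SearchAsync).
var options = new SearchOptions
{
    QueryType = SearchQueryType.Semantic,
    SemanticSearch = new() { SemanticConfigurationName = "default" }
};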
Improves top-K quality further. Billed as a premium (usage-based) feature on top of the service tier, not as a separate tier.
Faceted filtering
Facet queries return a count per distinct value of a facetable field, which you can use to populate a filter UI.
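As a conceptual illustration of what "counts per facet" means, the sketch below tallies a facet in memory with LINQ. It does not call the service; the Doc record, the FacetCounter.Tally helper, and the sample data are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical documents that matched a query.
var matches = new[]
{
    new Doc("Retry policies", "Engineering"),
    new Doc("Onboarding guide", "HR"),
    new Doc("Circuit breakers", "Engineering"),
};

foreach (var facet in FacetCounter.Tally(matches, d => d.Category))
    Console.WriteLine($"{facet.Value} ({facet.Count})");

public record Doc(string Title, string Category);

public static class FacetCounter
{
    // Group matching documents by the facet value and count each bucket,
    // largest bucket first (ties broken alphabetically).
    public static List<(string Value, int Count)> Tally<T>(
        IEnumerable<T> items, Func<T, string> selector) =>
        items.GroupBy(selector)
             .Select(g => (Value: g.Key, Count: g.Count()))
             .OrderByDescending(f => f.Count)
             .ThenBy(f => f.Value, StringComparer.Ordinal)
             .ToList();
}
```

A filter UI would render each (value, count) pair as a clickable option, for example "Engineering (2)" and "HR (1)" for the sample data.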
Index design
- Searchable: full-text search.
- Filterable: equality / range filters.
- Sortable: ORDER BY.
- Facetable: aggregations.
[VectorStoreData(IsIndexed = true)] // filterable
[VectorStoreData(IsFullTextIndexed = true)] // searchable
Tiers
| Tier | Notes |
|---|---|
| Free | Demos |
| Basic | Small prod |
| S1, S2, S3 | Standard production |
| Storage Optimized (L1, L2) | Large datasets |
Vector compression
Azure AI Search supports scalar and binary quantization, plus dimension truncation for Matryoshka (MRL) embeddings. All of them reduce vector storage at some cost in recall.
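To show where the storage saving and the quality loss come from, here is a minimal sketch of scalar quantization using a simple min-max mapping onto 256 levels. It is an assumption-laden illustration, not the algorithm the service uses; the ScalarQuantizer helper and the sample vector are hypothetical.

```csharp
using System;
using System.Linq;

float[] vector = { -0.50f, -0.10f, 0.00f, 0.25f, 0.50f };

var (codes, min, step) = ScalarQuantizer.Quantize(vector);
float[] restored = ScalarQuantizer.Dequantize(codes, min, step);

Console.WriteLine($"float32 bytes: {vector.Length * sizeof(float)}, 8-bit bytes: {codes.Length}");
for (var i = 0; i < vector.Length; i++)
    Console.WriteLine($"{vector[i],6:F2} -> code {codes[i],3} -> {restored[i]:F4}");

public static class ScalarQuantizer
{
    // Map each float onto one of 256 evenly spaced levels between min and max.
    public static (byte[] Codes, float Min, float Step) Quantize(float[] values)
    {
        float min = values.Min();
        float max = values.Max();
        // Guard against a zero range (all components equal).
        float step = max > min ? (max - min) / 255f : 1f;
        var codes = values
            .Select(v => (byte)Math.Clamp((int)MathF.Round((v - min) / step), 0, 255))
            .ToArray();
        return (codes, min, step);
    }

    // Approximate reconstruction; the error per component is roughly at most step / 2.
    public static float[] Dequantize(byte[] codes, float min, float step) =>
        codes.Select(c => min + c * step).ToArray();
}
```

Each component shrinks from 4 bytes to 1, which is the ~4x reduction. The small reconstruction error is why oversampling and rescoring against the original vectors are used to recover recall.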
Indexer (built-in ingestion)
Azure AI Search has built-in indexers from Blob, SQL, Cosmos, etc. Auto-incremental.
For most apps: do ingestion in your code (more control). Use indexer for simple cases.
Auth
- API key (admin/query).
- Managed identity (preferred).
Pricing
- Search units (compute).
- Storage.
- Documents (limit).
For 1M chunks: Standard S1 typical (~$250/mo).
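A back-of-the-envelope sizing sketch for the 1M-chunk case, assuming 1536-dimensional embeddings as in the record above. It counts raw vector bytes only and ignores HNSW graph overhead, text fields, and any retained originals, so treat the numbers as a lower bound. The VectorSizing helper is hypothetical.

```csharp
using System;

const long chunks = 1_000_000;
const int dimensions = 1536;

Console.WriteLine($"float32: {VectorSizing.RawGigabytes(chunks, dimensions, 32):F3} GB");
Console.WriteLine($"int8:    {VectorSizing.RawGigabytes(chunks, dimensions, 8):F3} GB");
Console.WriteLine($"binary:  {VectorSizing.RawGigabytes(chunks, dimensions, 1):F3} GB");

public static class VectorSizing
{
    // Raw vector payload in decimal gigabytes: vectors x dimensions x bytes per component.
    public static double RawGigabytes(long vectors, int dims, int bitsPerComponent) =>
        vectors * dims * (bitsPerComponent / 8.0) / 1e9;
}
```

Uncompressed float32 vectors come to about 6.1 GB for this corpus, versus roughly 1.5 GB at 8 bits and 0.2 GB at 1 bit per component. Compare these against the storage quota of the tier you are considering.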
Query path (hybrid + semantic ranker)
                   user query (string)
                          │
          ┌───────────────┴───────────────┐
          ▼                               ▼
tokenizer / analyzer (BM25)     IEmbeddingGenerator ──► float[1536]
          │                               │
          ▼                               ▼
keyword query (top 50)          vector query (k=50, HNSW)
          │                               │
          └─────────► RRF fuse ◄──────────┘
                          │  (k=60 constant)
                          ▼
              Fused L1 top-50 candidates
                          │
                          ▼
           Semantic ranker (L2, Bing model)
             - rerankerScore 0.00 - 4.00
             - captions + answers + highlights
                          │
                          ▼
              top-K (typ. 3-5) ──► LLM prompt
IsFullTextIndexed = true populates the BM25 side; [VectorStoreVector] feeds the HNSW side. Filters (IsIndexed = true) are applied pre- or post-vector depending on vectorFilterMode.
Index field anatomy
+--------------------------------------------------------------+
| Index: docs-v3 |
+----------------+--------+--------+--------+--------+---------+
| Field | search | filter | facet | sort | vector |
+----------------+--------+--------+--------+--------+---------+
| Id (key) | - | Y | - | - | - |
| Title | Y | Y | - | Y | - |
| Content | Y | - | - | - | - |
| Category | - | Y | Y | - | - |
| TenantId | - | Y | - | - | - |
| UpdatedAt | - | Y | - | Y | - |
| Embedding | - | - | - | - | 1536f |
+----------------+--------+--------+--------+--------+---------+
Each true flag costs storage and write amplification — index only what you will actually query.
Pros & cons
| Aspect | Verdict |
|---|---|
| Hybrid + semantic ranker quality | Best-in-class on Azure; supported by Microsoft's published benchmarks |
| Operational maturity | GA, SLA, Entra ID, Private Link, diagnostics |
| Indexer + integrated vectorization | Pull-mode ingestion from Blob/Cosmos/SQL out of the box |
| Cost floor | Free tier is not usable for production; Basic or higher is required even at small scale |
| Vendor lock-in | Azure-only; no portable export of HNSW graph |
| Schema rigidity | Index changes often require rebuild; plan a versioned alias |
When to use / when to avoid
Use when:
- You are already on Azure and need Entra ID / Private Link / RBAC.
- You need hybrid (BM25 + vector) with a managed reranker out of the box.
- Multi-tenant SaaS with index-per-tenant or partition-key filtering.
- Compliance demands an audited PaaS rather than self-hosted Qdrant/Weaviate.
Avoid when:
- Tiny corpora (< 50k chunks): pgvector or in-process FAISS is cheaper.
- You need exotic vector ops (sparse-only, multi-vector ColBERT) not yet GA.
- You want a single OLTP+vector store: Cosmos DB or Postgres is simpler.
- Hard cost ceiling under ~$80/mo: Basic tier is the realistic floor.
Interview Q&A
1. When is hybrid search strictly better than vector-only? When queries contain exact tokens the embedding model under-weights — IDs, SKUs, error codes, statute references, code symbols. BM25 nails those; the vector arm catches paraphrase. Microsoft's own benchmarks show hybrid + semantic ranker beating pure vector on most enterprise corpora. The cost is one extra parallel query and an RRF merge — effectively free.
2. Semantic ranker vs Cohere Rerank — which would you pick on Azure? Default to the built-in semantic ranker: it ships with the service, is billed per query unit, runs in-region (data residency, no egress), and is trained on the Bing corpus with broad multilingual coverage. Reach for Cohere rerank-multilingual-v3.0 only when you have measured a quality gap on your eval set, need cross-lingual reranking the semantic ranker handles poorly, or want to stay portable across vector stores.
3. Indexer (pull) vs custom push ingestion — when do you choose which? Use indexers when the source is Blob/Cosmos/SQL/OneLake, change-tracking fits the built-in watermark, and you can express transformations as skillsets. Use push ingestion (UpsertAsync from your own pipeline) when you need streaming, exactly-once semantics, custom chunking, dead-letter queues, or feature-flagged rollouts. Most production RAG I have shipped ends up push-based because chunking strategy is the moat.
4. How does scalar vs binary quantization change the trade-off? Scalar (int8) gives ~4x reduction with negligible recall loss — turn it on by default. Binary (1 bit) gives up to ~28x but only behaves well on dimensions ≥ 1024 and embeddings centered at zero (OpenAI text-embedding-3, Cohere). Both want oversampling + rescore against the original full-precision vector to recover recall — keep defaultOversampling ≥ 4 unless you measured otherwise.
5. Multi-tenant: index-per-tenant, field filter, or service-per-tenant? Three valid patterns. Index-per-tenant gives the strongest isolation and per-tenant schema evolution but caps at the service's index limit. A shared index with TenantId as a filterable field scales to millions of tenants and is cheapest, but you must enforce the filter in every query (security trimming) and tune for hot tenants. Service-per-tenant is for regulated workloads needing physical isolation. Default to shared-index + filter; promote noisy neighbours to their own index.
6. What is your reindex / freshness strategy when the embedding model changes? Treat the embedding model version as part of the index identity. Build docs-v3-2026-04 alongside the live docs-v2-2026-01, dual-write during backfill, swap an alias at the end, then drop the old index. Never reuse a field name with mixed embedding versions — vector spaces are not comparable across models. For trickle freshness, push updates with a version/etag field and use the indexer's high-water mark.
7. How do you secure an Azure AI Search index in 2026 best-practice? Disable API keys, enable RBAC-only (Search Service Contributor, Search Index Data Contributor, Search Index Data Reader), connect via DefaultAzureCredential with a user-assigned managed identity, put the service behind Private Link, and use document-level access control via a filterable acl field populated by your auth tier. Audit through Diagnostic Settings to Log Analytics.
8. RRF score is 0.03 — is the result bad? No. RRF scores are bounded by 1/(k+rank) summed across query arms, so they look tiny next to BM25 (unbounded) or HNSW cosine (0.33-1.0). Compare RRF scores only against other RRF scores from the same query. To present to users or to threshold, use @search.rerankerScore (0.00-4.00) from the semantic ranker — that range is calibrated.
9. When does the semantic ranker hurt rather than help? On highly structured, code-like, or numeric content the L2 model hasn't seen — log lines, financial tables, rare-language docs. It also caps inputs at ~8,960 tokens, so very long fields get truncated. And it only reranks the top 50 from L1, so if your L1 retrieval misses the gold doc, L2 can't recover it — invest in recall@50 first.
10. Vector compression: when is it a footgun? Binary quantization on small dimensions (< 1024) or non-zero-centered embeddings can drop recall by 10-20%. Scalar is safer. The non-obvious one: rescoreStorageMethod = discardOriginals saves ~50% storage but permanently removes the full-precision vectors, so you cannot bring back full-precision rescoring without a full reindex. For binary, discardOriginals is still workable (dot-product rescore works); for scalar, keep originals.
11. How would you design an A/B test between two retrieval pipelines? Build a labeled eval set first (queries + ideal doc IDs from your domain experts). Run both pipelines headless, compute recall@10, MRR, nDCG, and ranker-agreement. Then ship behind a flag, log the served pipeline per query, and watch downstream signals — answer faithfulness, click-through, follow-up rate. Never trust offline metrics alone; never trust online metrics on < ~500 queries.
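The offline metrics named in question 11 can be sketched in a few lines. This is a minimal version assuming binary relevance labels (a document is either in the ideal set or not); the RetrievalMetrics helper and the sample IDs are hypothetical, and a real eval would average these values over the whole labeled query set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical labeled example: one query's ideal doc IDs and one pipeline's ranking.
var relevant = new HashSet<string> { "doc-1", "doc-4" };
var ranked = new[] { "doc-3", "doc-1", "doc-7", "doc-4", "doc-9" };

Console.WriteLine($"recall@3 = {RetrievalMetrics.RecallAtK(ranked, relevant, 3):F3}");
Console.WriteLine($"RR       = {RetrievalMetrics.ReciprocalRank(ranked, relevant):F3}");
Console.WriteLine($"nDCG@5   = {RetrievalMetrics.NdcgAtK(ranked, relevant, 5):F3}");

public static class RetrievalMetrics
{
    // Fraction of the relevant documents that appear in the top k results.
    public static double RecallAtK(IReadOnlyList<string> ranked, ISet<string> relevant, int k) =>
        relevant.Count == 0
            ? 0
            : ranked.Take(k).Count(id => relevant.Contains(id)) / (double)relevant.Count;

    // 1 / rank of the first relevant result (0 if none); average over queries to get MRR.
    public static double ReciprocalRank(IReadOnlyList<string> ranked, ISet<string> relevant)
    {
        for (var i = 0; i < ranked.Count; i++)
            if (relevant.Contains(ranked[i])) return 1.0 / (i + 1);
        return 0;
    }

    // Binary-relevance nDCG: DCG of this ranking divided by the DCG of an ideal ranking.
    public static double NdcgAtK(IReadOnlyList<string> ranked, ISet<string> relevant, int k)
    {
        double dcg = 0;
        for (var i = 0; i < Math.Min(k, ranked.Count); i++)
            if (relevant.Contains(ranked[i])) dcg += 1.0 / Math.Log2(i + 2);

        double ideal = 0;
        for (var i = 0; i < Math.Min(k, relevant.Count); i++)
            ideal += 1.0 / Math.Log2(i + 2);

        return ideal == 0 ? 0 : dcg / ideal;
    }
}
```

For the sample ranking, recall@3 is 0.5 (one of two relevant documents in the top 3) and the reciprocal rank is 0.5 (first relevant hit at rank 2). Running both pipelines through the same functions on the same labeled set keeps the comparison apples to apples.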
Gotchas / common mistakes
- Forgetting IsFullTextIndexed = true on the field you want BM25 on: hybrid silently degrades to vector-only on that arm.
- Mutating an index schema in place. Most field-attribute changes require a rebuild. Always work behind an alias and a versioned index name.
- Putting tenant filtering only in the client. A bug in one query path leaks data across tenants. Enforce at the index layer with a security-trim filter and a unit test.
- Using API keys in production. Switch the service to RBAC-only and authenticate with DefaultAzureCredential; rotate any leaked admin key immediately.
- Ignoring the maxTextRecallSize and top interaction with semantic ranker. L2 needs ~50 candidates from L1 to do its job; setting top: 5 with no k override starves it.
- Mixing embedding model versions in the same vector field. The space is incoherent and recall collapses.
- Treating RRF scores as probabilities. They are not calibrated and not comparable across queries. Threshold on rerankerScore instead.
- Skipping eval. "Hybrid + semantic" is a strong default, not a guarantee; run Ragas / your own labeled set on every pipeline change.
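One way to keep tenant trimming out of individual query paths is to route every filter through a single builder, sketched below. It assumes a filterable TenantId field as in the index anatomy table and produces an OData filter string; the TenantFilter helper is hypothetical, and the caller-supplied filter is assumed to come from trusted code, not raw user input.

```csharp
using System;

Console.WriteLine(TenantFilter.Build("contoso", "Category eq 'Engineering'"));
Console.WriteLine(TenantFilter.Build("o'brien-ltd", null));

public static class TenantFilter
{
    // Always emit the tenant clause first; callers cannot opt out of it.
    public static string Build(string tenantId, string? extraFilter)
    {
        if (string.IsNullOrWhiteSpace(tenantId))
            throw new ArgumentException("A tenant id is required for every query.", nameof(tenantId));

        // OData string literals escape a single quote by doubling it.
        var escaped = tenantId.Replace("'", "''");
        var tenantClause = $"TenantId eq '{escaped}'";

        // Parenthesize the extra filter so an 'or' inside it cannot widen the tenant scope.
        return string.IsNullOrWhiteSpace(extraFilter)
            ? tenantClause
            : $"{tenantClause} and ({extraFilter})";
    }
}
```

The two calls print `TenantId eq 'contoso' and (Category eq 'Engineering')` and `TenantId eq 'o''brien-ltd'`. Because the builder is a pure function, the unit test the gotcha asks for is a handful of string assertions.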
Senior considerations
- Default to hybrid + semantic ranker for quality.
- Index only what you'll filter on — write cost.
- Replicas for read scale; partitions for storage.
- Embeddings stay in sync with index — re-index on model upgrade.
Comparison
| Store | Pros | Cons |
|---|---|---|
| Azure AI Search | Hybrid + semantic ranker; Azure-native | Azure-only; cost |
| Cosmos DB | Same DB as app | Less hybrid maturity |
| Qdrant | Open; fast | Self-host or cloud |
| pgvector | Postgres-native | Limited filtering |
For Azure shops + enterprise RAG: Azure AI Search is the default.
Further reading
- Vector search in Azure AI Search (overview)
- Create a hybrid query in Azure AI Search
- Relevance scoring with Reciprocal Rank Fusion (RRF)
- Semantic ranking in Azure AI Search
- Compress vectors using scalar or binary quantization
- Connect to Azure AI Search using roles (RBAC)
- Connect your app to Azure AI Search using identities
- Design patterns for multitenant SaaS applications and Azure AI Search
- Microsoft.Extensions.VectorData abstractions (IKeywordHybridSearchable<T>)
- Vector search using vector store providers (.NET AI)
API note: the Microsoft.Extensions.VectorData surface is still moving: older Semantic Kernel docs reference AzureAISearchVectorStoreRecordCollection<TRecord>, while the newer abstractions package (v10.1+) uses VectorStoreCollection<TKey,TRecord> with IKeywordHybridSearchable<TRecord>. Verify the exact type names against the version you have referenced before copying snippets. (verify on Microsoft Learn)