Store: Azure AI Search

Key Points

  • Azure AI Search (formerly Cognitive Search) — managed search engine on Azure.
  • Hybrid search: vector + keyword + filtering. Best-in-class retrieval quality.
  • Semantic ranker: Microsoft's reranker baked in.
  • Tight integration with Microsoft.Extensions.VectorData.
  • Cost: tiered (Free, Basic, Standard, Storage Optimized), billed per search unit (replicas × partitions) with a storage quota per tier.

Setup

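using Azure.Identity;
using Azure.Search.Documents.Indexes;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;

// Keyless auth: DefaultAzureCredential resolves a managed identity in Azure
// and your developer login locally.
var searchClient = new SearchIndexClient(
    new Uri("https://my-search.search.windows.net"),
    new DefaultAzureCredential());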

var collection = new AzureAISearchCollection<string, MyDocument>(searchClient, "my-index");

await collection.EnsureCollectionExistsAsync();

Record

public class MyDocument
{
    [VectorStoreKey]
    public string Id { get; set; } = "";

    [VectorStoreData(IsIndexed = true, IsFullTextIndexed = true)]
    public string Title { get; set; } = "";

    [VectorStoreData(IsFullTextIndexed = true)]
    public string Content { get; set; } = "";

    [VectorStoreData(IsIndexed = true)]
    public string Category { get; set; } = "";

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? Embedding { get; set; }
}

IsFullTextIndexed = true enables BM25 keyword search alongside vector.

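// Hybrid search via IKeywordHybridSearchable<TRecord>. The signature below follows
// the current Microsoft.Extensions.VectorData abstractions; verify it against your version.
var hybridResults = collection.HybridSearchAsync(
    queryEmbedding,                                    // ReadOnlyMemory<float> query vector
    keywords: userQuery.Split(' ', StringSplitOptions.RemoveEmptyEntries),
    top: 10,
    options: new HybridSearchOptions<MyDocument>
    {
        // Title and Content are both full-text indexed, so name the keyword target.
        AdditionalProperty = d => d.Content,
        Filter = d => d.Category == "Engineering"
    });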

await foreach (var r in hybridResults)
    Console.WriteLine($"{r.Record.Title}: {r.Score}");

The vector and BM25 result lists are fused with Reciprocal Rank Fusion (RRF). This is a strong general-purpose retrieval default.
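To make the fusion step concrete, here is a minimal client-side sketch of RRF. It is an illustration of the formula only, not the service's internal implementation: the class name RrfSketch, the method Fuse, and the sample document IDs are hypothetical, and k = 60 is the commonly cited constant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfSketch
{
    // Fuses several ranked ID lists into one.
    // score(d) = sum over lists of 1 / (k + rank), where rank starts at 1.
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                // A document missing from a list simply contributes nothing for it.
                scores.TryGetValue(list[i], out var current);
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties broken by ID for deterministic output.
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => (Id: p.Key, Score: p.Value))
            .ToList();
    }
}
```

With a keyword list of doc-a, doc-b, doc-c and a vector list of doc-b, doc-d, doc-a, the fused order is doc-b, doc-a, doc-d, doc-c: documents that appear in both lists outrank documents that top only one.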

Semantic ranker

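Azure AI Search's built-in semantic ranker rescores the top results with a language-understanding model adapted from Bing. It is configured on the native Azure.Search.Documents SearchOptions rather than through the VectorData abstraction:

var options = new SearchOptions
{
    QueryType = SearchQueryType.Semantic,
    // Must match a semantic configuration defined on the index.
    SemanticSearch = new() { SemanticConfigurationName = "default" }
};

Improves top-K quality further. It is a premium (separately billed) feature, not a "Premium" pricing tier; check current tier availability and the free query allowance.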

Faceted filtering

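var options = new SearchOptions
{
    // Both fields must be marked facetable in the index definition.
    Facets = { "Category", "Author" }
};

Returns a count per facet value, which you can use to populate a filter UI.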

Index design

  • Searchable: full-text search.
  • Filterable: equality / range filters.
  • Sortable: ordering via $orderby (the ORDER BY equivalent).
  • Facetable: aggregations.

[VectorStoreData(IsIndexed = true)]   // filterable
[VectorStoreData(IsFullTextIndexed = true)]   // searchable

Tiers

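+----------------------------+---------------------+
| Tier                       | Notes               |
+----------------------------+---------------------+
| Free                       | Demos               |
| Basic                      | Small prod          |
| S1, S2, S3                 | Standard production |
| Storage Optimized (L1, L2) | Large datasets      |
+----------------------------+---------------------+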

Vector compression

Azure AI Search supports scalar and binary quantization, plus dimension truncation for Matryoshka (MRL) embeddings. All three reduce storage at some cost in recall.

Indexer (built-in ingestion)

Azure AI Search has built-in indexers from Blob, SQL, Cosmos, etc. Auto-incremental.

Indexers are data-plane objects: create them through the REST API, the portal, or SearchIndexerClient (Azure.Search.Documents.Indexes) rather than as an ARM/Bicep resource.

For most apps: do ingestion in your code (more control). Use indexer for simple cases.

Auth

  • API key (admin/query).
  • Managed identity (preferred).

new SearchIndexClient(uri, new DefaultAzureCredential())

Pricing

  • Search units (compute).
  • Storage.
  • Documents (limit).

For 1M chunks: a single Standard S1 search unit is typical (~$250/mo; varies by region).

Query path (hybrid + semantic ranker)

 user query (string)
        ├──► IEmbeddingGenerator ──► float[1536]  ┐
        │                                         │
        └──► tokenizer / analyzer (BM25)          │
                       │                          │
                       ▼                          ▼
              keyword query (top 50)    vector query (k=50, HNSW)
                       │                         │
                       └───────► RRF fuse ◄──────┘
                                  │  (k=60 constant)
                                  ▼
                        Fused L1 top-50 candidates
                                  │
                                  ▼
                    Semantic ranker (L2, Bing model)
                    - rerankerScore 0.00 - 4.00
                    - captions + answers + highlights
                                  │
                                  ▼
                     top-K (typ. 3-5) ──► LLM prompt

IsFullTextIndexed = true populates the BM25 side; [VectorStoreVector] feeds the HNSW side. Filters (IsIndexed = true) are applied pre- or post-vector depending on vectorFilterMode.

Index field anatomy

+--------------------------------------------------------------+
| Index: docs-v3                                               |
+----------------+--------+--------+--------+--------+---------+
| Field          | search | filter | facet  | sort   | vector  |
+----------------+--------+--------+--------+--------+---------+
| Id   (key)     |   -    |   Y    |   -    |   -    |    -    |
| Title          |   Y    |   Y    |   -    |   Y    |    -    |
| Content        |   Y    |   -    |   -    |   -    |    -    |
| Category       |   -    |   Y    |   Y    |   -    |    -    |
| TenantId       |   -    |   Y    |   -    |   -    |    -    |
| UpdatedAt      |   -    |   Y    |   -    |   Y    |    -    |
| Embedding      |   -    |   -    |   -    |   -    |  1536f  |
+----------------+--------+--------+--------+--------+---------+

Each enabled attribute (Y) costs storage and adds write amplification, so enable only what you will actually query.

Pros & cons

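+------------------------------------+---------------------------------------------------------------+
| Aspect                             | Verdict                                                       |
+------------------------------------+---------------------------------------------------------------+
| Hybrid + semantic ranker quality   | Best-in-class on Azure per Microsoft's own benchmarks         |
| Operational maturity               | GA, SLA, Entra ID, Private Link, diagnostics                  |
| Indexer + integrated vectorization | Pull-mode ingestion from Blob/Cosmos/SQL out of the box       |
| Cost floor                         | No production-grade free tier; Basic+ required at small scale |
| Vendor lock-in                     | Azure-only; no portable export of HNSW graph                  |
| Schema rigidity                    | Index changes often require rebuild; plan a versioned alias   |
+------------------------------------+---------------------------------------------------------------+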

When to use / when to avoid

Use when:

  • You are already on Azure and need Entra ID / Private Link / RBAC.
  • You need hybrid (BM25 + vector) with a managed reranker out of the box.
  • Multi-tenant SaaS with index-per-tenant or partition-key filtering.
  • Compliance demands an audited PaaS rather than self-hosted Qdrant/Weaviate.

Avoid when:

  • Tiny corpora (< 50k chunks): pgvector or in-process FAISS is cheaper.
  • You need exotic vector ops (sparse-only, multi-vector ColBERT) not yet GA.
  • You want a single OLTP+vector store: Cosmos DB or Postgres is simpler.
  • Hard cost ceiling under ~$80/mo: Basic tier is the realistic floor.

Interview Q&A

1. When is hybrid search strictly better than vector-only? When queries contain exact tokens the embedding model under-weights — IDs, SKUs, error codes, statute references, code symbols. BM25 nails those; the vector arm catches paraphrase. Microsoft's own benchmarks show hybrid + semantic ranker beating pure vector on most enterprise corpora. The cost is one extra parallel query and an RRF merge — effectively free.

2. Semantic ranker vs Cohere Rerank: which would you pick on Azure? Default to the built-in semantic ranker: it ships with the service, is billed by query volume, runs in-region (data residency, no egress), and uses models adapted from Bing with broad multilingual coverage. Reach for Cohere rerank-multilingual-v3.0 only when you have measured a quality gap on your eval set, need cross-lingual reranking the semantic ranker handles poorly, or want to stay portable across vector stores.

3. Indexer (pull) vs custom push ingestion — when do you choose which? Use indexers when the source is Blob/Cosmos/SQL/OneLake, change-tracking fits the built-in watermark, and you can express transformations as skillsets. Use push ingestion (UpsertAsync from your own pipeline) when you need streaming, exactly-once semantics, custom chunking, dead-letter queues, or feature-flagged rollouts. Most production RAG I have shipped ends up push-based because chunking strategy is the moat.

4. How does scalar vs binary quantization change the trade-off? Scalar (int8) gives ~4x reduction with negligible recall loss — turn it on by default. Binary (1 bit) gives up to ~28x but only behaves well on dimensions ≥ 1024 and embeddings centered at zero (OpenAI text-embedding-3, Cohere). Both want oversampling + rescore against the original full-precision vector to recover recall — keep defaultOversampling ≥ 4 unless you measured otherwise.
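The reduction factors above follow from simple arithmetic on bits per vector component. The sketch below computes raw vector payload only, under the assumption of 1M chunks at 1536 dimensions; the class and method names are hypothetical, and it ignores HNSW graph and other index overhead.

```csharp
using System;

public static class VectorStorage
{
    // Raw vector payload in bytes: count * dimensions * bitsPerComponent / 8.
    // Does not include graph links, metadata, or retained originals for rescoring.
    public static long RawBytes(long vectorCount, int dimensions, int bitsPerComponent)
        => vectorCount * dimensions * bitsPerComponent / 8;
}
```

For 1,000,000 vectors of 1536 dimensions this gives 6,144,000,000 bytes at float32, 1,536,000,000 bytes at int8 (4x smaller), and 192,000,000 bytes at 1 bit (32x smaller on raw payload). Index-level savings will be lower than these raw ratios because overhead that quantization does not shrink is counted too, and keeping the originals for rescoring adds storage back.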

5. Multi-tenant: index-per-tenant, field filter, or service-per-tenant? Three valid patterns. Index-per-tenant gives the strongest isolation and per-tenant schema evolution but caps at the service's index limit. A shared index with TenantId as a filterable field scales to millions of tenants and is cheapest, but you must enforce the filter in every query (security trimming) and tune for hot tenants. Service-per-tenant is for regulated workloads needing physical isolation. Default to shared-index + filter; promote noisy neighbours to their own index.
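One way to make the shared-index pattern harder to get wrong is to route every query through a single helper that always prepends the tenant clause. This is a minimal sketch, assuming a filterable string field named TenantId (as in the field table) and an OData filter string such as the one SearchOptions.Filter accepts; the TenantFilter class itself is hypothetical.

```csharp
#nullable enable
using System;

public static class TenantFilter
{
    // OData string literals escape a single quote by doubling it.
    private static string Quote(string value) => "'" + value.Replace("'", "''") + "'";

    // Always ANDs the tenant clause onto the caller's filter.
    // extraFilter must be built by trusted code, never passed through from raw user input.
    public static string Build(string tenantId, string? extraFilter = null)
    {
        if (string.IsNullOrWhiteSpace(tenantId))
            throw new ArgumentException("Tenant id is required.", nameof(tenantId));

        var tenantClause = $"TenantId eq {Quote(tenantId)}";
        return string.IsNullOrWhiteSpace(extraFilter)
            ? tenantClause
            : $"{tenantClause} and ({extraFilter})";
    }
}
```

Calling TenantFilter.Build("contoso", "Category eq 'Engineering'") yields TenantId eq 'contoso' and (Category eq 'Engineering'). Throwing on a missing tenant id is deliberate: a query with no tenant clause should fail loudly rather than run unfiltered.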

6. What is your reindex / freshness strategy when the embedding model changes? Treat the embedding model version as part of the index identity. Build docs-v3-2026-04 alongside the live docs-v2-2026-01, dual-write during backfill, swap an alias at the end, then drop the old index. Never reuse a field name with mixed embedding versions — vector spaces are not comparable across models. For trickle freshness, push updates with a version/etag field and use the indexer's high-water mark.

7. How do you secure an Azure AI Search index in 2026 best-practice? Disable API keys, enable RBAC-only (Search Service Contributor, Search Index Data Contributor, Search Index Data Reader), connect via DefaultAzureCredential with a user-assigned managed identity, put the service behind Private Link, and use document-level access control via a filterable acl field populated by your auth tier. Audit through Diagnostic Settings to Log Analytics.

8. RRF score is 0.03 — is the result bad? No. RRF scores are bounded by 1/(k+rank) summed across query arms, so they look tiny next to BM25 (unbounded) or HNSW cosine (0.33-1.0). Compare RRF scores only against other RRF scores from the same query. To present to users or to threshold, use @search.rerankerScore (0.00-4.00) from the semantic ranker — that range is calibrated.
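The bound in the answer above can be written out. Assuming ranks start at 1, A is the set of query arms, and k = 60 as in the query-path diagram:

```latex
\mathrm{RRF}(d) = \sum_{a \in A} \frac{1}{k + \mathrm{rank}_a(d)}
\quad\Longrightarrow\quad
\mathrm{RRF}(d) \le \frac{|A|}{k + 1}

% Two arms (keyword + one vector query), k = 60:
\frac{2}{60 + 1} = \frac{2}{61} \approx 0.0328
```

So for a two-arm hybrid query, 0.03 is close to the ceiling: the document ranked high in both arms. Queries with more arms (for example several vector queries) have a proportionally higher ceiling, which is another reason not to compare RRF scores across queries.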

9. When does the semantic ranker hurt rather than help? On highly structured, code-like, or numeric content the L2 model hasn't seen — log lines, financial tables, rare-language docs. It also caps inputs at ~8,960 tokens, so very long fields get truncated. And it only reranks the top 50 from L1, so if your L1 retrieval misses the gold doc, L2 can't recover it — invest in recall@50 first.

10. Vector compression: when is it a footgun? Binary quantization on small dimensions (< 1024) or non-zero-centered embeddings can drop recall by 10-20%. Scalar is safer. The non-obvious one: rescoreStorageMethod = discardOriginals saves ~50% storage but deletes the full-precision vectors, so full-precision rescoring cannot be turned back on without a full reindex. For binary, discardOriginals is fine (dot-product rescore still works); for scalar, keep originals.

11. How would you design an A/B test between two retrieval pipelines? Build a labeled eval set first (queries + ideal doc IDs from your domain experts). Run both pipelines headless, compute recall@10, MRR, nDCG, and ranker-agreement. Then ship behind a flag, log the served pipeline per query, and watch downstream signals — answer faithfulness, click-through, follow-up rate. Never trust offline metrics alone; never trust online metrics on < ~500 queries.
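For the offline half of that comparison, two of the named metrics are short enough to sketch. This assumes each pipeline returns a ranked list of document IDs per query and the eval set provides the relevant IDs; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetrievalMetrics
{
    // Fraction of the relevant IDs that appear in the top-k results.
    public static double RecallAtK(IReadOnlyList<string> ranked, ISet<string> relevant, int k)
    {
        if (relevant.Count == 0) return 0.0;
        int hits = ranked.Take(k).Distinct().Count(id => relevant.Contains(id));
        return (double)hits / relevant.Count;
    }

    // 1 / rank of the first relevant result, or 0 if none was retrieved.
    // MRR is the mean of this value over all queries in the eval set.
    public static double ReciprocalRank(IReadOnlyList<string> ranked, ISet<string> relevant)
    {
        for (int i = 0; i < ranked.Count; i++)
        {
            if (relevant.Contains(ranked[i])) return 1.0 / (i + 1);
        }
        return 0.0;
    }
}
```

For a ranked list d3, d1, d7, d2 with relevant set {d1, d2}, recall@3 is 0.5 (only d1 is in the top 3) and the reciprocal rank is 0.5 (first hit at position 2). Run both pipelines over the same labeled queries and compare the averages.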

Gotchas / common mistakes

  • Forgetting IsFullTextIndexed = true on the field you want BM25 on — hybrid silently degrades to vector-only on that arm.
  • Mutating an index schema in place. Most field-attribute changes require a rebuild. Always work behind an alias and a versioned index name.
  • Putting tenant filtering only in the client. A bug in one query path leaks data across tenants. Enforce at the index layer with a security-trim filter and a unit test.
  • Using API keys in production. Switch the service to RBAC-only and authenticate with DefaultAzureCredential; rotate any leaked admin key immediately.
  • Ignoring how maxTextRecallSize and top interact with the semantic ranker. L2 needs ~50 candidates from L1 to do its job; setting top: 5 with no k override starves it.
  • Mixing embedding model versions in the same vector field. The space is incoherent and recall collapses.
  • Treating RRF scores as probabilities. They are not calibrated and not comparable across queries. Threshold on rerankerScore instead.
  • Skipping eval. "Hybrid + semantic" is a strong default, not a guarantee — run Ragas / your own labeled set on every pipeline change.

Senior considerations

  • Default to hybrid + semantic ranker for quality.
  • Index only what you'll filter, sort, or facet on: every enabled attribute adds write cost and storage.
  • Replicas for read scale; partitions for storage.
  • Embeddings stay in sync with index — re-index on model upgrade.

Comparison

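+-----------------+----------------------------------------+----------------------+
| Store           | Pros                                   | Cons                 |
+-----------------+----------------------------------------+----------------------+
| Azure AI Search | Hybrid + semantic ranker; Azure-native | Azure-only; cost     |
| Cosmos DB       | Same DB as app                         | Less hybrid maturity |
| Qdrant          | Open; fast                             | Self-host or cloud   |
| pgvector        | Postgres-native                        | Limited filtering    |
+-----------------+----------------------------------------+----------------------+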

For Azure shops + enterprise RAG: Azure AI Search is the default.

Further reading

API note: the Microsoft.Extensions.VectorData surface is still moving. Older Semantic Kernel docs reference AzureAISearchVectorStoreRecordCollection<TRecord>, while the newer abstractions package (v10.1+) uses VectorStoreCollection<TKey,TRecord> with IKeywordHybridSearchable<TRecord>. Verify the exact type names against the version you have referenced before copying snippets. (verify on Microsoft Learn)


Cross-references