Microsoft.Extensions.VectorData
Key Points
- Microsoft.Extensions.VectorData is a unified abstraction over vector stores (Azure AI Search, Cosmos DB, Qdrant, Postgres pgvector, Redis, in-memory).
- Define a record with [VectorStoreKey], [VectorStoreData], and [VectorStoreVector] attributes; the same DTO works across providers.
- Swap providers via DI / config.
- Aligns with Microsoft.Extensions.AI for embeddings.
Define a record
public class HotelInfo
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string HotelName { get; set; } = "";

    [VectorStoreData]
    public string Description { get; set; } = "";

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}
[VectorStoreData(IsIndexed = true)] marks a field as indexed so it can be used in filters. [VectorStoreVector] declares the vector property together with its dimension count and distance function.
Connect to a store
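// uri, cred, database, and qdrantClient are assumed to be created elsewhere.
// Supported key types differ by connector, so confirm that ulong is accepted
// by each store before reusing the same key type everywhere.

// Azure AI Search
var collection = new AzureAISearchCollection<ulong, HotelInfo>(
    new SearchIndexClient(uri, cred),
    "hotels");

// Cosmos DB NoSQL
var cosmosCol = new CosmosNoSqlCollection<ulong, HotelInfo>(database, "hotels");

// Qdrant
var qdrant = new QdrantCollection<ulong, HotelInfo>(qdrantClient, "hotels");

// In-memory (testing)
var inMem = new InMemoryCollection<ulong, HotelInfo>("hotels");

// Create the collection (index) if it does not exist yet.
await collection.EnsureCollectionExistsAsync();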
Upsert
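var description = "...";

await collection.UpsertAsync(new HotelInfo
{
    HotelId = 1,
    HotelName = "Test Hotel",
    Description = description,
    // Await the generator first, then read the vector from the resulting embedding.
    DescriptionEmbedding = (await _embed.GenerateAsync(description)).Vector
});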
Search
var queryEmb = await _embed.GenerateAsync(query);

await foreach (var result in collection.SearchAsync(queryEmb.Vector, top: 10))
{
    Console.WriteLine($"{result.Record.HotelName} (score: {result.Score})");
}
Filtering
await foreach (var r in collection.SearchAsync(
    queryEmb.Vector,
    top: 10,
    new VectorSearchOptions<HotelInfo>
    {
        Filter = h => h.HotelName.Contains("Marriott")
    }))
{ /* ... */ }
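The filter is a LINQ-style predicate that is translated into the provider's native filter syntax. Supported operators vary by provider, so test any predicate beyond simple equality and range comparisons (such as the string Contains above) against the store you target.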
DI
builder.Services.AddAzureAISearchCollection<HotelInfo>(
    "hotels",
    sp => new SearchIndexClient(uri, cred));
Multiple providers
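The record class stays the same; only the collection registration in DI changes. When consuming code depends on the abstraction rather than a concrete collection type, swapping providers is a one-line change.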
Distance functions
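DistanceFunction.CosineSimilarity      // common choice for text embeddings; higher is closer
DistanceFunction.DotProductSimilarity  // same ranking as cosine when vectors are normalized
DistanceFunction.EuclideanDistance     // straight-line distance; lower is closer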
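To make the difference between the three functions concrete, here is a minimal sketch that computes each one by hand. VectorMath is a hypothetical helper written for illustration; it is not part of Microsoft.Extensions.VectorData, and real stores compute these scores inside the index.

```csharp
using System;

// Hypothetical helper for illustration only; not a library type.
public static class VectorMath
{
    // Sum of element-wise products; larger means more similar.
    public static float DotProduct(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Dot product divided by both magnitudes; ranges from -1 to 1, larger means more similar.
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float denom = MathF.Sqrt(DotProduct(a, a)) * MathF.Sqrt(DotProduct(b, b));
        if (denom == 0f)
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");
        return DotProduct(a, b) / denom;
    }

    // Straight-line distance between the two points; smaller means more similar.
    public static float EuclideanDistance(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return MathF.Sqrt(sum);
    }
}
```

For the vectors [1, 0] and [2, 0], cosine similarity is 1 (same direction), the dot product is 2, and the Euclidean distance is 1. Cosine ignores magnitude while the other two do not, which is one reason it is a common fit for text embeddings.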
Indexing
For hybrid search (vector + keyword):
[VectorStoreData(IsFullTextIndexed = true)] // keyword index
public string Description { get; set; } = "";
Feature support varies by provider; not every store offers full-text indexing.
Hybrid search
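// Hybrid search is exposed through IKeywordHybridSearchable<TRecord>,
// which only some providers implement; the cast fails on those that do not.
var hybridCollection = (IKeywordHybridSearchable<HotelInfo>)collection;

await foreach (var result in hybridCollection.HybridSearchAsync(
    queryEmb.Vector,
    ["spa"],
    top: 10))
{
    Console.WriteLine($"{result.Record.HotelName} (score: {result.Score})");
}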
Hybrid search fuses vector and keyword rankings, which often gives better relevance than either alone. Support is provider-dependent; Azure AI Search is strong here.
Schema migration
When changing the model:
- Add new fields → backwards-compatible.
- Change embedding dimension → re-embed everything.
- Change primary key type → recreate the index.
Plan a versioning scheme for collections before the first migration.
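The rules above can be turned into a small decision helper. This is a sketch only: SchemaInfo, MigrationAction, and MigrationPlanner are hypothetical names invented here, and the versioned-name convention is an assumption rather than something the library prescribes. Removed or renamed fields are not covered by the rules above, so the sketch ignores them.

```csharp
using System;
using System.Linq;

// Hypothetical types for illustration; none of these come from the library.
public enum MigrationAction { None, AddFieldsInPlace, ReEmbedAll, RecreateCollection }

// The parts of a record schema that matter for the migration rules.
public sealed record SchemaInfo(Type KeyType, int EmbeddingDimensions, string[] DataFields);

public static class MigrationPlanner
{
    // Returns the most disruptive action needed to move from oldSchema to newSchema.
    public static MigrationAction Plan(SchemaInfo oldSchema, SchemaInfo newSchema)
    {
        // A different key type means the collection has to be recreated.
        if (oldSchema.KeyType != newSchema.KeyType)
            return MigrationAction.RecreateCollection;

        // A different embedding dimension invalidates every stored vector.
        if (oldSchema.EmbeddingDimensions != newSchema.EmbeddingDimensions)
            return MigrationAction.ReEmbedAll;

        // Newly added data fields are backwards-compatible.
        if (newSchema.DataFields.Except(oldSchema.DataFields).Any())
            return MigrationAction.AddFieldsInPlace;

        return MigrationAction.None;
    }

    // Puts the schema version in the collection name so old and new
    // collections can coexist while data is being migrated.
    public static string VersionedName(string baseName, int schemaVersion)
        => $"{baseName}-v{schemaVersion}";
}
```

With a scheme like this, moving from 1536 to 3072 dimensions would yield ReEmbedAll, and the new vectors could be written to a collection named "hotels-v2" while "hotels-v1" keeps serving queries.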
Senior considerations
- Pick the provider that fits your needs: Azure AI Search (hybrid), Cosmos DB (collocated with app data), Qdrant (open-source performance), pgvector (Postgres-native).
- Index design: each filterable or full-text field adds write cost in exchange for faster reads, so index only the fields you query.
- Batch upserts: 100-1000 records per request is typical.
- Store the embedding model version with each record.
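The batching advice can be sketched as a small helper that splits records into fixed-size chunks and hands each chunk to an upsert delegate. BatchUpsert is a hypothetical helper, not a library type; the delegate stands in for whatever batch call your collection exposes, and the right batch size depends on the provider's request limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not a library type.
public static class BatchUpsert
{
    // Splits records into batches of at most batchSize and awaits
    // upsertBatch for each batch in order. Returns the number of batches sent.
    public static async Task<int> RunAsync<T>(
        IEnumerable<T> records,
        int batchSize,
        Func<IReadOnlyList<T>, Task> upsertBatch)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int batches = 0;
        foreach (T[] batch in records.Chunk(batchSize)) // Enumerable.Chunk requires .NET 6 or later
        {
            await upsertBatch(batch);
            batches++;
        }
        return batches;
    }
}
```

Assuming the collection has an UpsertAsync overload that accepts a sequence of records, a call might look like `await BatchUpsert.RunAsync(hotels, 500, batch => collection.UpsertAsync(batch));`. To track the embedding model, one option is an extra string property on the record (for example a hypothetical EmbeddingModel marked with [VectorStoreData]) that is set at upsert time.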