Microsoft.Extensions.VectorData

Key Points

Microsoft.Extensions.VectorData = unified abstraction over vector stores (Azure AI Search, Cosmos DB, Qdrant, Postgres pgvector, Redis, in-memory).
Define a record with [VectorStoreKey], [VectorStoreData], [VectorStoreVector] attributes; the same class works across providers, as long as each provider supports the key and property types used.
Swap providers via DI / config.
Pairs with Microsoft.Extensions.AI (IEmbeddingGenerator) for generating embeddings.

Define a record

public class HotelInfo
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string HotelName { get; set; } = "";

    [VectorStoreData]
    public string Description { get; set; } = "";

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}

[VectorStoreData(IsIndexed = true)] marks a field for filter indexing (keyword search uses IsFullTextIndexed instead). [VectorStoreVector] declares the vector property, its dimension count, and its distance function.
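
The dimension count declared in [VectorStoreVector] has to match the length of the vectors the embedding model actually produces. A minimal guard, assuming the 1536 dimensions from the record above; EnsureDimensions is a hypothetical helper, not part of the library:

```csharp
using System;

// Hypothetical helper: fail fast when an embedding does not match the declared dimension count.
static void EnsureDimensions(ReadOnlyMemory<float> embedding, int expected)
{
    if (embedding.Length != expected)
    {
        throw new ArgumentException(
            $"Expected {expected} dimensions but got {embedding.Length}.", nameof(embedding));
    }
}

const int DeclaredDimensions = 1536; // must equal the value in [VectorStoreVector]

ReadOnlyMemory<float> ok = new float[DeclaredDimensions];
EnsureDimensions(ok, DeclaredDimensions); // passes silently

ReadOnlyMemory<float> wrongSize = new float[768];
try
{
    EnsureDimensions(wrongSize, DeclaredDimensions);
}
catch (ArgumentException ex)
{
    // A mismatch surfaces here instead of as a provider-side error during upsert.
    Console.WriteLine(ex.Message);
}
```

Running a check like this before upsert is one way to catch a model swap that silently changed the vector size.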

Connect to a store

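// Key type support varies by connector: ulong suits Qdrant, while Azure AI Search and
// Cosmos DB NoSQL expect string keys. Verify against each connector's docs before reusing a key type.
// Azure AI Search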
var collection = new AzureAISearchCollection<ulong, HotelInfo>(
    new SearchIndexClient(uri, cred),
    "hotels");

// Cosmos DB NoSQL
var cosmosCol = new CosmosNoSqlCollection<ulong, HotelInfo>(database, "hotels");

// Qdrant
var qdrant = new QdrantCollection<ulong, HotelInfo>(qdrantClient, "hotels");

// In-memory (testing)
var inMem = new InMemoryCollection<ulong, HotelInfo>("hotels");

await collection.EnsureCollectionExistsAsync();

Upsert

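var description = "...";

await collection.UpsertAsync(new HotelInfo
{
    HotelId = 1,
    HotelName = "Test Hotel",
    Description = description,
    // _embed is assumed to be an IEmbeddingGenerator<string, Embedding<float>>;
    // await the call first, then read .Vector from the resulting embedding.
    DescriptionEmbedding = (await _embed.GenerateAsync(description)).Vector
});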

Search

var queryEmb = await _embed.GenerateAsync(query);

// Each result wraps the matched record together with its score.
await foreach (var result in collection.SearchAsync(queryEmb.Vector, top: 10))
{
    Console.WriteLine($"{result.Record.HotelName} (score: {result.Score})");
}

Filtering

await foreach (var r in collection.SearchAsync(queryEmb.Vector, top: 10, new VectorSearchOptions<HotelInfo>
{
    Filter = h => h.HotelName.Contains("Marriott")
}))
{ /* ... */ }

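The filter is a LINQ expression tree that the connector translates into the provider's native filter syntax. Supported operators vary by provider, and filtered properties generally need IsIndexed = true.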
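
To make the translation step concrete, here is a toy sketch of how an expression tree can be turned into a filter string. It is not the connector's real translator: ToFilterString and the Hotel record are hypothetical, only `property == constant` is handled, and the output imitates the OData-style syntax Azure AI Search uses.

```csharp
using System;
using System.Linq.Expressions;

// Toy translator (not the connector's real one): handles only `record.Property == constant`.
static string ToFilterString<T>(Expression<Func<T, bool>> filter)
{
    if (filter.Body is BinaryExpression { NodeType: ExpressionType.Equal } equal
        && equal.Left is MemberExpression member
        && equal.Right is ConstantExpression constant)
    {
        // Quote strings; everything else is written as-is.
        var value = constant.Value is string s ? $"'{s}'" : constant.Value?.ToString();
        return $"{member.Member.Name} eq {value}";
    }

    throw new NotSupportedException($"Unsupported filter: {filter.Body}");
}

// The lambda is captured as data (an expression tree), not compiled to a delegate.
var byName = ToFilterString<Hotel>(h => h.HotelName == "Marriott");
var byStars = ToFilterString<Hotel>(h => h.Stars == 4);

Console.WriteLine(byName);  // HotelName eq 'Marriott'
Console.WriteLine(byStars); // Stars eq 4

// Stand-in for HotelInfo so the sketch is self-contained.
record Hotel(string HotelName, int Stars);
```

Anything the translator does not recognise ends in NotSupportedException here, which mirrors why an operator that works on one provider can fail on another.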

DI

builder.Services.AddAzureAISearchCollection<HotelInfo>(
    "hotels",
    sp => new SearchIndexClient(uri, cred));

public class C(VectorStoreCollection<ulong, HotelInfo> collection) { /* ... */ }

Multiple providers

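Same record class; different providers via DI config. Swapping is a one-line change to the registration, provided the new provider supports the record's key type and the features in use.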

Distance functions

DistanceFunction.CosineSimilarity   // common choice for text embeddings (not a universal default)
DistanceFunction.DotProductSimilarity
DistanceFunction.EuclideanDistance
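
What the three functions measure can be sketched in plain C#. The helpers below are illustrative re-implementations with made-up two-dimensional vectors, not the code a provider runs:

```csharp
using System;

// Cosine similarity: angle between vectors, ignores magnitude. Higher = more similar.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Dot product: like cosine, but vector length also contributes. Higher = more similar.
static double DotProduct(float[] a, float[] b)
{
    double dot = 0;
    for (int i = 0; i < a.Length; i++) dot += a[i] * b[i];
    return dot;
}

// Euclidean distance: straight-line distance. Lower = more similar.
static double EuclideanDistance(float[] a, float[] b)
{
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return Math.Sqrt(sum);
}

float[] query = { 1f, 0f };
float[] sameDirection = { 3f, 0f }; // same direction, three times as long
float[] orthogonal = { 0f, 2f };    // unrelated direction

Console.WriteLine(CosineSimilarity(query, sameDirection));  // 1
Console.WriteLine(DotProduct(query, sameDirection));        // 3
Console.WriteLine(EuclideanDistance(query, sameDirection)); // 2
Console.WriteLine(CosineSimilarity(query, orthogonal));     // 0
```

Note the direction of the score: similarity functions rank higher values first, distance functions rank lower values first, so scores are not comparable across functions.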

Indexing

For hybrid search (vector + keyword):

[VectorStoreData(IsFullTextIndexed = true)]   // keyword index
public string Description { get; set; } = "";

Feature support varies by provider.

Hybrid search

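// Hybrid search lives on IKeywordHybridSearchable<TRecord>; not every collection implements it.
var hybridSearch = (IKeywordHybridSearchable<HotelInfo>)collection;

await foreach (var r in hybridSearch.HybridSearchAsync(queryEmb.Vector, ["spa"], top: 10))
{ /* ... */ }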

Fuses vector and keyword results into a single ranking, which often gives better relevance than either alone. Provider-dependent (Azure AI Search is strong here).
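
One common way to fuse two rankings is Reciprocal Rank Fusion (RRF), which Azure AI Search documents for its hybrid queries; other providers may fuse differently. A minimal sketch, assuming the conventional constant k = 60 and made-up hotel ids; FuseByReciprocalRank is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// RRF: every list contributes 1 / (k + rank) for each item it contains (rank is 1-based).
static List<(string Id, double Score)> FuseByReciprocalRank(string[][] rankedLists, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
    {
        for (int i = 0; i < list.Length; i++)
        {
            scores[list[i]] = scores.GetValueOrDefault(list[i]) + 1.0 / (k + i + 1);
        }
    }

    return scores
        .OrderByDescending(p => p.Value)
        .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic order for ties
        .Select(p => (Id: p.Key, Score: p.Value))
        .ToList();
}

string[] vectorHits = { "h1", "h2", "h3" };  // best vector match first
string[] keywordHits = { "h3", "h1", "h4" }; // best keyword match first

var fused = FuseByReciprocalRank(new[] { vectorHits, keywordHits });

foreach (var (id, score) in fused)
{
    Console.WriteLine($"{id}: {score:F4}");
}
// Order: h1, h3, h2, h4 (items found by both searches rise to the top)
```

RRF only needs ranks, not raw scores, which is why it can combine a cosine similarity ranking with a keyword relevance ranking without normalising either.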

Schema migration

When changing the model:
- Add new fields → backwards-compatible.
- Change embedding dim → re-embed everything.
- Change PK type → recreate index.

Plan for versioning: decide up front how a changed schema or embedding model will be rolled out.

Senior considerations

Pick provider for your needs: Azure AI Search (hybrid), Cosmos (collocated with app data), Qdrant (open-source perf), pgvector (Postgres-native).
Index design: every filterable or full-text field adds write and storage cost, so index only the fields that queries actually use.
Batch upsert: 100-1000 records per request is typical.
Track embedding model version with records.
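
The last two points combine naturally when re-embedding after a model change. A sketch under stated assumptions: the model names, the batch size of 500, and the (Id, Model) tuples are hypothetical stand-ins for real records, and the loop body is where re-embedding and upserting would happen:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const string CurrentModel = "embedding-model-v2"; // hypothetical model identifier
const int BatchSize = 500;                        // assumption: within the 100-1000 range above

// Hypothetical stored records: (id, embedding model used when the vector was generated).
var stored = Enumerable.Range(1, 1200)
    .Select(id => (Id: id, Model: id <= 300 ? CurrentModel : "embedding-model-v1"))
    .ToList();

// Vectors from different models are not comparable, so older ones need re-embedding.
var stale = stored.Where(r => r.Model != CurrentModel).ToList();

// Split the work into fixed-size batches; each batch would be re-embedded and upserted together.
var batches = stale.Chunk(BatchSize).ToList();

Console.WriteLine($"{stale.Count} stale records in {batches.Count} batches");
foreach (var batch in batches)
{
    Console.WriteLine($"batch: ids {batch[0].Id}..{batch[^1].Id} ({batch.Length} records)");
}
```

Storing the model identifier on each record is what makes the stale set queryable; without it, the only safe migration is to re-embed everything.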