IEmbeddingGenerator
Key Points
- IEmbeddingGenerator<TInput, TEmbedding> generates embeddings (vectors) from inputs (typically text).
- Vendor-neutral: same interface for OpenAI, Azure OpenAI, Cohere, Ollama.
- Pipeline-able: caching, telemetry, batching middleware.
- Used in RAG: query → embedding → vector search → top-K chunks.
- Batch generation: pass multiple inputs; provider batches efficiently.
Setup
using Microsoft.Extensions.AI;
using OpenAI;
IEmbeddingGenerator<string, Embedding<float>> embed =
    new OpenAIClient(apiKey).AsEmbeddingGenerator("text-embedding-3-small");
Basic usage
var emb = await embed.GenerateAsync("Hello world");
ReadOnlyMemory<float> vector = emb.Vector;
// vector.Length == 1536 for text-embedding-3-small
Batch
var inputs = new[] { "doc 1", "doc 2", "doc 3" };
var embeddings = await embed.GenerateAsync(inputs);
// embeddings is GeneratedEmbeddings<Embedding<float>>; iterable
foreach (var e in embeddings) Console.WriteLine(e.Vector.Length);
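All inputs go to the provider in one call, which is cheaper than one request per input. Very large input sets may still need to be split to stay within provider request limits.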
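Splitting a large input set into fixed-size batches could look roughly like the sketch below. `EmbeddingBatcher`, `EmbedInBatchesAsync`, and the `embedBatch` delegate are hypothetical names introduced for illustration; the delegate stands in for a call such as `embed.GenerateAsync(batch)`, and a suitable batch size depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EmbeddingBatcher
{
    // embedBatch is a stand-in for the real generator call; it is assumed to
    // return one vector per input, in input order.
    public static async Task<List<float[]>> EmbedInBatchesAsync(
        IReadOnlyList<string> inputs,
        int batchSize,
        Func<IReadOnlyList<string>, Task<IReadOnlyList<float[]>>> embedBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<float[]>(inputs.Count);

        // Enumerable.Chunk (.NET 6+) yields arrays of at most batchSize items.
        foreach (string[] batch in inputs.Chunk(batchSize))
        {
            IReadOnlyList<float[]> vectors = await embedBatch(batch);
            if (vectors.Count != batch.Length)
                throw new InvalidOperationException("Expected one vector per input.");
            results.AddRange(vectors);
        }

        return results;
    }
}
```

Retry on throttling is left out here; it would wrap the `embedBatch` call.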
Pipeline
embed = embed.AsBuilder()
    .UseLogging(loggerFactory)
    .UseOpenTelemetry()
    .UseDistributedCache(cache)
    .Build();
Caching is especially valuable: for a fixed model, the same input yields effectively the same embedding, so cached vectors are safe to reuse.
DI
builder.Services.AddSingleton<IEmbeddingGenerator<string, Embedding<float>>>(sp =>
    new OpenAIClient(apiKey).AsEmbeddingGenerator("text-embedding-3-small")
        .AsBuilder()
        .UseDistributedCache(sp.GetRequiredService<IDistributedCache>())
        .Build());
Dimension reduction
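The text-embedding-3 models accept a dimensions parameter that shortens the returned vector; in Microsoft.Extensions.AI it is set through EmbeddingGenerationOptions.Dimensions.
Smaller vector → cheaper storage + faster search, at the cost of a minor quality loss.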
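The effect of a shorter vector can be pictured as keeping the leading components and rescaling to unit length. The sketch below assumes a model whose leading components carry most of the signal (as the text-embedding-3 family is described as doing); `EmbeddingTruncation` and `TruncateAndNormalize` are hypothetical names, not part of any library.

```csharp
using System;

public static class EmbeddingTruncation
{
    // Keeps the first `dimensions` components and rescales the result to unit length,
    // so cosine and dot-product comparisons stay meaningful.
    public static float[] TruncateAndNormalize(ReadOnlySpan<float> vector, int dimensions)
    {
        if (dimensions <= 0 || dimensions > vector.Length)
            throw new ArgumentOutOfRangeException(nameof(dimensions));

        float[] result = vector.Slice(0, dimensions).ToArray();

        double sumOfSquares = 0;
        foreach (float x in result) sumOfSquares += (double)x * x;

        double norm = Math.Sqrt(sumOfSquares);
        if (norm == 0) return result; // all-zero prefix: nothing to rescale

        for (int i = 0; i < result.Length; i++)
            result[i] = (float)(result[i] / norm);

        return result;
    }
}
```

For example, `EmbeddingTruncation.TruncateAndNormalize(new float[] { 3f, 4f, 12f }, 2)` returns approximately `[0.6, 0.8]`. Requesting the smaller size from the provider is usually preferable, since it also reduces the response size.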
Use in RAG
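// _embed, _vectorStore, ChunkText, Document, Chunk are application-defined.
public async Task IndexAsync(Document doc, CancellationToken ct)
{
    // Chunk the document, then embed all chunks in one batched call.
    var chunks = ChunkText(doc.Content, maxTokens: 500);
    // The second positional parameter of GenerateAsync is the options object,
    // so the token must be passed by name.
    var embeddings = await _embed.GenerateAsync(
        chunks.Select(c => c.Text).ToArray(), cancellationToken: ct);

    // Embeddings come back in input order, so Zip pairs each chunk with its vector.
    var records = chunks.Zip(embeddings, (c, e) => new VectorRecord
    {
        Id = $"{doc.Id}:{c.Index}",
        Text = c.Text,
        Embedding = e.Vector
    });
    await _vectorStore.UpsertBatchAsync(records, ct);
}

public async Task<List<Chunk>> SearchAsync(string query, CancellationToken ct)
{
    // Embed the query with the same model used at indexing time.
    var qEmb = await _embed.GenerateAsync(query, cancellationToken: ct);
    return await _vectorStore.SearchAsync(qEmb.Vector, k: 50, ct);
}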
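What the vector search step does conceptually can be sketched as a brute-force cosine-similarity ranking. This is an illustration only: real vector stores typically use approximate indexes, and `BruteForceSearch`, `CosineSimilarity`, and `TopK` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BruteForceSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every record against the query and keeps the k highest-scoring ones.
    public static List<(string Id, double Score)> TopK(
        float[] query, IEnumerable<(string Id, float[] Vector)> records, int k)
    {
        return records
            .Select(r => (r.Id, Score: CosineSimilarity(query, r.Vector)))
            .OrderByDescending(x => x.Score)
            .Take(k)
            .ToList();
    }
}
```

The dimension check is also why query and stored vectors must come from the same model and the same dimension setting.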
Cost
Embedding calls are cheap compared with chat completions. Caching embeddings for unchanged documents saves more.
Common providers
// OpenAI
new OpenAIClient(key).AsEmbeddingGenerator("text-embedding-3-small")
// Azure OpenAI
new AzureOpenAIClient(uri, cred).AsEmbeddingGenerator(deploymentName)
// Ollama
new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"), "nomic-embed-text")
// Cohere (community adapter)
Caching
embed = embed.AsBuilder().UseDistributedCache(cache).Build();
// Subsequent identical input → cache hit; no API call
When re-ingesting unchanged documents, cache hits avoid paying for the same embeddings again.
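A cache is only safe if the key changes whenever the resulting vector would. The sketch below shows one hypothetical way to derive such a key from the model name and the input text; it is not how the built-in caching middleware computes its keys, and `EmbeddingCacheKey` is an illustrative name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EmbeddingCacheKey
{
    // Hashes model name and input together so that switching models
    // never returns a vector produced by a different model.
    public static string Create(string modelName, string input)
    {
        // The NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same bytes.
        byte[] bytes = Encoding.UTF8.GetBytes(modelName + "\0" + input);
        return Convert.ToHexString(SHA256.HashData(bytes));
    }
}
```

If a dimensions setting is in use, it would belong in the key as well.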
Versioning
Embeddings from different model versions are NOT interchangeable. Track model name with vectors:
public class VectorRecord
{
    public string Id { get; set; }   // set by IndexAsync as "{doc.Id}:{chunk index}"
    public string Text { get; set; }
    public ReadOnlyMemory<float> Embedding { get; set; }
    public string EmbeddingModel { get; set; } = "text-embedding-3-small-v1";
}
When upgrading the model, re-embed everything (or keep querying with the old model).
Search
Once embedded, store the vectors in a vector database. See Microsoft.Extensions.VectorData.
Quality vs cost
| Model | Quality | Cost (relative to text-embedding-3-small) |
|---|---|---|
| text-embedding-3-large | best | about 6x |
| text-embedding-3-small | good | baseline |
| voyage-3 | very good | comparable |
| Cohere embed-v3 | good multilingual | comparable |
| Open (BGE, Nomic) | OK | no API fee (self-hosted) |
For most RAG workloads, text-embedding-3-small is a reasonable default.
Senior considerations
- Cache embeddings for unchanged input.
- Track model version with stored vectors.
- Batch ingestion: providers enforce rate limits, so send inputs in batches and retry on throttling.
- Quantization: int8 storage with minimal quality loss; saves DB cost at scale.
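The quantization point can be made concrete with a minimal sketch of symmetric scalar quantization, which stores each component in one byte instead of four. This is one simple scheme among several, and vector databases that support int8 storage may use a different one; `Int8Quantizer` is a hypothetical name.

```csharp
using System;

public static class Int8Quantizer
{
    // Maps the range [-maxAbs, +maxAbs] linearly onto [-127, 127].
    // The returned scale is needed to reconstruct approximate floats later.
    public static (sbyte[] Values, float Scale) Quantize(ReadOnlySpan<float> vector)
    {
        float maxAbs = 0f;
        foreach (float x in vector) maxAbs = Math.Max(maxAbs, Math.Abs(x));

        var values = new sbyte[vector.Length];
        if (maxAbs == 0f) return (values, 1f); // all zeros: any scale works

        float scale = maxAbs / 127f;
        for (int i = 0; i < vector.Length; i++)
            values[i] = (sbyte)Math.Round(vector[i] / scale);

        return (values, scale);
    }

    // Reconstructs approximate floats; the error per component is about scale / 2 at most.
    public static float[] Dequantize(ReadOnlySpan<sbyte> values, float scale)
    {
        var result = new float[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = values[i] * scale;
        return result;
    }
}
```

With this scheme a 1536-dimension vector takes 1536 bytes plus one float for the scale, instead of 6144 bytes.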