Specialty Models

Key Points

Beyond chat: embeddings, rerankers, vision, audio, image generation, video generation. Each has dominant providers and trade-offs.


Embeddings

Convert text → vector. Used for similarity search (RAG).

Model | Dim | Notes
OpenAI text-embedding-3-small | 1536 | Cheap; default
OpenAI text-embedding-3-large | 3072 | Higher quality
Voyage AI voyage-3 | 1024 | Specialty variants for code and law
Cohere embed-v3 | 1024 | Multilingual
Google text-embedding-004 | 768 | Decent
BGE / Nomic / open | varies | Open weights

Senior choice: text-embedding-3-small for general use, voyage-code-3 for code RAG.

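// Microsoft.Extensions.AI.OpenAI: recent releases adapt an EmbeddingClient;
// older previews exposed AsEmbeddingGenerator(modelId) directly on OpenAIClient.
IEmbeddingGenerator<string, Embedding<float>> gen =
    new OpenAIClient(key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

var embeddings = await gen.GenerateAsync(["text 1", "text 2"]);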
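
Once the vectors exist, similarity search is a matter of scoring the query vector against each stored vector and keeping the best K. The sketch below is a brute-force version using cosine similarity. The VectorSearch class and its method names are made up for illustration and are not part of Microsoft.Extensions.AI; a real system would normally hand this step to a vector database or index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VectorSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force top-K: score every stored vector against the query, highest first.
    public static IReadOnlyList<(string Id, double Score)> TopK(
        float[] query, IEnumerable<(string Id, float[] Vector)> corpus, int k)
    {
        return corpus
            .Select(doc => (doc.Id, Score: CosineSimilarity(query, doc.Vector)))
            .OrderByDescending(x => x.Score)
            .Take(k)
            .ToList();
    }
}
```

Brute force costs one pass over the whole corpus per query, which is fine for a few thousand chunks and a reasonable baseline to compare an index against.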

Dimension reduction

The text-embedding-3 models accept a dimensions parameter, so you can request shorter embeddings (e.g., 256 or 512). This trades some quality for lower storage cost and faster search.
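
To make the trade-off concrete, here is a minimal sketch of the client-side equivalent: keep the first N components of a stored vector and rescale it to unit length. As far as I know this matches how OpenAI's embeddings guide describes shortening text-embedding-3 vectors after the fact, but treat that as an assumption and verify retrieval quality on your own data. EmbeddingShortening and its members are illustrative names, not a library API.

```csharp
using System;

public static class EmbeddingShortening
{
    // Keep the first `dimensions` components, then rescale to unit length
    // so cosine and dot-product scores stay comparable.
    public static float[] TruncateAndNormalize(float[] embedding, int dimensions)
    {
        if (dimensions <= 0 || dimensions > embedding.Length)
            throw new ArgumentOutOfRangeException(nameof(dimensions));

        var shortened = new float[dimensions];
        Array.Copy(embedding, shortened, dimensions);

        double sumSquares = 0;
        foreach (var x in shortened) sumSquares += (double)x * x;

        var norm = Math.Sqrt(sumSquares);
        if (norm == 0) return shortened; // all-zero prefix: nothing to rescale

        for (int i = 0; i < shortened.Length; i++)
            shortened[i] = (float)(shortened[i] / norm);
        return shortened;
    }

    // Raw float32 storage for a set of vectors, ignoring index overhead.
    public static long StorageBytes(long vectorCount, int dimensions) =>
        vectorCount * dimensions * sizeof(float);
}
```

For one million chunks, StorageBytes gives 6,144,000,000 bytes at 1536 dimensions versus 1,024,000,000 bytes at 256, which is the storage side of the trade.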

Rerankers

Refine top-K retrieval results before passing to LLM.

Provider | Notes
Cohere Rerank | Best quality
Jina Reranker | Strong
BGE-Reranker (open) | Self-host
Cross-encoder models (sentence-transformers) | Self-host

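// Cohere reranker
// Sketch: `cohere`, RerankRequest, and Rerank are assumed to be your own wrapper
// around Cohere's rerank endpoint, not an official .NET SDK.
var reranked = await cohere.Rerank(new RerankRequest
{
    Query = userQuery,
    Documents = candidates.Select(c => c.Text).ToArray(),
    TopN = 5
});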

Reranking usually improves RAG answer and citation quality, because the LLM only sees the passages that scored as most relevant to the query.
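
The retrieve-then-rerank step can be sketched independently of any provider. In the example below, scoreAsync is a hypothetical hook that returns a relevance score for a (query, document) pair; in practice it would call a hosted reranker or a self-hosted cross-encoder. RerankPipeline is an illustrative name, not a library type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class RerankPipeline
{
    // Scores each candidate against the query, then keeps the topN highest-scoring texts.
    // scoreAsync is a placeholder for whichever reranker you plug in.
    public static async Task<IReadOnlyList<string>> RetrieveThenRerankAsync(
        string query,
        IReadOnlyList<string> candidates,
        Func<string, string, Task<double>> scoreAsync,
        int topN)
    {
        var scored = new List<(string Text, double Score)>(candidates.Count);
        foreach (var text in candidates)
            scored.Add((text, await scoreAsync(query, text)));

        return scored
            .OrderByDescending(x => x.Score) // stable sort: ties keep retrieval order
            .Take(topN)
            .Select(x => x.Text)
            .ToList();
    }
}
```

A common pattern is to retrieve a generous candidate set with vector search (say, 50 chunks) and rerank down to the handful that go into the prompt. Hosted rerankers typically score a whole batch in one request, so a production version would batch rather than call once per document.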

Vision

Model | Strength
GPT-4o vision | Best general
Claude 4 vision | OCR/document
Gemini 2.5 Pro | Multimodal native
Phi-3 vision | Edge

For document parsing: Claude. For real-time: GPT-4o-mini or Gemini Flash.

Audio (speech-to-text)

Model | Notes
OpenAI Whisper (whisper-large-v3) | Open weights; great quality
OpenAI gpt-4o-audio | Real-time
Azure Speech | Enterprise; SDKs
AssemblyAI | Specialized; speaker diarization
Deepgram | Streaming-first

Whisper can run locally and offline via faster-whisper or Whisper.net.

TTS (text-to-speech)

Model | Notes
OpenAI TTS | Multi-voice; quality
ElevenLabs | Best voice cloning
Azure Neural TTS | Enterprise
Google WaveNet | Multi-language

Image generation

Model | Notes
GPT-Image (successor to DALL-E 3) | High quality; integrated
Stable Diffusion (SDXL, SD3) | Open weights; flexible
Imagen (Google) | High quality
Midjourney | API access limited
FLUX | Recent open-weights model; strong quality

Self-hosting: Stable Diffusion or FLUX via Diffusers / ComfyUI.

Video generation

  • Sora (OpenAI; limited).
  • Runway Gen-3; Veo (Google).
  • Open: Hunyuan, Mochi.

Still emerging. Costly. Not production-stable for most apps.

Code-specialized LLMs

  • Claude 4 Sonnet/Opus for code (current top).
  • GPT-5 / o3-mini for code.
  • Codestral (Mistral; code-specialized).
  • DeepSeek Coder.

Math / reasoning

  • o3 (OpenAI).
  • DeepSeek R1 (open).
  • Claude with extended thinking.

Domain-specialized

  • BioGPT, Med-PaLM (medical).
  • BloombergGPT (finance).
  • Falcon Arabic (Arabic).

Use these only if frontier models underperform for your domain. Often they don't.

.NET integration

Microsoft.Extensions.AI provides:

  • IEmbeddingGenerator<string, Embedding<float>>
  • IImageGenerator
  • IChatClient (chat clients, of course)

For audio: vendor SDKs (Azure Speech, OpenAI audio client).

Senior considerations

  • Embedding quality matters: text-embedding-3-small is a good default; upgrade to text-embedding-3-large or voyage-3 if RAG quality is lacking.
  • Rerankers almost always help RAG quality.
  • Vision tasks: try Claude for OCR, Gemini for video, GPT-4o for general use.
  • Audio: use streaming providers (Deepgram) for real-time and Whisper for batch.

Cross-references