OpenAI Models

Key Points

GPT-5 family (2026): flagship reasoning + multimodal; most capable.
GPT-4o: balanced flagship; multimodal (text + image + audio); 128K context.
GPT-4o mini: 90% of utility at 1/10 cost. Default for cost-sensitive workloads.
o-series (o1, o3): explicit reasoning; slow but excellent for math/code/logic.
Embeddings: text-embedding-3-small (cheap), text-embedding-3-large (best).
Whisper / TTS: audio.
Direct API or Azure OpenAI (Azure-hosted; SOC2/HIPAA, regional).

Model lineup (2026)

Model	Strength	Pricing (~$/M input)
GPT-5	Top capability	$15-30
GPT-4o	Balanced flagship	$2.50
GPT-4o mini	Cheap workhorse	$0.15
o3	Strongest reasoning	$20+
o3-mini	Cheap reasoning	$1
text-embedding-3-large	Best embedding	$0.13
text-embedding-3-small	Cheap embedding	$0.02

Output tokens ~3-5x input cost. Prompt caching ~10x discount on repeated prefixes.

Strengths

Code: top-tier across benchmarks.
Function calling: most mature; strict JSON Schema; parallel.
Multimodal: text + image + audio (GPT-4o native).
Tool ecosystem: Custom GPTs, Assistants API.

Weaknesses

Cost at frontier tier.
Rate limits — quota pain in early days for new accounts.
Hallucinations still a thing.

.NET integration

// Direct OpenAI
new OpenAIClient(apiKey).AsChatClient("gpt-4o-mini")
    .AsBuilder().UseFunctionInvocation().Build();

// Azure OpenAI (managed identity)
new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential())
    .AsChatClient(deploymentName)
    .AsBuilder().UseFunctionInvocation().Build();

Both expose IChatClient via Microsoft.Extensions.AI.OpenAI.

Reasoning models

new ChatOptions
{
    AdditionalProperties = new() { ["reasoning_effort"] = "high" }   // o3-style
};

Slower; more output tokens; better at hard tasks.

Function calling

OpenAI strict mode: schema enforced, no malformed responses.

chat.GetResponseAsync(messages, new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
});

Auto-generated schema; auto-dispatched via UseFunctionInvocation().

Vision

var msg = new ChatMessage(ChatRole.User, [
    new TextContent("Describe this image:"),
    new DataContent(imageBytes, "image/jpeg")
]);

GPT-4o handles. Charges per "high"/"low" detail.

Whisper (audio transcribe)

new OpenAIClient(apiKey).GetAudioClient("whisper-1");
var transcript = await audioClient.TranscribeAudioAsync(audioStream, fileName);

TTS (text-to-speech)

new OpenAIClient(apiKey).GetAudioClient("tts-1");
var audio = await audioClient.GenerateSpeechFromTextAsync("hello", GeneratedSpeechVoice.Alloy);

Image generation (DALL-E / GPT-Image)

new OpenAIClient(apiKey).GetImageClient("dall-e-3");
var img = await imageClient.GenerateImageAsync("a cat in space");

Azure OpenAI specifics

Provisioned Throughput Units (PTUs): reserved capacity; consistent latency. For high-RPS production.
Standard tier: pay-per-token; rate-limited.
Regional availability: not all models in all regions.
Azure AD auth + Managed Identity: zero-secret access.
Content filters: built-in moderation; tunable.
Dedicated capacity for compliance / SLA.

Senior considerations

When OpenAI direct vs Azure OpenAI?

Need	Pick
Newest models day-1	OpenAI direct
SOC2 / HIPAA / data residency	Azure OpenAI
Cost flexibility	OpenAI direct (pay-per-use)
Reserved capacity	Azure OpenAI PTUs
Existing Azure stack	Azure OpenAI

Quotas

OpenAI direct: account-level rate limits (TPM, RPM). Azure: per-deployment.

Streaming

await foreach (var update in chat.GetStreamingResponseAsync(prompt))
    yield return update.Text ?? "";

Both support natively.

Function calling parallel

new ChatOptions
{
    Tools = [...],
    ToolMode = ChatToolMode.Auto,
    AllowMultipleToolCalls = true   // OpenAI parallel-tool-calls
};

OpenAI supports calling multiple tools per turn.

Cost optimization

gpt-4o-mini default.
Promote to gpt-4o or GPT-5 only when measurably needed.
Prompt caching: structure prompts with stable prefix.
max_tokens cap.