Skip to content

OpenAI Models

Key Points

  • GPT-5 family (2026): flagship reasoning + multimodal; most capable.
  • GPT-4o: balanced flagship; multimodal (text + image + audio); 128K context.
  • GPT-4o mini: 90% of utility at 1/10 cost. Default for cost-sensitive workloads.
  • o-series (o1, o3): explicit reasoning; slow but excellent for math/code/logic.
  • Embeddings: text-embedding-3-small (cheap), text-embedding-3-large (best).
  • Whisper / TTS: audio.
  • Direct API or Azure OpenAI (Azure-hosted; SOC2/HIPAA, regional).

Model lineup (2026)

Model Strength Pricing (~$/M input)
GPT-5 Top capability $15-30
GPT-4o Balanced flagship $2.50
GPT-4o mini Cheap workhorse $0.15
o3 Strongest reasoning $20+
o3-mini Cheap reasoning $1
text-embedding-3-large Best embedding $0.13
text-embedding-3-small Cheap embedding $0.02

Output tokens ~3-5x input cost. Prompt caching ~10x discount on repeated prefixes.

Strengths

  • Code: top-tier across benchmarks.
  • Function calling: most mature; strict JSON Schema; parallel.
  • Multimodal: text + image + audio (GPT-4o native).
  • Tool ecosystem: Custom GPTs, Assistants API.

Weaknesses

  • Cost at frontier tier.
  • Rate limits — quota pain in early days for new accounts.
  • Hallucinations still a thing.

.NET integration

// Direct OpenAI
new OpenAIClient(apiKey).AsChatClient("gpt-4o-mini")
    .AsBuilder().UseFunctionInvocation().Build();

// Azure OpenAI (managed identity)
new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential())
    .AsChatClient(deploymentName)
    .AsBuilder().UseFunctionInvocation().Build();

Both expose IChatClient via Microsoft.Extensions.AI.OpenAI.

Reasoning models

new ChatOptions
{
    AdditionalProperties = new() { ["reasoning_effort"] = "high" }   // o3-style
};

Slower; more output tokens; better at hard tasks.

Function calling

OpenAI strict mode: schema enforced, no malformed responses.

chat.GetResponseAsync(messages, new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
});

Auto-generated schema; auto-dispatched via UseFunctionInvocation().

Vision

var msg = new ChatMessage(ChatRole.User, [
    new TextContent("Describe this image:"),
    new DataContent(imageBytes, "image/jpeg")
]);

GPT-4o handles. Charges per "high"/"low" detail.

Whisper (audio transcribe)

new OpenAIClient(apiKey).GetAudioClient("whisper-1");
var transcript = await audioClient.TranscribeAudioAsync(audioStream, fileName);

TTS (text-to-speech)

new OpenAIClient(apiKey).GetAudioClient("tts-1");
var audio = await audioClient.GenerateSpeechFromTextAsync("hello", GeneratedSpeechVoice.Alloy);

Image generation (DALL-E / GPT-Image)

new OpenAIClient(apiKey).GetImageClient("dall-e-3");
var img = await imageClient.GenerateImageAsync("a cat in space");

Azure OpenAI specifics

  • Provisioned Throughput Units (PTUs): reserved capacity; consistent latency. For high-RPS production.
  • Standard tier: pay-per-token; rate-limited.
  • Regional availability: not all models in all regions.
  • Azure AD auth + Managed Identity: zero-secret access.
  • Content filters: built-in moderation; tunable.
  • Dedicated capacity for compliance / SLA.

Senior considerations

When OpenAI direct vs Azure OpenAI?

Need Pick
Newest models day-1 OpenAI direct
SOC2 / HIPAA / data residency Azure OpenAI
Cost flexibility OpenAI direct (pay-per-use)
Reserved capacity Azure OpenAI PTUs
Existing Azure stack Azure OpenAI

Quotas

OpenAI direct: account-level rate limits (TPM, RPM). Azure: per-deployment.

Streaming

await foreach (var update in chat.GetStreamingResponseAsync(prompt))
    yield return update.Text ?? "";

Both support natively.

Function calling parallel

new ChatOptions
{
    Tools = [...],
    ToolMode = ChatToolMode.Auto,
    AllowMultipleToolCalls = true   // OpenAI parallel-tool-calls
};

OpenAI supports calling multiple tools per turn.

Cost optimization

  • gpt-4o-mini default.
  • Promote to gpt-4o or GPT-5 only when measurably needed.
  • Prompt caching: structure prompts with stable prefix.
  • max_tokens cap.

Cross-references