
OpenAI-compatible Endpoints

Key Points

  • Foundry exposes OpenAI-compatible endpoints for many models. Existing OpenAI SDK code works with one config change.
  • Same applies to: Together, Groq, Fireworks, Mistral, Ollama, vLLM, Anthropic (via some gateways).
  • Massive portability: write code once; route to any provider via config.
  • Limitations: vendor-specific features (extended thinking, computer use, etc.) not exposed via this layer.

What is "OpenAI-compatible"

The OpenAI HTTP API (chat completions, embeddings, images) became a de-facto standard. Many providers implement the same wire format.

POST /v1/chat/completions
{ "model": "gpt-4o", "messages": [...] }

If your client supports OpenAI's API, point it at any compatible endpoint.
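To make the shared wire format concrete, here is a minimal sketch that builds the same chat-completions request for two base URLs using only the built-in HttpClient types. The helper name BuildChatRequest, the placeholder keys, and the placeholder Groq model ID are illustrative assumptions, not part of any SDK.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Builds a chat-completions request for any OpenAI-compatible base URL.
// baseUrl is expected to end in the versioned path (for example ".../v1").
static HttpRequestMessage BuildChatRequest(string baseUrl, string apiKey, string model, string userText)
{
    var payload = new
    {
        model,
        messages = new[] { new { role = "user", content = userText } }
    };

    var request = new HttpRequestMessage(HttpMethod.Post, $"{baseUrl.TrimEnd('/')}/chat/completions")
    {
        Content = new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
    };
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
    return request;
}

// Same builder, two providers: only the base URL, key, and model ID change.
var togetherRequest = BuildChatRequest(
    "https://api.together.xyz/v1", "together-key-placeholder",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo", "Hello");
var groqRequest = BuildChatRequest(
    "https://api.groq.com/openai/v1", "groq-key-placeholder",
    "example-model-id", "Hello");

Console.WriteLine(togetherRequest.RequestUri);
Console.WriteLine(groqRequest.RequestUri);
```

The request is only constructed here, not sent; passing it to HttpClient.SendAsync would perform the call.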

.NET pattern

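// Packages assumed: OpenAI, Azure.AI.OpenAI, Azure.Identity
using System.ClientModel;   // ApiKeyCredential
using Azure.AI.OpenAI;      // AzureOpenAIClient
using Azure.Identity;       // DefaultAzureCredential
using OpenAI;               // OpenAIClient, OpenAIClientOptions

// OpenAI direct
var openAi = new OpenAIClient(
    new ApiKeyCredential(apiKey));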

// Azure OpenAI (similar)
var azure = new AzureOpenAIClient(
    new Uri(endpoint),
    new DefaultAzureCredential());

// Foundry OpenAI-compat (uses same OpenAIClient)
var foundry = new OpenAIClient(
    new ApiKeyCredential(apiKey),
    new OpenAIClientOptions { Endpoint = new Uri("https://my-foundry.../v1") });

// Together
var together = new OpenAIClient(
    new ApiKeyCredential(togetherKey),
    new OpenAIClientOptions { Endpoint = new Uri("https://api.together.xyz/v1") });

// Groq
var groq = new OpenAIClient(
    new ApiKeyCredential(groqKey),
    new OpenAIClientOptions { Endpoint = new Uri("https://api.groq.com/openai/v1") });

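Apart from Azure OpenAI, which uses the derived AzureOpenAIClient, every provider goes through the same OpenAIClient type. Only the endpoint and key differ; the chat API you call stays the same.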

As IChatClient

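// AsChatClient(model) is the preview-era extension from Microsoft.Extensions.AI.OpenAI;
// newer releases use together.GetChatClient(model).AsIChatClient() instead.
IChatClient chat = together.AsChatClient("meta-llama/Llama-3.3-70B-Instruct-Turbo");

// Identical pipeline, whichever provider sits behind the client
chat = chat.AsBuilder().UseFunctionInvocation().Build();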

Use cases

  • Provider switching: try Llama via Together or Mistral via the Mistral API, all with the same code.
  • Cost optimization: route cheap queries to Together, hard ones to OpenAI.
  • Multi-region: route by user location.
  • Fallback: primary down → switch URL.
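The fallback case can be sketched without any SDK dependency. The example below is a minimal illustration, assuming each provider is wrapped in a delegate that sends a prompt and returns the reply text; the helper name CompleteWithFallbackAsync and the two simulated providers are hypothetical. In a real pipeline each delegate would call an IChatClient configured for one endpoint.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Tries each provider in order and returns the first successful reply.
// Each entry pairs a label with a delegate that sends the prompt to one endpoint.
static async Task<(string Provider, string Reply)> CompleteWithFallbackAsync(
    IReadOnlyList<(string Name, Func<string, Task<string>> Send)> providers,
    string prompt)
{
    var failures = new List<Exception>();
    foreach (var (name, send) in providers)
    {
        try
        {
            return (name, await send(prompt));
        }
        catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException)
        {
            // Transport failure or timeout: remember it and move to the next provider.
            failures.Add(ex);
        }
    }
    throw new AggregateException("All providers failed.", failures);
}

// Simulated providers: the primary is "down", the secondary echoes the prompt.
var providers = new List<(string Name, Func<string, Task<string>> Send)>
{
    ("primary", _ => Task.FromException<string>(new HttpRequestException("simulated outage"))),
    ("secondary", p => Task.FromResult($"echo: {p}")),
};

var result = await CompleteWithFallbackAsync(providers, "ping");
Console.WriteLine($"{result.Provider}: {result.Reply}");
```

Only transport errors and timeouts trigger the switch in this sketch; whether to also fall back on HTTP 5xx or 429 responses is a design choice that depends on the providers involved.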

DI / config

// appsettings.json
"AI": {
  "Endpoint": "https://api.together.xyz/v1",
  "ApiKey": "@KeyVault(...)",
  "Model": "meta-llama/Llama-3.3-70B-Instruct-Turbo"
}

// Program.cs
var config = builder.Configuration;
var endpoint = new Uri(config["AI:Endpoint"]!);
var key = new ApiKeyCredential(config["AI:ApiKey"]!);

// One registration; the provider is decided entirely by the AI:* settings.
builder.Services.AddSingleton<IChatClient>(sp =>
    new OpenAIClient(key, new() { Endpoint = endpoint })
        .AsChatClient(config["AI:Model"]!)
        .AsBuilder().UseFunctionInvocation().Build());

Switch provider via config; no code change.

Foundry's OpenAI-compat

Foundry exposes its deployed models through an OpenAI-compatible endpoint of the form:

https://<project>.services.ai.azure.com/openai/v1/

Auth is a Bearer token, backed by either a managed identity or an API key.

var foundryChat = new OpenAIClient(
    new ApiKeyCredential(token),
    new OpenAIClientOptions { Endpoint = new Uri("https://<project>.services.ai.azure.com/openai/v1") })
    .AsChatClient("model-deployment-name");

Caveats

Feature support

OpenAI-compat covers basic chat, embeddings, and (sometimes) function calling. Vendor-specific features are a different matter:

  • Anthropic Computer Use → not in OpenAI-compat.
  • Anthropic prompt caching → some support.
  • Gemini multimodal video → not in OpenAI-compat.
  • OpenAI Assistants API → vendor-specific.

For these, use the vendor's own SDK.

Rate limits

Each provider enforces different RPM (requests per minute) and TPM (tokens per minute) limits. Account for this in your retry strategy.
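One way to keep a retry strategy provider-agnostic is to honor the server's Retry-After header when it is present and fall back to exponential backoff otherwise. The sketch below assumes rate limiting is signalled with HTTP 429, which is common but should be confirmed per provider; the helper names NextDelay and SendWithRetryAsync are illustrative.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Picks the wait before the next attempt: the server's Retry-After delta if present,
// otherwise exponential backoff (1s, 2s, 4s, ...), capped at maxDelay.
static TimeSpan NextDelay(HttpResponseMessage response, int attempt, TimeSpan maxDelay)
{
    var serverHint = response.Headers.RetryAfter?.Delta;
    var backoff = TimeSpan.FromSeconds(Math.Pow(2, attempt));
    var delay = serverHint ?? backoff;
    return delay < maxDelay ? delay : maxDelay;
}

// Sends a request, retrying on 429 until maxAttempts is reached. The factory creates
// a fresh HttpRequestMessage per attempt because a message cannot be sent twice.
static async Task<HttpResponseMessage> SendWithRetryAsync(
    HttpClient http, Func<HttpRequestMessage> requestFactory, int maxAttempts, TimeSpan maxDelay)
{
    for (var attempt = 0; ; attempt++)
    {
        var response = await http.SendAsync(requestFactory());
        if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt + 1 >= maxAttempts)
            return response;

        var delay = NextDelay(response, attempt, maxDelay);
        response.Dispose();
        await Task.Delay(delay);
    }
}

// With no Retry-After header, the third attempt (index 2) backs off for 4 seconds.
var sampleResponse = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
Console.WriteLine(NextDelay(sampleResponse, 2, TimeSpan.FromSeconds(30)));
```

Keeping the limits (maxAttempts, maxDelay) in the same config section as the endpoint lets them change together when the provider changes.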

Token counting

Some providers return a usage object with token counts; others omit it. Test each endpoint rather than assuming the counts are there.
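A tolerant reader for the usage field might look like the sketch below. It assumes the OpenAI response shape, where usage is an object with prompt_tokens and completion_tokens, and treats a missing or null usage as "unknown" instead of zero. The helper name ReadUsage is hypothetical.

```csharp
using System;
using System.Text.Json;

// Returns (prompt, completion) token counts, or null when the response
// carries no usage object at all.
static (int Prompt, int Completion)? ReadUsage(string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    if (!doc.RootElement.TryGetProperty("usage", out var usage) || usage.ValueKind != JsonValueKind.Object)
        return null;

    // A field that is absent or not an integer counts as 0 rather than failing the call.
    int Read(string name) =>
        usage.TryGetProperty(name, out var v)
        && v.ValueKind == JsonValueKind.Number
        && v.TryGetInt32(out var n) ? n : 0;

    return (Read("prompt_tokens"), Read("completion_tokens"));
}

var reported = ReadUsage("{\"usage\":{\"prompt_tokens\":12,\"completion_tokens\":34}}");
var missing = ReadUsage("{\"choices\":[]}");

Console.WriteLine(reported.HasValue ? $"{reported.Value.Prompt} in / {reported.Value.Completion} out" : "usage not reported");
Console.WriteLine(missing.HasValue ? "usage reported" : "usage not reported");
```

Returning null for the missing case keeps cost dashboards from silently recording zero tokens for providers that simply do not report them.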

Differences in tool calling

Even on an OpenAI-compat endpoint, the tool-calling format may differ subtly between providers. Test each one thoroughly.
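As one example of defensive handling, the sketch below normalizes a tool call's arguments field. In OpenAI's format, function.arguments is a JSON-encoded string. Accepting an already-parsed object or an empty string as well is an assumption made here for illustration, not documented behavior of any particular provider. The helper name ParseToolArguments is hypothetical.

```csharp
using System;
using System.Text.Json;

// Normalizes function.arguments from a tool call into a JSON element.
// Strings are parsed as JSON (an empty string becomes {}); anything else
// is returned as-is, on the assumption that it is already structured.
static JsonElement ParseToolArguments(JsonElement arguments)
{
    if (arguments.ValueKind == JsonValueKind.String)
    {
        var text = arguments.GetString();
        using var parsed = JsonDocument.Parse(string.IsNullOrWhiteSpace(text) ? "{}" : text);
        return parsed.RootElement.Clone(); // Clone so the value outlives the document
    }
    return arguments.Clone();
}

// Two hypothetical payloads that should normalize to the same arguments.
using var asString = JsonDocument.Parse("{\"arguments\":\"{\\\"city\\\":\\\"Oslo\\\"}\"}");
using var asObject = JsonDocument.Parse("{\"arguments\":{\"city\":\"Oslo\"}}");

foreach (var payload in new[] { asString, asObject })
{
    var toolArgs = ParseToolArguments(payload.RootElement.GetProperty("arguments"));
    Console.WriteLine(toolArgs.GetProperty("city").GetString());
}
```

A small set of recorded responses per provider, run through helpers like this in a test suite, tends to surface format differences before they reach production.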

Senior strategy

Use OpenAI-compat as the default abstraction:

config.endpoint → OpenAIClient → IChatClient → Microsoft.Extensions.AI pipeline

This lets you swap providers cheaply.

For features that require a vendor-specific SDK, build a small adapter and keep it isolated from the rest of the pipeline.

Anti-patterns

  • ❌ Assuming all features work everywhere via OpenAI-compat.
  • ❌ Pinning to one provider's quirks.
  • ❌ No fallback strategy.

Cross-references