Agent Basics

Key Points

  • An agent = IChatClient + persona (instructions) + tools + memory (thread) + behavior config.
  • ChatClientAgent is the concrete implementation; wraps any IChatClient.
  • Threads persist conversation state across calls.
  • Tool use is automatic via Microsoft.Extensions.AI function-invocation middleware.
  • Agents are composable into orchestrations (next topics).

ChatClientAgent

var agent = new ChatClientAgent(chatClient)
{
    Name = "TechSupport",
    Description = "Helps users troubleshoot technical issues",
    Instructions = """
        You are TechSupport, a calm, methodical assistant.
        Diagnose by asking targeted questions before suggesting fixes.
        Always verify the user's environment before recommending commands.
        """,
    Tools = [
        AIFunctionFactory.Create(SearchKnowledgeBase),
        AIFunctionFactory.Create(LookupTicket)
    ]
};

Instructions becomes the system prompt. Tools become function-call options.

Invoke

var response = await agent.RunAsync("Can't print to my office printer");

// A response can hold several messages (assistant text, tool calls, tool results).
foreach (var msg in response.Messages)
    Console.WriteLine($"{msg.Role}: {msg.Text}");

Streaming

await foreach (var update in agent.RunStreamingAsync("Help me set up VS Code"))
    Console.Write(update.Text);   // each update carries the next chunk of text

Threads (memory)

var thread = agent.GetNewThread();   // threads are created by the agent, not constructed directly

await agent.RunAsync("Hi, I'm Alice", thread);
await agent.RunAsync("What's my name?", thread);   // "Alice"

AgentThread accumulates the conversation's messages. Pass the same thread on every call so the agent can see earlier turns.

Persistent threads

public interface IThreadStore
{
    Task<AgentThread> LoadAsync(string id);
    Task SaveAsync(string id, AgentThread thread);
}

// Custom impl: serialize messages to DB.

For stateful chatbots, persist threads per user/conversation.
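
As a rough sketch of the custom implementation mentioned above, the following stores each conversation as one JSON file. `StoredMessage`, `FileConversationStore`, and the file layout are hypothetical stand-ins, not framework types; a real IThreadStore would serialize the framework's own thread type and would more likely target a database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Demo: load (or start) a conversation, append a message, save, reload.
var store = new FileConversationStore(Path.Combine(Path.GetTempPath(), "agent-threads-demo"));
var id = "user-42-conv-1";   // assumed to be a safe file name

var history = await store.LoadAsync(id);
history.Add(new StoredMessage("user", "Hi, I'm Alice"));
await store.SaveAsync(id, history);

var reloaded = await store.LoadAsync(id);
Console.WriteLine($"{reloaded.Count} message(s); last: {reloaded[^1].Text}");

// Stand-in for a serialized chat message (assumption, not a framework type).
public record StoredMessage(string Role, string Text);

// Stand-in store keyed by conversation id; one JSON file per conversation.
public sealed class FileConversationStore(string root)
{
    public async Task<List<StoredMessage>> LoadAsync(string id)
    {
        var path = PathFor(id);
        if (!File.Exists(path)) return new List<StoredMessage>();
        await using var stream = File.OpenRead(path);
        return await JsonSerializer.DeserializeAsync<List<StoredMessage>>(stream)
               ?? new List<StoredMessage>();
    }

    public async Task SaveAsync(string id, List<StoredMessage> messages)
    {
        Directory.CreateDirectory(root);
        await using var stream = File.Create(PathFor(id));
        await JsonSerializer.SerializeAsync(stream, messages);
    }

    private string PathFor(string id) => Path.Combine(root, id + ".json");
}
```

The load, run, save cycle per request is what keeps a stateless web app's conversations continuous; only the storage backend changes between this sketch and a production store.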

Tools dispatch

When the LLM requests a tool call, the framework invokes the matching .NET method automatically:

[Description("Search the knowledge base")]
async Task<List<Article>> SearchKnowledgeBase(string query, CancellationToken ct = default)
{
    return await _kb.SearchAsync(query, ct);
}

AIFunctionFactory.Create(SearchKnowledgeBase) generates the tool schema from the method's parameter types and [Description] attributes; the tool is then dispatched whenever the LLM requests it.
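
To make the dispatch step concrete, here is a simplified, library-free sketch of what a function-invocation loop does with a single tool call: look the tool up by name, bind the JSON arguments, invoke it, and turn failures into an error payload the model can read. `DispatchAsync`, the registry shape, and the result envelope are assumptions for illustration; the actual Microsoft.Extensions.AI middleware is more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Registry: tool name -> handler that receives the model-supplied JSON arguments.
var tools = new Dictionary<string, Func<JsonElement, Task<object?>>>
{
    ["SearchKnowledgeBase"] = async args =>
    {
        var query = args.GetProperty("query").GetString() ?? "";
        await Task.Yield();   // placeholder for real async I/O
        return new[] { $"Article about {query}" };
    }
};

// Resolve and invoke one tool call; unknown tools and exceptions become error payloads.
async Task<string> DispatchAsync(string name, string argumentsJson)
{
    if (!tools.TryGetValue(name, out var handler))
        return JsonSerializer.Serialize(new { error = $"Unknown tool: {name}" });
    try
    {
        using var doc = JsonDocument.Parse(argumentsJson);
        var result = await handler(doc.RootElement);
        return JsonSerializer.Serialize(new { result });
    }
    catch (Exception ex)
    {
        return JsonSerializer.Serialize(new { error = ex.Message });
    }
}

var ok = await DispatchAsync("SearchKnowledgeBase", """{"query":"printer"}""");
var missing = await DispatchAsync("DeleteEverything", "{}");
Console.WriteLine(ok);
Console.WriteLine(missing);
```

In the real pipeline the serialized payload is sent back to the model as a tool message, and the model is prompted again within the same run.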

Configuration

new ChatClientAgent(chat)
{
    Instructions = "...",
    Tools = [...],
    DefaultChatOptions = new ChatOptions
    {
        Temperature = 0.2f,
        MaxOutputTokens = 1000
    }
};

Agent metadata

agent.Name              // "TechSupport"
agent.Description       // shown to other agents in orchestrations
agent.Instructions      // system prompt

Custom agent

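// Simplified sketch: the real AIAgent base class has further members (thread and
// run-options parameters, a streaming counterpart) that a full decorator must also forward.
public class GuardedAgent(AIAgent inner, IModerator mod) : AIAgent
{
    public override async Task<AgentResponse> RunAsync(IEnumerable<ChatMessage> messages, CancellationToken ct)
    {
        // Check only the most recent user message against the moderation policy.
        var lastUser = messages.LastOrDefault(m => m.Role == ChatRole.User)?.Text;
        if (lastUser is not null && await mod.IsBlockedAsync(lastUser, ct))
            return new AgentResponse(new ChatMessage(ChatRole.Assistant, "Sorry, I can't help with that."));

        // Allowed: delegate to the wrapped agent unchanged.
        return await inner.RunAsync(messages, ct);
    }
}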

The pattern: wrap an inner agent and apply a policy check before delegating to it.

Errors

Tool exceptions are surfaced to the LLM as tool-call errors; the model may then retry with different arguments or apologize to the user.

For app-level errors:

try
{
    var r = await agent.RunAsync(input);
}
catch (HttpRequestException) { /* network */ }
catch (OperationCanceledException) { /* user cancel or timeout */ }

Resilience

Wrap the underlying IChatClient with a resilience policy such as Polly:

chat = chat.AsBuilder()
    .Use(c => c)   // placeholder middleware; resilience handlers typically sit on the HTTP layer
    .Build();

Or use Microsoft.Extensions.Http.Resilience on the HTTP client used by the OpenAI SDK.
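
A minimal sketch of the HTTP-layer option, assuming the Microsoft.Extensions.Http.Resilience NuGet package is referenced. The client name "llm" and the retry count are illustrative, and how the resulting HttpClient is handed to a particular model SDK depends on that SDK and is not shown.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Named client with the standard resilience pipeline (retries, circuit breaker, timeouts).
services.AddHttpClient("llm")
        .AddStandardResilienceHandler(options => options.Retry.MaxRetryAttempts = 3);

using var provider = services.BuildServiceProvider();

// Every request sent through this client now passes through the resilience pipeline.
HttpClient http = provider.GetRequiredService<IHttpClientFactory>().CreateClient("llm");
Console.WriteLine(http.Timeout);
```

Keeping retries at the HTTP layer means the agent and chat-client code stay unaware of transient network failures.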

Cost / token tracking

OpenTelemetry emits telemetry per invocation. To track usage per agent, wrap the agent with your own counter:

private readonly Counter<long> _agentCalls = ...;
agent = new InstrumentedAgent(agent, _agentCalls);
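
The counter side of this can be sketched with System.Diagnostics.Metrics alone. The meter name, instrument name, and `RecordCall` helper are illustrative assumptions; the wrapper agent itself is not shown, and in production an OpenTelemetry exporter would subscribe instead of the in-process listener used here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter and instrument names are illustrative.
using var meter = new Meter("Demo.Agents");
Counter<long> agentCalls = meter.CreateCounter<long>("agent.invocations");

// In-process listener so the demo can show what was recorded.
long observed = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Demo.Agents") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => observed += value);
listener.Start();

// Record one invocation per agent call, tagged with the agent name.
void RecordCall(string agentName) =>
    agentCalls.Add(1, new KeyValuePair<string, object?>("agent.name", agentName));

RecordCall("TechSupport");
RecordCall("TechSupport");
Console.WriteLine(observed);   // 2
```

Tagging by agent name lets a dashboard break invocation counts down per agent without a separate instrument for each one.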

Eval-friendly

With a fixed model, temperature, instructions, and tool set, agent output is usually stable enough for simple assertions, though it is not strictly deterministic. For tests:

[Fact]
public async Task Agent_responds_to_basic_question()
{
    var resp = await _agent.RunAsync("What's 2+2?");
    Assert.Contains("4", resp.Messages.Last().Text);
}

When the wording of an answer can vary, use an LLM-as-judge or assert on which tools were called rather than on exact text.
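
One way to assert on tool calls is to wrap each tool in a recorder and check the recorded calls after the run. The sketch below shows only the recording pattern: the `Recorded` helper is hypothetical, and the tool is invoked by hand where a real test would let the agent trigger it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Log of (tool name, argument) pairs captured during a run.
var calls = new List<(string Tool, string Argument)>();

// Hypothetical helper: wraps a tool so every invocation is logged before it executes.
Func<string, string> Recorded(string toolName, Func<string, string> tool) => arg =>
{
    calls.Add((toolName, arg));
    return tool(arg);
};

var searchKb = Recorded("SearchKnowledgeBase", q => $"results for {q}");

// In a real test the agent would trigger this; here the call is made by hand.
searchKb("printer offline");

// Assert on behavior (which tool, what input), not on the exact wording of the reply.
bool routedCorrectly = calls.Any(c => c.Tool == "SearchKnowledgeBase"
                                   && c.Argument.Contains("printer"));
Console.WriteLine(routedCorrectly);   // True
```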

Architecture (agent + thread + tools + chat client)

+--------------------------------------------------------+
|                   ChatClientAgent                      |
|  Name / Description / Instructions (system prompt)     |
|  +--------------------------------------------------+  |
|  |  AIContextProviders  (memory, RAG, dynamic ctx)  |  |
|  +--------------------------------------------------+  |
|  |  ChatHistoryProvider  <----> AgentThread         |  |
|  |     (in-memory / Cosmos / custom)   per-conv     |  |
|  +--------------------------------------------------+  |
|  |  Tools: AIFunction[]  (AIFunctionFactory.Create) |  |
|  +--------------------------------------------------+  |
|  |  IChatClient pipeline (Use(...) middleware)      |  |
|  |     -> Azure OpenAI / OpenAI / Ollama / Foundry  |  |
|  +--------------------------------------------------+  |
+--------------------------------------------------------+
        |                                  ^
   RunAsync(input, thread)            AgentResponseUpdate
        v                                  |
   +--------+   function-call    +-------------------+
   |  LLM   | <----------------> |  .NET tool method |
   +--------+   tool result      +-------------------+

AgentThread is the abstraction over conversation state — service-managed (e.g. Foundry) or local (InMemoryChatHistoryProvider). The same agent instance can be reused across many threads.

Pros & cons

| Pros | Cons |
|------|------|
| Single ChatClientAgent works across Azure OpenAI, OpenAI, Ollama, Foundry, Anthropic | Still a thin layer over IChatClient; bring your own resilience/cost controls |
| Auto function-calling via Microsoft.Extensions.AI middleware | Tool dispatch is in-process; no built-in sandboxing |
| Thread + history providers decouple memory from agent | Custom history provider needed for any non-trivial persistence |
| Pipeline (AIContextProvider, agent middleware, chat-client middleware) is composable | Three middleware layers can confuse newcomers about where to inject logic |
| Streaming via RunStreamingAsync returns rich AgentResponseUpdate | Determinism is best-effort; same prompt can yield different tool plans |

When to use / when to avoid

  • Use when you need an LLM-driven, possibly multi-turn conversation with tool use and you want vendor portability.
  • Use when persona + tools + memory cleanly captures the unit of work (support bot, research assistant, copilot inside a Blazor/MAUI app).
  • Use when you want to keep the door open to wrapping the agent later as A2A or as an MCP tool (AsAIFunction).
  • Avoid for deterministic, stepwise pipelines — use Workflows (AgentWorkflowBuilder) instead.
  • Avoid when a plain IChatClient.GetResponseAsync call with no tools and no memory suffices — agents add overhead you don't need.
  • Avoid for hard-real-time paths; LLM latency dominates and tool retries amplify it.

Interview Q&A

Q1. What exactly is a ChatClientAgent, and how does it differ from calling IChatClient directly?
ChatClientAgent is the concrete AIAgent implementation in Microsoft.Agents.AI that wraps any IChatClient and adds the pipeline pieces an agent needs: instructions (system prompt), tools as AIFunctions, a ChatHistoryProvider for memory, AIContextProviders for RAG/memory injection, and automatic function-calling. A direct IChatClient call is one round-trip with no built-in tool dispatch loop. The agent is the unit you compose into orchestrations and expose via A2A.

Q2. Walk me through the ChatClientAgent request pipeline.
On RunAsync, the request flows through agent middleware (decorators added via AsBuilder().Use(...)), then the context layer pulls history from the ChatHistoryProvider and lets each AIContextProvider add messages/tools/instructions, then the request hits the IChatClient (with its own decorator chain), which talks to the LLM. By default ChatClientAgent wraps the chat client with function-calling middleware unless you set UseProvidedChatClientAsIs = true. The response flows back through the same layers, and providers are notified of new messages so they can update memory.

Q3. AgentThread vs passing a List<ChatMessage> yourself: when does it matter?
AgentThread is the abstraction. For a ChatClientAgent it can be backed by an InMemoryChatHistoryProvider, a Cosmos provider, or a service-managed thread (e.g. a Foundry-hosted agent owns the thread server-side). For service-hosted agents the thread type must match the agent type; mismatched threads fail fast. Manually shuttling a List<ChatMessage> works for one-offs but skips the provider hooks (memory extraction, summarization, persistence) that AgentThread enables.

Q4. How does tool-calling actually work end-to-end?
You register .NET methods via AIFunctionFactory.Create(method); the schema is generated from parameter types and [Description] attributes. The agent passes them as ChatOptions.Tools. When the model emits a tool call, the function-invocation middleware in Microsoft.Extensions.AI parses the call, invokes the .NET method, serializes the result back as a tool message, and re-prompts the model in the same RunAsync, all transparent to the caller. Exceptions become tool-error messages the model can react to.

Q5. The model keeps calling the wrong tool, or calling tools you didn't expect. What do you do?
First check tool granularity: too many tools or overlapping descriptions cause confusion, so consolidate. Tighten the [Description] strings (they're effectively part of the prompt). Lower temperature for tool-routing scenarios. Add an AIContextProvider or agent middleware that filters the tool list per turn based on intent. As a last resort, switch from automatic function-calling to manual: emit tool calls, dispatch yourself, and inject the result.
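
The per-turn filtering idea from Q5 can be sketched without any framework types. The catalog, the intent labels, and `ToolsFor` are hypothetical; intent detection is assumed to happen elsewhere, and a real implementation would return tool objects rather than names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical catalog: tool name -> intents the tool is relevant for.
var catalog = new Dictionary<string, string[]>
{
    ["SearchKnowledgeBase"] = ["troubleshooting", "how-to"],
    ["LookupTicket"]        = ["ticket-status"],
    ["OpenTicket"]          = ["troubleshooting", "ticket-status"],
};

// Return only the tools tagged for the detected intent, in a stable order.
IReadOnlyList<string> ToolsFor(string intent) =>
    catalog.Where(kv => kv.Value.Contains(intent))
           .Select(kv => kv.Key)
           .OrderBy(name => name, StringComparer.Ordinal)
           .ToList();

Console.WriteLine(string.Join(", ", ToolsFor("ticket-status")));
// LookupTicket, OpenTicket
```

Offering the model two relevant tools instead of the full catalog shrinks the space in which it can pick the wrong one.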

Q6. How do you make an agent observable and cost-aware in production?
Wire OpenTelemetry on the IChatClient (the Microsoft.Extensions.AI integration emits per-call traces with token counts) and add agent middleware to emit per-agent metrics (Counter<long> for invocations, Histogram<double> for latency). Track token usage on AgentResponse.Usage. For cost caps, enforce a per-thread/per-user budget in agent middleware that throws before the chat client is called. Don't rely on Temperature = 0 for determinism; log inputs and outputs so you can replay.
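
A minimal sketch of the budget check described in Q6, assuming token usage is reported back after each call. `TokenBudget` and its members are hypothetical; wiring it into agent middleware is not shown.

```csharp
using System;
using System.Collections.Concurrent;

var budget = new TokenBudget(maxTokensPerConversation: 1_000);
var blocked = false;

budget.EnsureWithinBudget("conv-1");          // passes: nothing spent yet
budget.Record("conv-1", tokensUsed: 1_200);   // usage reported after a model call

try
{
    budget.EnsureWithinBudget("conv-1");      // now over the cap, so this throws
}
catch (InvalidOperationException ex)
{
    blocked = true;
    Console.WriteLine(ex.Message);
}

// Hypothetical per-conversation token cap, checked before each model call.
public sealed class TokenBudget(long maxTokensPerConversation)
{
    private readonly ConcurrentDictionary<string, long> _spent = new();

    // Add the tokens consumed by the latest call to the conversation's running total.
    public void Record(string conversationId, long tokensUsed) =>
        _spent.AddOrUpdate(conversationId, tokensUsed, (_, total) => total + tokensUsed);

    // Throw before the chat client is called if the conversation has hit its cap.
    public void EnsureWithinBudget(string conversationId)
    {
        _spent.TryGetValue(conversationId, out var spent);
        if (spent >= maxTokensPerConversation)
            throw new InvalidOperationException(
                $"Conversation {conversationId} used {spent} of {maxTokensPerConversation} tokens.");
    }
}
```

Checking before the call rather than after means an exhausted conversation costs nothing further.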

Q7. Agent vs Workflow: how do you choose?
Use an agent when the task is open-ended/conversational and you want the LLM to plan the next step (including which tool to call). Use a workflow (AgentWorkflowBuilder.BuildSequential, graph workflows) when the steps are known, you need explicit ordering, type-safe routing, checkpointing, or human-in-the-loop approval gates. The Microsoft guidance is blunt: if you can write a function to do it, write the function; don't reach for an agent.

Q8. What's the right way to add cross-cutting concerns (auth check, PII redaction, audit) without forking ChatClientAgent?
Three injection points, in order of preference. (1) Agent middleware via AsBuilder().Use(...), which wraps the entire run and works for any AIAgent type including A2AAgent. (2) AIContextProvider if the concern is "modify the prompt or tools per turn." (3) IChatClient middleware via chatClient.AsBuilder().Use(...) if it's a transport-layer concern (resilience, caching). Wrapping with a custom AIAgent subclass works but loses pipeline composition.

Q9. What happens if a tool throws?
The function-invocation middleware catches the exception and serializes it as a tool-error message back to the model, which typically apologizes or retries with different arguments. That means transient infra exceptions can silently burn tokens and confuse the model. Wrap tool methods to translate transient errors into a structured "retry later" result, or let real exceptions surface by configuring the function-invocation options to rethrow, then handle in agent middleware.
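
The "retry later" translation from Q9 might look like the sketch below. `GuardTransientAsync`, `ToolResult<T>`, and the choice of which exception types count as transient are assumptions for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Wrap a tool body so transient failures become data the model can act on,
// while unexpected exceptions still propagate to the caller.
async Task<ToolResult<T>> GuardTransientAsync<T>(Func<Task<T>> toolBody)
{
    try
    {
        return new ToolResult<T>("ok", await toolBody(), null);
    }
    catch (Exception ex) when (ex is HttpRequestException or TimeoutException)
    {
        return new ToolResult<T>("retry_later", default, "Backend temporarily unavailable.");
    }
}

var healthy = await GuardTransientAsync(() => Task.FromResult(3));
var flaky = await GuardTransientAsync<int>(() => throw new HttpRequestException("503"));
Console.WriteLine($"{healthy.Status} / {flaky.Status}");   // ok / retry_later

// Hypothetical result envelope; the field names are illustrative.
public record ToolResult<T>(string Status, T? Value, string? Detail);
```

An explicit status gives the model something unambiguous to react to, instead of an empty result it might mistake for success.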

Q10. How do you persist threads across a stateless web app?
Implement a ChatHistoryProvider (or the equivalent storage hook) backed by your store (Cosmos, Redis, SQL) and keyed by user/conversation id. The Microsoft.Agents.AI.CosmosNoSql package ships a Cosmos provider (preview). On each request, load the thread, call RunAsync(input, thread), then save. For Foundry-hosted agents, the thread is server-managed; you only persist the thread id.

Q11. How would you architect a customer-support copilot using ChatClientAgent?
One ChatClientAgent per persona (triage, billing, technical), each with role-scoped tools (SearchKB, OpenTicket, LookupAccount). A Cosmos-backed ChatHistoryProvider keyed by conversation id for cross-session memory. An AIContextProvider that injects retrieved KB snippets (RAG) and the customer's tier as dynamic instructions. Agent middleware for PII redaction, per-tenant rate-limit, and OTel. Compose them with a sequential or handoff workflow when escalation is needed; expose the whole thing via A2A so the mobile app's agent can call it.

Q12. Is a ChatClientAgent thread-safe? Can I cache one as a singleton?
The agent instance itself (Name, Description, Instructions, Tools, the underlying IChatClient) is safe to reuse and is typically registered as a singleton. The AgentThread is per-conversation and is not safe to share across users. Tools that close over mutable state (DbContext, scoped services) need careful scoping; prefer building tools from scoped services per request rather than capturing them in a singleton agent's tool list.

Gotchas / common mistakes

  • Sharing a single AgentThread across users — leaks one user's history into another's prompt.
  • Capturing scoped services (DbContext, HttpClient with auth) in AIFunctionFactory.Create lambdas registered on a singleton agent.
  • Treating Temperature = 0 as deterministic — same prompt can still yield different tool plans across model versions.
  • Forgetting await using on resources owned by the agent (MCP clients, transports), causing leaked child processes.
  • Exposing 30 micro-tools instead of 4 well-described capabilities — the model thrashes between them.
  • Putting secrets or PII into Instructions — they're sent on every turn and logged by most observability stacks.
  • Catching tool exceptions and returning empty results — the model assumes success and confidently lies to the user.
  • Mismatched AgentThread type with a service-hosted agent (e.g. passing a generic thread to AzureAIAgent) — fails fast at runtime.

Senior considerations

  • Persona discipline: clear, specific, role-appropriate Instructions matter more than long prompts.
  • Tool granularity: don't expose internal ops; expose capabilities ("search knowledge base", not 12 micro-functions).
  • Thread management: persist for stateful chats; ephemeral for one-off.
  • Cost cap: per-conversation budget.

Anti-patterns

  • ❌ "You are a helpful assistant" — too generic.
  • ❌ Free-form tool list — too many tools confuse.
  • ❌ Storing PII in unencrypted thread.
  • ❌ Treating agent as deterministic — it isn't, even with low temp.
