Prompt & Response Logging

Key Points

Logging full prompts/responses is sensitive — may contain PII, secrets, customer data.
Default: do NOT log content; metadata only.
For debug / audit: opt-in; secured store; redaction; retention policy.
Sampling: log small % for diagnostics; full data only in dev.
Compliance: GDPR, HIPAA, etc. — retention + access controls.

What to log by default

Always:
- Model name.
- Token counts.
- Latency.
- Status (success / error).
- Trace / correlation ID.
- Tenant / user (anonymized).
- Tool calls (names, not args).

Never (default):
- Prompt content.
- Response content.
- Tool call arguments.
- User-supplied content.
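The "Tenant / user (anonymized)" entry in the Always list can be produced with a keyed hash, so entries for the same user still correlate without exposing the raw ID. A minimal sketch, assuming an HMAC key that lives in a secret store outside the logs; `LogIdentity` and `Pseudonymize` are illustrative names, not part of any library. Strictly speaking this is pseudonymization: anyone holding the key can recompute the mapping.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LogIdentity
{
    // Returns a stable token for a user or tenant ID that cannot be
    // reversed without the key. The key is an assumption: load it from
    // a secret store, never from the log pipeline itself.
    public static string Pseudonymize(string id, byte[] key)
    {
        byte[] hash = HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(id));
        // 16 hex chars (64 bits) keeps log lines short; lengthen if collisions matter.
        return Convert.ToHexString(hash)[..16];
    }
}
```

Rotating the key breaks correlation with older entries, which may or may not be desirable depending on the retention policy.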

Opt-in full logging

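chat = chat.AsBuilder()
    // Pass the callback by name: the first optional parameter of UseOpenTelemetry is an ILoggerFactory.
    .UseOpenTelemetry(configure: o => o.EnableSensitiveData = true)   // dev / staging
    .Build();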

Use ONLY in non-prod environments by default. Production: only after compliance review.
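The rule above can be enforced in code rather than by convention. A minimal sketch, assuming the ASP.NET Core environment names "Development" and "Staging" and a hypothetical `complianceApproved` flag set only after review; `SensitiveLoggingGate` is an illustrative name.

```csharp
using System;

public static class SensitiveLoggingGate
{
    // Content logging is allowed in known non-production environments,
    // and elsewhere only when a compliance sign-off flag is set.
    // Unknown environment names are treated like production (fail closed).
    public static bool IsAllowed(string environmentName, bool complianceApproved)
    {
        bool isNonProd =
            string.Equals(environmentName, "Development", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(environmentName, "Staging", StringComparison.OrdinalIgnoreCase);
        return isNonProd || complianceApproved;
    }
}
```

The result could feed `EnableSensitiveData` in the builder call above, so a misconfigured production deployment defaults to metadata-only logging.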

Sampling logging

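using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

// The ILogger parameter is an assumption; point it at the secured store only.
public class SamplingLoggerClient(IChatClient inner, ILogger logger, double sampleRate) : DelegatingChatClient(inner)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken cancellationToken = default)
    {
        var resp = await base.GetResponseAsync(messages, options, cancellationToken);
        // Sample after the call so the response can be logged with its prompt.
        if (Random.Shared.NextDouble() < sampleRate)
            LogFullDetails(messages, resp);
        return resp;
    }

    // Writes prompt and response text; apply redaction here before writing.
    private void LogFullDetails(IEnumerable<ChatMessage> messages, ChatResponse resp) =>
        logger.LogDebug("Sampled AI call: prompt={Prompt}, response={Response}",
            string.Join("\n", messages.Select(m => m.Text)), resp.Text);
}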

Log, e.g., 1% of requests: enough to diagnose pathological cases without massive data volume.
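`Random.Shared` decides independently for every call, so a multi-call conversation may end up only partially logged. A different technique is to derive the decision from the trace ID, so every call in a trace gets the same answer. A minimal sketch of that alternative; `TraceSampler` is an illustrative name, and the trace ID is assumed to be available as a string.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TraceSampler
{
    // Deterministic sampling: the same trace ID always yields the same decision.
    public static bool ShouldSample(string traceId, double sampleRate)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(traceId));
        ulong bucket = BitConverter.ToUInt64(hash, 0);
        // Keep the top 53 bits and scale to a double in [0, 1).
        double fraction = (bucket >> 11) * (1.0 / (1UL << 53));
        return fraction < sampleRate;
    }
}
```

With a rate of 0.01, roughly 1% of traces are selected, and each selected trace is logged in full rather than in fragments.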

Redaction

Strip PII before logging:

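using System.Text.RegularExpressions;

// Best-effort pattern redaction; regexes miss names, addresses, and free-form PII.
public static string Redact(string text)
{
    // Email addresses, including multi-label domains such as mail.example.co.uk.
    text = Regex.Replace(text, @"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]");
    // US Social Security numbers in NNN-NN-NNNN form.
    text = Regex.Replace(text, @"\b\d{3}-\d{2}-\d{4}\b", "[SSN]");
    // Runs of 13-19 digits, optionally separated by spaces or hyphens (payment cards).
    text = Regex.Replace(text, @"\b(?:\d[ -]*?){13,19}\b", "[CARD]");
    return text;
}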

For richer redaction: Microsoft Presidio (Python; callable from .NET when run as a REST service) or Azure AI Language PII detection.

Storage

Logged prompts go to a store that has:
- Restricted access (e.g., App Insights with limited access).
- Encryption at rest.
- A retention policy (30-90 days; compliance-dictated).
- Access logging (who reads what).

Audit log vs telemetry

Two different needs:
- Audit log: legally required record. Long retention; immutable.
- Telemetry: debug / monitor. Short retention; queryable.

Don't conflate.

Compliance considerations

GDPR

User can request deletion → must purge logged prompts.
Cross-border transfer rules: keep EU users' logs in EU regions unless a lawful transfer mechanism applies.
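Deletion requests are only practical if every logged prompt is keyed by the data subject it belongs to. A minimal in-memory sketch of that idea, which also covers the retention window mentioned under Storage; `LoggedPrompt` and `PromptLogStore` are hypothetical types, and a real store would do the same with a delete query or a platform purge feature.

```csharp
using System;
using System.Collections.Generic;

// SubjectKey is the (pseudonymized) user ID the entry belongs to.
public sealed record LoggedPrompt(string SubjectKey, DateTimeOffset LoggedAt, string RedactedText);

public sealed class PromptLogStore
{
    private readonly List<LoggedPrompt> _entries = new();

    public int Count => _entries.Count;

    public void Add(LoggedPrompt entry) => _entries.Add(entry);

    // Erasure request: remove every entry for one subject. Returns the number removed.
    public int PurgeSubject(string subjectKey) =>
        _entries.RemoveAll(e => e.SubjectKey == subjectKey);

    // Retention: remove entries older than the allowed window. Returns the number removed.
    public int PurgeExpired(DateTimeOffset now, TimeSpan retention) =>
        _entries.RemoveAll(e => now - e.LoggedAt > retention);
}
```

Entries logged without a subject key cannot be found later, so the key has to be attached at write time.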

HIPAA

PHI in prompts → logs must go to a HIPAA-compliant store.
Requires a BAA (business associate agreement) with the cloud provider.

Internal policies

Logs readable by customer support: anonymize.
Logs for engineering debug: short-term retention only.

Structured logging

_log.LogInformation("AI call complete: model={Model}, tokens={InTokens}+{OutTokens}, latency={Ms}ms, traceId={TraceId}",
    modelName, inTokens, outTokens, sw.ElapsedMilliseconds, Activity.Current?.TraceId);

Avoid:

_log.LogInformation("AI call: prompt={Prompt}", userPrompt);   // never by default in prod

Eval / regression detection

Log enough metadata to repro / compare:
- Model version.
- Temperature.
- Tools available.
- Prompt template version.

Even without the prompts themselves, this metadata lets you compare aggregate quality signals (error rate, latency, token usage) across versions.
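As an illustration of comparing versions from metadata alone, the sketch below groups calls by model version and prompt template version and reports error rate and mean latency per group. `CallMetadata` and `RegressionReport` are hypothetical names; the fields are assumptions based on the list above plus the status and latency values already logged by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record CallMetadata(
    string ModelVersion, string PromptTemplateVersion, double Temperature,
    bool Succeeded, double LatencyMs);

public static class RegressionReport
{
    // One row per (model version, template version): error rate and mean latency.
    public static Dictionary<(string Model, string Template), (double ErrorRate, double MeanLatencyMs)>
        Summarize(IEnumerable<CallMetadata> calls) =>
        calls.GroupBy(c => (Model: c.ModelVersion, Template: c.PromptTemplateVersion))
             .ToDictionary(
                 g => g.Key,
                 g => (ErrorRate: g.Count(c => !c.Succeeded) / (double)g.Count(),
                       MeanLatencyMs: g.Average(c => c.LatencyMs)));
}
```

A jump in error rate or latency between two template versions on the same model points at the template change, with no prompt content needed to see it.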

Debugging tools

For dev:
- Prompt Flow (Microsoft): debug prompts visually.
- LangSmith: hosted observability.
- PromptHub: prompt versioning.

Senior considerations

Default secure: don't log content.
Compliance review before opt-in.
Tenant attribution for B2B.
Cost monitoring: dashboard token usage per model and tenant.
Redaction at source — don't leak before storage.

Anti-patterns

❌ Logging full prompts/responses always.
❌ Customer-readable logs without redaction.
❌ No retention policy.
❌ Logging auth tokens in plaintext (bearer token leak).
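The last anti-pattern can be guarded against with a scrubbing pass over anything that might carry an Authorization header value. A minimal sketch, assuming tokens appear in the usual "Bearer <token>" form; `SecretRedactor` is an illustrative name. The pattern will also hit ordinary prose such as "bearer of", which is accepted here because over-redaction is the safer failure mode for logs.

```csharp
using System.Text.RegularExpressions;

public static class SecretRedactor
{
    // Matches "Bearer <token>" where the token uses the characters
    // commonly found in bearer tokens (letters, digits, - . _ ~ + / and = padding).
    private static readonly Regex Bearer =
        new(@"\bBearer\s+[A-Za-z0-9\-._~+/]+=*", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string RedactBearerTokens(string text) =>
        Bearer.Replace(text, "Bearer [REDACTED]");
}
```

Running this before any log write means a token is scrubbed even when it reaches the logger through an exception message or a dumped header collection.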