Prompt & Response Logging
Key Points
- Logging full prompts/responses is sensitive — may contain PII, secrets, customer data.
- Default: do NOT log content; metadata only.
- For debug / audit: opt-in; secured store; redaction; retention policy.
- Sampling: log small % for diagnostics; full data only in dev.
- Compliance: GDPR, HIPAA, etc. — retention + access controls.
What to log by default
Always:
- Model name.
- Token counts.
- Latency.
- Status (success / error).
- Trace / correlation ID.
- Tenant / user (anonymized).
- Tool calls (names, not args).
Never (default):
- Prompt content.
- Response content.
- Tool call arguments.
- User-supplied content.
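One way to make these defaults hard to violate is a log entry type that has no content fields at all. The sketch below is illustrative only: AiCallLogEntry, LogAnonymizer, and the HMAC key are hypothetical names and assumptions, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical metadata-only entry: there is no field for prompt,
// response, or tool-argument content, so none can be logged by accident.
public sealed record AiCallLogEntry(
    string Model,
    int InputTokens,
    int OutputTokens,
    long LatencyMs,
    bool Success,
    string TraceId,
    string TenantHash,
    IReadOnlyList<string> ToolNames);

public static class LogAnonymizer
{
    // Keyed hash (HMAC-SHA256) so the raw tenant / user ID never reaches the log store.
    // Assumption: the key is loaded from a secret store, not hard-coded.
    public static string Anonymize(string id, byte[] key)
    {
        byte[] hash = HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(id));
        // The first 8 bytes (16 hex characters) are enough to group entries by tenant.
        return Convert.ToHexString(hash, 0, 8);
    }
}
```

The same ID and key always produce the same hash, so entries can still be grouped per tenant without storing the ID itself.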
Opt-in full logging
// Assumes an existing IChatClient named chat (Microsoft.Extensions.AI).
chat = chat.AsBuilder()
    .UseOpenTelemetry(configure: o => o.EnableSensitiveData = true) // dev / staging only
    .Build();
Use ONLY in non-prod environments by default. Production: only after compliance review.
Sampling logging
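// Wraps another IChatClient and logs full details for a random fraction of calls.
public class SamplingLoggerClient(IChatClient inner, double sampleRate) : DelegatingChatClient(inner)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var resp = await base.GetResponseAsync(messages, options, cancellationToken);
        if (Random.Shared.NextDouble() < sampleRate)
            LogFullDetails(messages, resp); // helper not shown; redact before writing
        return resp;
    }
}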
Log, e.g., 1% of requests: enough to diagnose pathological cases without a massive data volume.
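Random sampling decides per call, so the calls within one trace can end up split between sampled and unsampled. A possible variant (an assumption here, not what the class above does) derives the decision from the trace ID, so every call sharing that ID gets the same answer. TraceSampler is a hypothetical name, and the FNV-1a hash is one arbitrary choice of a cheap, stable hash.

```csharp
using System;

public static class TraceSampler
{
    // Deterministic sampling: the same trace ID always yields the same decision.
    public static bool ShouldSample(string traceId, double sampleRate)
    {
        if (sampleRate <= 0) return false;
        if (sampleRate >= 1) return true;

        // 32-bit FNV-1a over the characters of the trace ID.
        uint hash = 2166136261;
        unchecked
        {
            foreach (char c in traceId)
            {
                hash ^= c;
                hash *= 16777619;
            }
        }

        // Map the hash onto [0, 1] and compare against the rate.
        return hash / (double)uint.MaxValue < sampleRate;
    }
}
```

A trace that is sampled at a low rate stays sampled at any higher rate, which keeps comparisons stable when the rate is raised during an incident.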
Redaction
Strip PII before logging:
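// requires: using System.Text.RegularExpressions;
public string Redact(string text)
{
    // Email addresses, including multi-label domains such as example.co.uk.
    text = Regex.Replace(text, @"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]");
    // US Social Security numbers in NNN-NN-NNNN form.
    text = Regex.Replace(text, @"\b\d{3}-\d{2}-\d{4}\b", "[SSN]");
    // Runs of 13-19 digits, optionally separated by spaces or hyphens (card numbers).
    text = Regex.Replace(text, @"\b(?:\d[ -]*?){13,19}\b", "[CARD]");
    return text;
}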
For richer redaction: Microsoft Presidio (Python; can call from .NET) or Azure AI Language PII detection.
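The card pattern above matches any run of 13-19 digits, so order numbers and similar IDs get redacted too. A common refinement, sketched here as an assumption rather than as part of the original approach, is to redact only candidates that pass the Luhn checksum. CardRedactor and its members are hypothetical names, and the pattern is narrowed to ASCII digits so the digit arithmetic stays valid.

```csharp
using System.Text.RegularExpressions;

public static class CardRedactor
{
    // Same shape as the card pattern above, restricted to ASCII digits.
    private static readonly Regex Candidate = new(@"\b(?:[0-9][ -]*?){13,19}\b");

    // Replaces a candidate only when its digits pass the Luhn checksum.
    public static string RedactCards(string text) =>
        Candidate.Replace(text, m => PassesLuhn(m.Value) ? "[CARD]" : m.Value);

    public static bool PassesLuhn(string candidate)
    {
        int sum = 0;
        bool doubleIt = false;
        // Walk right to left, doubling every second digit.
        for (int i = candidate.Length - 1; i >= 0; i--)
        {
            char c = candidate[i];
            if (c < '0' || c > '9') continue; // skip spaces and hyphens
            int d = c - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

For example, RedactCards("pay with 4111 1111 1111 1111, order 1234567890123456") returns "pay with [CARD], order 1234567890123456": the well-known test card number passes the checksum, the order number does not. The trade-off is that a mistyped card number also fails the checksum and is left in place, so this suits cases where false positives cost more than the occasional miss.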
Storage
Logged prompts go to:
- Secure log store (App Insights with limited access).
- Encrypted at rest.
- Retention policy (30-90 days; compliance-dictated).
- Access logged (who reads).
Audit log vs telemetry
Two different needs:
- Audit log: legally required record. Long retention; immutable.
- Telemetry: debug / monitor. Short retention; queryable.
Don't conflate.
Compliance considerations
GDPR
- User can request deletion → must purge logged prompts.
- Cross-border transfer rules — logs in EU stay in EU.
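Retention limits and deletion requests both come down to being able to purge stored entries by age and by user. The in-memory sketch below only illustrates the two operations; PromptLogStore and StoredPromptLog are hypothetical names, and a real store would be a database or log service with its own deletion mechanism.

```csharp
using System;
using System.Collections.Generic;

public sealed record StoredPromptLog(string UserHash, DateTimeOffset LoggedAt, string RedactedContent);

public sealed class PromptLogStore
{
    private readonly List<StoredPromptLog> _entries = new();

    public int Count => _entries.Count;

    public void Add(StoredPromptLog entry) => _entries.Add(entry);

    // Retention policy: remove everything older than the window. Returns the number removed.
    public int PurgeOlderThan(TimeSpan retention, DateTimeOffset now) =>
        _entries.RemoveAll(e => now - e.LoggedAt > retention);

    // Deletion request: remove every entry attributed to one user. Returns the number removed.
    public int PurgeUser(string userHash) =>
        _entries.RemoveAll(e => e.UserHash == userHash);
}
```

Passing the current time in as a parameter keeps the retention check testable. Both methods return a count so the purge itself can be recorded.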
HIPAA
- PHI in prompts → logs must be HIPAA-compliant store.
- BAA with cloud provider.
Internal policies
- Customer support readable: anonymize.
- Engineering debug: short-term only.
Structured logging
_log.LogInformation("AI call complete: model={Model}, tokens={InTokens}+{OutTokens}, latency={Ms}ms, traceId={TraceId}",
    modelName, inTokens, outTokens, sw.ElapsedMilliseconds, Activity.Current?.TraceId);
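Avoid: string interpolation or concatenation in the message; it flattens the named fields into plain text, so they can no longer be queried as structured properties.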
Eval / regression detection
Log enough metadata to repro / compare:
- Model version.
- Temperature.
- Tools available.
- Prompt template version.
Even without the prompts themselves, you can compare aggregate quality across versions.
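A minimal sketch of such an aggregate comparison, assuming each call is recorded with the metadata listed above plus a success flag. EvalRecord and EvalReport are hypothetical names, and success rate stands in for whatever quality signal is actually available.

```csharp
using System.Collections.Generic;
using System.Linq;

// Metadata only: no prompt or response text is needed for this comparison.
public sealed record EvalRecord(
    string ModelVersion,
    string PromptTemplateVersion,
    double Temperature,
    bool Success,
    long LatencyMs);

public static class EvalReport
{
    // Success rate per prompt template version, computed from metadata alone.
    public static IReadOnlyDictionary<string, double> SuccessRateByTemplate(
        IEnumerable<EvalRecord> records) =>
        records
            .GroupBy(r => r.PromptTemplateVersion)
            .ToDictionary(g => g.Key, g => g.Average(r => r.Success ? 1.0 : 0.0));
}
```

Grouping by ModelVersion instead gives the same comparison across model upgrades.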
Debugging tools
For dev:
- Prompt Flow (Microsoft): debug prompts visually.
- LangSmith: hosted observability.
- PromptHub: prompt versioning.
Senior considerations
- Default secure: don't log content.
- Compliance review before opt-in.
- Tenant attribution for B2B.
- Cost monitoring — dashboard.
- Redaction at source — don't leak before storage.
Anti-patterns
- ❌ Logging full prompts/responses always.
- ❌ Customer-readable logs without redaction.
- ❌ No retention policy.
- ❌ Auth tokens logged in plaintext (bearer-token leak).
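For the last anti-pattern, a small scrubber can be applied to any text before it is written to a log. This is a sketch under assumptions: SecretScrubber is a hypothetical name, and the pattern covers only the "Bearer <token>" form with the token alphabet defined in RFC 6750, not API keys or other secret formats.

```csharp
using System.Text.RegularExpressions;

public static class SecretScrubber
{
    // "Bearer", whitespace, then a token made of the RFC 6750 b64token characters.
    private static readonly Regex BearerToken =
        new(@"\bBearer\s+[A-Za-z0-9\-._~+/]+=*", RegexOptions.IgnoreCase);

    // Keeps the scheme name so the log still shows that a credential was present.
    public static string Scrub(string text) =>
        BearerToken.Replace(text, "Bearer [REDACTED]");
}
```

Scrub("Authorization: Bearer eyJhbGciOi.abc-123_x==") returns "Authorization: Bearer [REDACTED]", and text without a bearer token is returned unchanged.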