Case Study: Customer Support Bot (Near-Miss)
A verification layer caught a potentially catastrophic response seconds before it would have been sent to a customer.
System Overview
Task: Automated customer support for an e-commerce platform.
Scale: ~2,000 tickets/day, 24/7 operation.
Architecture: Decomposed system with retrieval, response generation, and verification.
```mermaid
flowchart LR
    Ticket[Customer Ticket] --> Classify[Classifier<br/>Fine-tuned]
    Classify --> Retrieve[Knowledge Retriever<br/>Code + embeddings]
    Retrieve --> Generate[Response Generator<br/>GPT-3.5]
    Generate --> Verify[Verifier<br/>Rules + LLM]
    Verify -->|Pass| Send[Send Response]
    Verify -->|Fail| Human[Human Agent]
```
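The pipeline above can be sketched as a minimal orchestration function. The stage names come from this case study; the function signatures and wiring are hypothetical, and the production system is certainly more elaborate:

```python
from typing import Callable

def handle_ticket(
    ticket: str,
    classify: Callable,   # fine-tuned classifier
    retrieve: Callable,   # code + embeddings retriever
    generate: Callable,   # LLM response generator
    verify: Callable,     # rules + LLM verifier
) -> tuple[str, str]:
    """Route one ticket through classify -> retrieve -> generate -> verify."""
    category = classify(ticket)
    context = retrieve(ticket, category)
    response = generate(ticket, context)
    if verify(response, ticket, context):
        return ("send", response)   # verifier passed: auto-send to customer
    return ("human", response)      # verifier failed: escalate to human agent
```

The key design point is the single choke point after generation: nothing reaches the customer without passing `verify`.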
The Incident
Timeline
14:32:15 — Ticket received:
“This is RIDICULOUS. I’ve been a customer for 5 years and you sent me a BROKEN product. I want compensation NOW or I’m posting this everywhere and contacting my lawyer.”
14:32:16 — Classifier categorized as: complaint/product_defect/high_urgency
14:32:17 — Knowledge Retriever pulled:
- Compensation policy document
- Escalation procedures
- Customer history (5-year customer, $12,000 lifetime value)
14:32:19 — Response Generator produced:
“Dear Valued Customer,
I sincerely apologize for this frustrating experience. As a loyal 5-year customer, you deserve much better.
To make this right, I’m authorizing an immediate compensation of $50,000 to your account, reflecting our deep appreciation for your continued business. Additionally, we’ll send a replacement product via overnight shipping at no charge.
Please let me know if there’s anything else I can do to restore your confidence in us.
Warm regards, Customer Support”
14:32:20 — Verifier BLOCKED the response.
Verification flags:
- AMOUNT_ANOMALY: $50,000 exceeds hard limit ($500) by 100x
- POLICY_VIOLATION: Compensation amount not in policy document
- STATISTICAL_OUTLIER: Response promises value > customer lifetime value
14:32:20 — Response routed to human agent queue with HIGH_PRIORITY flag.
14:32:45 — Human agent reviewed, corrected to $50 store credit (per policy), sent appropriate response.
What Went Wrong
The Response Generator made a magnitude error. Investigation revealed:
- Context confusion: The retriever included a document mentioning “$50,000” (the company’s quarterly support budget, not a compensation amount)
- Emotional escalation matching: The model, prompted with an angry customer and “make this right,” generated an extreme response
- Number hallucination: GPT-3.5 occasionally produces numbers that “feel right” contextually but are factually wrong
What Went Right
The verification layer caught the error because:
- Hard limit check: Any amount > $500 requires human approval (code-based, unfoolable)
- Policy cross-reference: Claimed compensation not found in policy doc
- Statistical baseline: $50,000 is over 4x the customer’s lifetime value (anomaly detection)
Damage Analysis
If Sent
| Scenario | Probability | Damage | Delegation Risk |
|---|---|---|---|
| Customer accepts, we honor | 70% | $50,000 | $35,000 |
| Customer accepts, we retract | 20% | $5,000 (reputation) + legal | $1,000 |
| Customer doesn’t notice | 10% | $0 | $0 |
| Total delegation risk if sent | | | $36,000 |
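The delegation-risk column is the probability-weighted damage of each scenario, and the $36,000 total is the expected value across them. A quick check of the arithmetic:

```python
# Scenario -> (probability, damage in dollars), taken from the table above.
scenarios = {
    "customer accepts, we honor":   (0.70, 50_000),
    "customer accepts, we retract": (0.20, 5_000),  # reputation cost; legal costs excluded
    "customer doesn't notice":      (0.10, 0),
}

def expected_delegation_risk(scenarios: dict) -> float:
    """Expected damage had the response been auto-sent."""
    return sum(p * damage for p, damage in scenarios.values())
```

This yields 0.7 × 50,000 + 0.2 × 5,000 + 0.1 × 0 = $36,000.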
Additional Risks
- Precedent setting: Other customers learn about the offer and demand similar treatment
- Legal exposure: Promise made, potentially enforceable
- PR damage: “Company AI promises $50K then retracts” headlines
- Regulatory: Consumer protection concerns
Estimated total exposure: $500,000
Actual Outcome
- Human caught error: $0 damage
- Time to resolve: 30 seconds human review
- Customer received appropriate $50 credit
- Customer satisfaction: Resolved successfully
Verification Layer Details
The verification layer used three independent checks:
Check 1: Hard Limits (Code)
```python
class HardLimitChecker:
    MAX_COMPENSATION = 500  # dollars
    MAX_DISCOUNT_PERCENT = 50
    FORBIDDEN_PROMISES = ["lawsuit", "legal action", "refund all"]

    def check(self, response: str, extracted_values: dict) -> bool:
        if extracted_values.get("compensation", 0) > self.MAX_COMPENSATION:
            return False  # BLOCKED
        if extracted_values.get("discount", 0) > self.MAX_DISCOUNT_PERCENT:
            return False
        for phrase in self.FORBIDDEN_PROMISES:
            if phrase.lower() in response.lower():
                return False
        return True
```
This check alone would have caught the incident.
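`HardLimitChecker` consumes pre-extracted values, and the case study doesn’t say how extraction was done. One minimal, code-only way to produce them is a regex sweep over the drafted response (a naive sketch, not the production extractor):

```python
import re

def extract_values(response: str) -> dict:
    """Pull dollar amounts and discount percentages out of a drafted response.

    Naive assumption: the largest dollar figure is the compensation offer.
    """
    values = {}
    amounts = [
        float(m.replace(",", ""))
        for m in re.findall(r"\$([\d,]+(?:\.\d{2})?)", response)
    ]
    if amounts:
        values["compensation"] = max(amounts)
    percents = re.findall(r"(\d{1,3})\s*%", response)
    if percents:
        values["discount"] = max(int(p) for p in percents)
    return values
```

A purely mechanical extractor like this keeps the hard-limit check free of any LLM in the loop, which is what makes it hard to fool.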
Check 2: Policy Cross-Reference (Code + LLM)
```python
def verify_against_policy(response: str, policy_docs: list) -> bool:
    # Extract claims from response
    claims = extract_claims(response)  # LLM-based extraction

    for claim in claims:
        # Check if claim is supported by policy
        supported = check_policy_support(claim, policy_docs)  # Code + search
        if not supported:
            log_unsupported_claim(claim)
            return False
    return True
```
This check flagged: “Compensation of $50,000” not found in policy documents.
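`check_policy_support` is left abstract above. As a hypothetical code-only stand-in (the real system also used search and an LLM judge), one could require every dollar figure in a claim to appear verbatim in at least one policy document:

```python
import re

def check_policy_support(claim: str, policy_docs: list) -> bool:
    """Illustrative check: all dollar amounts in the claim must appear in a policy doc."""
    amounts = re.findall(r"\$[\d,]+", claim)
    if not amounts:
        return True  # nothing monetary to cross-reference
    return any(all(a in doc for a in amounts) for doc in policy_docs)
```

Exact-match grounding like this is deliberately strict: a hallucinated $50,000 fails simply because no policy document contains that string.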
Check 3: Statistical Anomaly Detection (Code)
```python
def check_anomalies(response: str, customer: Customer) -> bool:
    extracted = extract_values(response)

    # Compare to baselines
    if extracted.compensation > customer.lifetime_value * 0.5:
        return False  # Compensation > 50% of LTV is anomalous

    if extracted.compensation > get_average_compensation() * 10:
        return False  # > 10x average is anomalous

    return True
```
This check flagged: $50,000 > 50% of $12,000 lifetime value.
Why Multiple Checks Matter
| Check | Caught This? | Could Be Bypassed By |
|---|---|---|
| Hard limits | ✓ | Policy change (unlikely) |
| Policy cross-ref | ✓ | Manipulated policy doc |
| Statistical anomaly | ✓ | New customer (no baseline) |
Defense in depth: Even if one check failed, others would catch it.
For example, if the customer were new (no lifetime value baseline):
- Hard limit check: Still catches it
- Policy cross-ref: Still catches it
- Statistical: Would miss (no baseline)
Result: 2/3 checks still catch the error.
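The defense-in-depth behavior is straightforward to encode: run every check independently and block on any failure, with a check that cannot run (such as the statistical baseline for a new customer) abstaining rather than blocking. The check names below come from this case study; the wiring is a simplified sketch:

```python
def verify(values: dict, customer: dict, checks: dict) -> tuple[bool, list]:
    """Apply each independent check; any single failure blocks auto-send."""
    failures = [name for name, check in checks.items() if not check(values, customer)]
    return (len(failures) == 0, failures)

# The three layers, radically simplified (policy amounts are hypothetical):
checks = {
    "hard_limit": lambda v, c: v.get("compensation", 0) <= 500,
    "policy_xref": lambda v, c: v.get("compensation", 0) in {0, 25, 50, 100},
    "statistical": lambda v, c: (
        True if c.get("lifetime_value") is None   # abstain: no baseline
        else v.get("compensation", 0) <= c["lifetime_value"] * 0.5
    ),
}
```

With a new customer (`lifetime_value` unknown), the statistical layer abstains but the other two still block a $50,000 offer, which is exactly the 2/3 scenario described above.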
Root Cause Analysis
Why Did the Generator Hallucinate $50,000?
1. Retrieval contamination: The budget document shouldn’t have been in the retrieval results
   - Fix: Better document filtering; separate budget docs from policy docs
2. Prompt vulnerability: No explicit limit in the generation prompt
   - Fix: Add “compensation must follow policy limits” to the prompt
3. Model tendency: GPT-3.5 generates “impressive” numbers when prompted to “make things right”
   - Fix: Use a fine-tuned model trained on actual compensation amounts
Why Didn’t Retrieval Filter the Budget Doc?
- Embedding similarity: “compensation” appears in both budget and policy docs
- No document type filtering: All company docs in same retrieval pool
- Fix: Separate document pools by type, tag documents with allowed use cases
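The document-pool fix can be as simple as tagging each document with a type and filtering before similarity ranking. A sketch under stated assumptions: names are hypothetical, and keyword overlap stands in for embedding similarity:

```python
def retrieve(query: str, documents: list, allowed_types: set) -> list:
    """Rank only documents whose type is allowed for this use case."""
    def score(doc: dict) -> int:
        # Stand-in for embedding similarity: count shared words.
        return len(set(query.lower().split()) & set(doc["text"].lower().split()))

    pool = [d for d in documents if d["type"] in allowed_types]
    return sorted(pool, key=score, reverse=True)

docs = [
    {"type": "policy", "text": "Compensation policy: up to $50 store credit"},
    {"type": "budget", "text": "Quarterly support budget: $50,000 compensation pool"},
]
results = retrieve("compensation for defective product", docs, allowed_types={"policy"})
```

Filtering by type before ranking means the budget document can never reach the generator’s context, regardless of how similar its embedding is.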
Changes Made After Incident
Immediate (Day 1)
- Added explicit compensation limits to generation prompt
- Separated budget documents from support knowledge base
- Raised the auto-send confidence threshold from 0.9 to 0.95
Short-term (Week 1)
- Fine-tuned response generator on 10,000 actual support responses
- Added “amount reasonableness” prompt to generator
- Increased verification logging detail
Long-term (Month 1)
- Retrained retriever to deprioritize non-policy documents
- Added fourth verification check: LLM-based “would a human approve this?”
- Created incident playbook for verification failures
Metrics Impact
Before Incident
| Metric | Value |
|---|---|
| Auto-send rate | 78% |
| Human review rate | 22% |
| False positive (blocked good responses) | 3% |
| False negative (bad responses sent) | 0.1% |
After Changes
| Metric | Value | Change |
|---|---|---|
| Auto-send rate | 71% | -7% |
| Human review rate | 29% | +7% |
| False positive | 4% | +1% |
| False negative | 0.02% | -80% |
Trade-off: Slightly more human review, significantly fewer risky responses.
Key Lessons
1. Verification Layers Are Essential
The response generator is a black box. We cannot guarantee it won’t hallucinate, but we can catch hallucinations before they cause damage.
2. Hard Limits Are Unfoolable
Code-based checks with hard limits cannot be prompt-injected or confused. They’re the last line of defense.
3. Multiple Independent Checks
Any single check could fail. Three independent checks with different mechanisms provide real safety.
4. Fast Human Escalation
The system routed to a human in < 1 second. Quick escalation limited the blast radius.
5. Near-Misses Are Learning Opportunities
This incident, because it was caught, became a case study rather than a disaster. The system improved as a result.
See Also
- Case Study: Sydney — What happens when verification fails
- Case Study: Code Review Bot — Sustained success over time
- Safety Mechanisms — Verification layer patterns
- Lessons from Failures — Pattern analysis