For Engineers
You want to build safe AI systems. Here’s the fastest path to implementation.
Skip the Theory Path (15 min)
If you trust the patterns and just want to implement:
- Quick Reference — Core formulas and decision flowchart (5 min)
- Quick Start Guide — Step-by-step implementation checklist (10 min)
- Minimal Framework — The 5 essential patterns (5 min)
You can stop here and have a working safety layer.
Understand-Then-Build Path (2 hours)
If you want to understand why the patterns work:
- Five-Minute Intro — Core problem (5 min)
- Core Concepts — Visual framework (20 min)
- Delegation Risk Overview — Quantitative foundation (30 min)
- Design Patterns Index — Pattern catalog (15 min)
- Composing Patterns — Pattern combinations (20 min)
- Pick your patterns — Based on your threat model (20 min)
Pattern Categories for Engineers
Structural Patterns (How you architect)
| Pattern | What It Does | When to Use |
|---|---|---|
| Least-X Principles | Minimize permissions, context, autonomy | Always (default) |
| Structural Patterns | Bulkheads, firewalls, blast radius containment | Multi-component systems |
| Decomposed Coordination | Separation of powers, rotating validators | High-stakes decisions |
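The Least-X row above can be made concrete with a small tool gate. The sketch below is illustrative only (the `ToolPolicy` and `gated_call` names are assumptions, not part of any framework): every tool call is checked against an explicit allowlist, and write-style tools are blocked while the agent is in read-only mode.

```python
# Illustrative least-privilege tool gate: each agent gets an explicit
# allowlist, and every tool call is checked before execution.
from dataclasses import dataclass, field
from typing import Callable

class PolicyViolation(Exception):
    pass

@dataclass
class ToolPolicy:
    allowed_tools: set[str] = field(default_factory=set)
    read_only: bool = True

def gated_call(policy: ToolPolicy, tools: dict[str, Callable], name: str, *args):
    if name not in policy.allowed_tools:
        raise PolicyViolation(f"tool {name!r} not in allowlist")
    if policy.read_only and name.startswith(("write_", "delete_")):
        raise PolicyViolation(f"tool {name!r} blocked in read-only mode")
    return tools[name](*args)
```

Denying by default (allowlist, not denylist) is the point: a tool the policy never names cannot be called, which bounds the blast radius of a compromised prompt.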
Verification Patterns (How you check)
| Pattern | What It Does | When to Use |
|---|---|---|
| Verification Patterns | Ghost checker, consistency triangulation | Output validation |
| Multi-Agent Verification | Cross-validation, adversarial collaboration | AI-on-AI verification |
| Monitoring Patterns | Tripwires, anomaly detection | Runtime safety |
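Consistency triangulation from the table can be sketched as a quorum vote over independent checkers. This is a sketch only; in practice each checker would be a model- or rule-based verifier rather than a plain predicate.

```python
# Illustrative consistency triangulation: accept an output only when
# more than a quorum fraction of independent checkers approve it.
from typing import Callable, Sequence

def triangulate(output: str, checkers: Sequence[Callable[[str], bool]],
                quorum: float = 0.5) -> bool:
    """True if strictly more than `quorum` of the checkers pass the output."""
    votes = [bool(check(output)) for check in checkers]
    return sum(votes) > quorum * len(votes)
```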
Recovery Patterns (When things go wrong)
| Pattern | What It Does | When to Use |
|---|---|---|
| Recovery Patterns | Checkpoint-rollback, graceful degradation | Any production system |
| Safety Mechanisms | Kill switches, capability sunset | High-autonomy systems |
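Checkpoint-rollback fits in a few lines. The in-memory snapshot stack below is illustrative; a production store would be durable (disk or database) and the snapshots would cover external side effects, not just local state.

```python
# Minimal checkpoint-rollback sketch: snapshot state before each risky
# step so a failed step can be undone.
import copy

class Checkpointed:
    def __init__(self, state: dict):
        self.state = state
        self._snapshots: list[dict] = []

    def checkpoint(self) -> None:
        self._snapshots.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        if not self._snapshots:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = self._snapshots.pop()
```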
Implementation Checklist
Phase 1: Risk Assessment (Day 1)
[ ] List all AI components in your system
[ ] For each component, identify:
  - What data can it access?
  - What actions can it take?
  - What's the worst-case outcome?
[ ] Calculate rough risk budget (see Quick Reference)
[ ] Classify as Low / Medium / High risk
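The rough risk-budget step can be sketched as expected harm = P(failure) × worst-case impact. The cutoffs below are illustrative placeholders, not the Quick Reference's official thresholds.

```python
# Illustrative risk scoring: expected harm = P(failure) x impact.
# Thresholds are placeholders to be replaced with your own budget.
def risk_score(p_failure: float, impact_usd: float) -> float:
    return p_failure * impact_usd

def classify(score: float, medium_at: float = 1_000,
             high_at: float = 50_000) -> str:
    if score >= high_at:
        return "High"
    if score >= medium_at:
        return "Medium"
    return "Low"
```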
Phase 2: Essential Safety (Week 1)
[ ] Implement Least Privilege
  - Restrict file access to workspace only
  - Remove unnecessary API permissions
  - Use read-only where possible
[ ] Implement Human Escalation
  - Define escalation thresholds ($, sensitivity, confidence)
  - Build approval workflow
  - Test the escalation path
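An escalation gate can be a single predicate over those three thresholds. The values and the `EscalationPolicy` name below are illustrative; tune them against your observed false-positive rate.

```python
# Illustrative escalation gate: route to a human when any threshold trips.
from dataclasses import dataclass

@dataclass
class EscalationPolicy:
    max_amount_usd: float = 500.0
    min_confidence: float = 0.8
    sensitive_topics: frozenset = frozenset({"medical", "legal"})

def needs_human(policy: EscalationPolicy, amount_usd: float,
                confidence: float, topic: str) -> bool:
    return (amount_usd > policy.max_amount_usd
            or confidence < policy.min_confidence
            or topic in policy.sensitive_topics)
```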
[ ] Implement Output Filtering
  - PII detection
  - Policy violation checks
  - Harmful content filtering
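A minimal output filter might start with regex-based PII redaction plus a phrase denylist. This is a sketch under narrow assumptions (emails and US-style SSNs only); real deployments need broader patterns and ideally a dedicated PII library.

```python
# Illustrative output filter: redact PII-shaped strings, deny policy phrases.
import re

PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-shaped numbers
]
DENYLIST = {"internal use only"}

def filter_output(text: str) -> tuple[bool, str]:
    """Return (allowed, redacted_text)."""
    redacted = text
    for pat in PII_PATTERNS:
        redacted = pat.sub("[REDACTED]", redacted)
    allowed = not any(phrase in text.lower() for phrase in DENYLIST)
    return allowed, redacted
```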
[ ] Implement Sandboxing
  - Containerized execution
  - Network isolation
  - Resource limits
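The resource-limits piece can be sketched with POSIX rlimits from the standard library (Linux/macOS only). Containerization and network isolation would sit on top of this, e.g. via Docker or a jail; the limits chosen here are arbitrary examples.

```python
# Illustrative resource-limited execution: run untrusted code in a child
# Python process with CPU and address-space limits applied via rlimits.
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 2,
                  mem_bytes: int = 256 * 2**20):
    def limit():  # runs in the child just before exec
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        preexec_fn=limit, capture_output=True, text=True,
        timeout=cpu_seconds + 5,
    )
```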
[ ] Implement Audit Logging
  - All inputs and outputs
  - Decision traces
  - Timestamps and versions
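One convenient shape for audit records is JSON lines carrying input, output, decision trace, timestamp, and version, so they can be grepped and replayed. The field names below are illustrative, not a prescribed schema.

```python
# Illustrative audit record: one JSON line per decision, with inputs,
# outputs, a decision trace, a timestamp, and component/version fields.
import json
import time

def audit_record(component: str, version: str, prompt: str,
                 output: str, trace: list[str]) -> str:
    return json.dumps({
        "ts": time.time(),
        "component": component,
        "version": version,
        "input": prompt,
        "output": output,
        "trace": trace,
    })
```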
Phase 3: Enhanced Safety (Month 1)
[ ] Add verification layer (if high-risk)
[ ] Set up monitoring and alerting
[ ] Create incident response runbook
[ ] Schedule quarterly risk reviews
Code-Adjacent Examples
The framework includes worked examples you can reference:
| Example | Type | Key Patterns |
|---|---|---|
| Research Assistant | Multi-step task | Escalation, checkpoints |
| Trading System | High-stakes, autonomous | Risk budgets, circuit breakers |
| Healthcare Bot | Safety-critical | Output filtering, human review |
| Code Deployment | Infrastructure | Sandboxing, rollback |
Integration Points
With Existing Frameworks
| If You Use | Integration Point |
|---|---|
| LangChain | Add verification as chain step, use callbacks for monitoring |
| OpenAI Functions | Wrap function calls with permission checks |
| Anthropic Claude | Use system prompts for least-context, implement tool restrictions |
| AutoGPT-style | Add checkpoints between action cycles, implement kill switches |
With CI/CD
Suggestion: Add safety layer tests to your pipeline:
- Unit tests for output filters
- Integration tests for escalation paths
- Chaos testing for failure modes
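A safety-layer unit test can pin down escalation behavior with seeded cases, so a later threshold change cannot silently weaken it. The `needs_human` stand-in below is hypothetical; in your pipeline you would import the real escalation check instead.

```python
# Sketch of CI regression tests for the escalation path (pytest style).
def needs_human(amount_usd: float, confidence: float) -> bool:
    # Stand-in policy; replace with your real escalation check.
    return amount_usd > 500 or confidence < 0.8

def test_large_amounts_always_escalate():
    assert needs_human(10_000, 0.99)

def test_low_confidence_always_escalates():
    assert needs_human(10, 0.2)

def test_routine_requests_do_not_escalate():
    assert not needs_human(10, 0.99)
```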
Section titled “Debugging Safety Layers”Common Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Too many escalations | Thresholds too tight | Adjust based on false positive rate |
| Missed harmful outputs | Filter gaps | Review missed cases, expand rules |
| Slow responses | Heavy verification | Add caching, async verification |
| Verification false positives | Overfitting to training data | Diversify verifier inputs |
What Engineers Often Skip (But Shouldn’t)
- Entanglements — If you use the same model in multiple places, failures correlate. Your 3 independent 90% verifiers might only give you 92%, not 99.9%.
- Insurer’s Dilemma — If you set the risk budget but users bear the harm, you’ll systematically underestimate acceptable risk.
- Capability Probing — Systems can hide capabilities. Periodic probing detects capability drift before it causes harm.
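The entanglement numbers can be checked directly: three independent 90% verifiers miss together with probability 0.1³, giving 99.9% coverage, but a shared blind spot (here assumed to be hit 8% of the time, an illustrative figure) drags coverage back toward the single-verifier rate.

```python
# Worked check of the entanglement claim.
p_miss_each = 0.10
independent_coverage = 1 - p_miss_each**3        # 1 - 0.001 = 0.999

# Correlated case: all three verifiers share a blind spot hit 8% of the
# time, plus small idiosyncratic misses (assumed values for illustration).
p_shared_blind_spot = 0.08
p_idiosyncratic = 0.02
correlated_coverage = 1 - (p_shared_blind_spot + p_idiosyncratic**3)
# roughly 0.92, not 0.999
```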
See Also
- Quick Reference — Formulas at a glance
- Examples Catalog — All worked examples
- Composing Patterns — Combining patterns
- Common Mistakes — What to avoid