
Minimal Viable Framework

You don’t need all 45 patterns. Here’s what actually matters.


Delegation Risk = Σ P(harm) × Damage

Every time you delegate to an AI system, you’re accepting expected harm in exchange for capability: for each harm mode you can identify, multiply its probability by its damage, then sum. Make this tradeoff explicit.
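As a minimal sketch of the calculation, here is the sum written out in Python. The harm modes, probabilities, damage figures, and risk budget are illustrative placeholders, not recommendations.

```python
# Minimal sketch: delegation risk as the sum of P(harm) x Damage
# over the harm modes you have identified. Numbers are illustrative.

harm_modes = {
    # harm mode: (probability per task, damage in dollars)
    "leaks_customer_pii": (0.001, 50_000),
    "sends_wrong_refund": (0.01, 200),
    "deletes_workspace_file": (0.005, 1_000),
}

delegation_risk = sum(p * damage for p, damage in harm_modes.values())
print(f"Expected harm per task: ${delegation_risk:.2f}")  # $57.00 with these numbers

# Compare against a risk budget: delegate only if the expected harm
# stays below what the added capability is worth to you.
RISK_BUDGET_PER_TASK = 100.0
assert delegation_risk <= RISK_BUDGET_PER_TASK
```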


If you implement nothing else, implement these:

Least privilege: Give the minimum permissions needed for the task.

❌ AI has admin access to everything
✓ AI can only read/write files in /workspace
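A minimal sketch of enforcing this at the tool layer, assuming the agent only touches files through a wrapper you control; the /workspace root and helper names are illustrative.

```python
# Minimal sketch of least privilege for file access: the tool layer only
# exposes read/write inside an allowed root, whatever the model asks for.
from pathlib import Path

ALLOWED_ROOT = Path("/workspace").resolve()

def _check(path: str) -> Path:
    resolved = (ALLOWED_ROOT / path).resolve()
    # Reject anything that escapes the allowed root (e.g. "../etc/passwd").
    if not resolved.is_relative_to(ALLOWED_ROOT):
        raise PermissionError(f"Access outside {ALLOWED_ROOT} denied: {path}")
    return resolved

def read_file(path: str) -> str:
    return _check(path).read_text()

def write_file(path: str, content: str) -> None:
    target = _check(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```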

Human escalation: High-stakes decisions go to humans.

❌ AI executes all actions autonomously
✓ Actions over $1000 require human approval
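One way to sketch this, assuming actions arrive as dicts carrying a dollar amount; the threshold, review queue, and helper names are illustrative placeholders.

```python
# Minimal sketch of human escalation: actions above a dollar threshold are
# held for approval instead of executing. Not tied to any specific library.

APPROVAL_THRESHOLD_USD = 1_000
pending_review: list[dict] = []   # stand-in for a real review queue

def execute_action(action: dict) -> None:
    print(f"executing {action['name']} (${action['amount_usd']})")

def handle_action(action: dict) -> str:
    if action["amount_usd"] > APPROVAL_THRESHOLD_USD:
        pending_review.append(action)   # a human decides later
        return "pending_human_approval"
    execute_action(action)              # low stakes: run autonomously
    return "executed"

print(handle_action({"name": "refund", "amount_usd": 40}))           # executed
print(handle_action({"name": "wire_transfer", "amount_usd": 9000}))  # pending
```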

Output filtering: Check outputs before they reach the world.

❌ AI response goes directly to user
✓ Filter checks for PII, harmful content, policy violations
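A rough sketch of a pre-delivery filter; the regexes and block list below are illustrative and nowhere near complete, and a real filter would also cover harmful content.

```python
# Minimal sketch of output filtering: every response passes through PII and
# policy checks before it reaches the user. Patterns are illustrative only.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN-like
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # card-number-like
]
BLOCKED_PHRASES = ["internal use only"]

def filter_output(text: str) -> str:
    for pattern in PII_PATTERNS:
        if pattern.search(text):
            return "[response withheld: possible PII detected]"
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "[response withheld: policy violation]"
    return text

print(filter_output("Your order ships Tuesday."))   # passes through
print(filter_output("SSN on file: 123-45-6789"))    # withheld
```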

Sandboxing: Limit what can go wrong.

❌ AI runs on production server
✓ AI runs in isolated container with no network access
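One possible sketch, assuming Docker is available on the host; the image, resource limits, mount, and timeout are illustrative choices rather than requirements.

```python
# Minimal sketch of blast-radius containment: run AI-generated code in a
# throwaway container with no network and only the workspace mounted.
import subprocess

def run_in_sandbox(script_path: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",                 # no outbound access
            "--memory", "512m", "--cpus", "1",   # cap resources
            "-v", "/workspace:/workspace:rw",    # only the workspace is visible
            "python:3.12-slim",
            "python", script_path,
        ],
        capture_output=True, text=True, timeout=60,
    )
```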

Audit logging: Record everything for when things go wrong.

❌ No logs, can't investigate incidents
✓ All inputs, outputs, and decisions logged with timestamps
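A minimal sketch using JSON-lines with UTC timestamps; the file path and event fields are illustrative, and a real deployment would write to durable, access-controlled storage.

```python
# Minimal sketch of audit logging: append one JSON line per event with a
# timestamp so incidents can be reconstructed later.
import json
from datetime import datetime, timezone

AUDIT_LOG = "audit_log.jsonl"   # illustrative path

def log_event(kind: str, payload: dict) -> None:
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "kind": kind,            # "input", "output", or "decision"
        "payload": payload,
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

log_event("input", {"user": "u_123", "prompt": "summarize report"})
log_event("decision", {"action": "refund", "amount_usd": 40, "approved": True})
```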

Skip if: Your system has < 3 components, or you use completely different models for each component.

Don’t skip if: You have multiple AI verifiers, use the same base model in multiple places, or components share training data.

Skip if: You just want to implement patterns, not understand the theory.

Don’t skip if: You need to justify decisions to stakeholders or want to optimize capability/risk tradeoffs.

Skip if: You’re implementing, not doing research.

Read if: You want to understand the theoretical foundations or contribute improvements.

Skip if: You trust the patterns as-is.

Read if: You want to understand why these methods work (nuclear, finance precedents).


```mermaid
flowchart TD
    Start[What are you building?] --> Q1{Can it access<br/>sensitive data?}
    Q1 -->|No| Q2{Can it take<br/>real-world actions?}
    Q1 -->|Yes| High[High Risk Path]
    Q2 -->|No| Q3{Does it talk to<br/>customers?}
    Q2 -->|Yes| High
    Q3 -->|No| Low[Low Risk Path]
    Q3 -->|Yes| Medium[Medium Risk Path]

    Low --> L1[5 Essential Patterns]
    Medium --> M1[+ Tripwire Monitoring<br/>+ Rate Limiting]
    High --> H1[+ Human Review<br/>+ Formal Verification<br/>+ Defense in Depth]

    style Low fill:#e8f5e9
    style Medium fill:#fff3e0
    style High fill:#ffebee
```

  • Define your harm modes (what could go wrong?)
  • Set a risk budget (how much harm is acceptable?)
  • Implement least privilege (minimum permissions)
  • Add output filtering
  • Implement human escalation for high-stakes
  • Set up audit logging
  • Create a sandbox environment
  • Add basic anomaly detection (a minimal sketch follows this list)
  • Set up alerting for unusual patterns
  • Review logs weekly
  • Track actual vs. expected harm
  • Adjust risk budgets based on data
  • Add patterns as needed
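A minimal sketch of the anomaly-detection step, assuming you count some monitored event per day (filter hits, escalations, refusals); the window, threshold multiplier, and alert hook are illustrative assumptions.

```python
# Minimal sketch of basic anomaly detection: alert when today's count of a
# monitored event exceeds the recent average by a fixed multiple.
from statistics import mean

def check_for_anomaly(daily_counts: list[int], today: int,
                      multiplier: float = 3.0) -> bool:
    """daily_counts is the recent history (e.g. last 14 days of output-filter
    hits); today is the current day's count."""
    baseline = mean(daily_counts) if daily_counts else 0.0
    return today > multiplier * max(baseline, 1.0)

history = [4, 6, 5, 7, 5, 6, 4]          # e.g. filter hits per day
if check_for_anomaly(history, today=41):
    print("ALERT: unusual pattern, review the audit log")  # hook your alerting here
```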

Add more patterns when:

| Signal | Suggested Addition |
| --- | --- |
| Failures are correlated | Read Entanglements, add diversity |
| Human review bottleneck | Add AI verifiers, graduated autonomy |
| Novel attacks detected | Add tripwires, adversarial testing |
| Cascading failures | Add bulkheads, blast radius containment |
| Trust is miscalibrated | Add trust decay, capability sunset |

  • Not an alignment solution — Assumes components may be misaligned
  • Not formal verification — Probabilistic, not provable
  • Not a substitute for good models — Better models reduce accident risk
  • Not zero-risk — Manages risk, doesn’t eliminate it

  1. This page (you’re here) — 10 min
  2. Quick Reference — 5 min
  3. Quick Start Guide — 15 min
  4. Implement the 5 essential patterns — 1-5 days

Total: You can have a working risk framework in a week.