
Quick Reference

A one-page summary for practitioners who already know the framework and need a fast lookup.


| Concept | Formula | Description |
|---|---|---|
| Delegation Risk | Σ P(harm) × Damage | Expected cost of delegation |
| Risk Decomposition | Accident Risk + Defection Risk | Two sources of harm |
| Effective Capability | Power × Agency | What the system can accomplish |
| RACAP | Capability / Risk | Risk-adjusted capability ratio |
| Authority | Power ∩ Permission | Sanctioned power |
| Entanglement Tax | Actual Risk / Independent Risk | How much worse than the independent estimate |
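These quantities are simple arithmetic. A minimal sketch, assuming scalar inputs estimated elsewhere (the function names are illustrative, not part of the framework):

```python
# Minimal sketch of the summary-table quantities; function names are illustrative.
def delegation_risk(harms):
    """Delegation Risk = Σ P(harm) × Damage over all harm modes."""
    return sum(p * damage for p, damage in harms)

def effective_capability(power, agency):
    """Effective Capability = Power × Agency."""
    return power * agency

def racap(capability, risk):
    """RACAP = Capability / Risk, the risk-adjusted capability ratio."""
    return capability / risk if risk else float("inf")

# Example: two harm modes given as (P(harm), damage in dollars).
risk = delegation_risk([(0.01, 5_000), (0.001, 50_000)])   # 50 + 50 = 100
print(racap(effective_capability(power=0.8, agency=0.9), risk))
```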

```mermaid
flowchart TD
    Start[What's your primary concern?] --> Q1{Capability errors?}
    Q1 -->|Yes| P1[Escalation Ladder<br/>Ghost Checker<br/>Checkpoint-Rollback]
    Q1 -->|No| Q2{Adversarial inputs?}
    Q2 -->|Yes| P2[Gateway Chokepoint<br/>Tripwire Mesh<br/>Output Filtering]
    Q2 -->|No| Q3{AI scheming?}
    Q3 -->|Yes| P3[Honeypot Injection<br/>Consistency Triangulation<br/>Capability Probing]
    Q3 -->|No| Q4{Cascading failures?}
    Q4 -->|Yes| P4[Bulkhead Isolation<br/>Blast Radius Containment<br/>Graceful Degradation]
    Q4 -->|No| Q5{Collusion/Capture?}
    Q5 -->|Yes| P5[Rotating Validator<br/>Adversarial Collaboration<br/>Separation of Powers]
    Q5 -->|No| P6[Start with<br/>Least-X Principles]

    style Start fill:#e3f2fd
    style P1 fill:#e8f5e9
    style P2 fill:#e8f5e9
    style P3 fill:#e8f5e9
    style P4 fill:#e8f5e9
    style P5 fill:#e8f5e9
    style P6 fill:#fff3e0
```
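In code, the decision tree reduces to a lookup from primary concern to a pattern shortlist. A sketch (the pattern names come from the flowchart; the dictionary keys and function are illustrative):

```python
# Illustrative lookup mirroring the flowchart; keys are ad-hoc labels.
PATTERNS_BY_CONCERN = {
    "capability_errors":  ["Escalation Ladder", "Ghost Checker", "Checkpoint-Rollback"],
    "adversarial_inputs": ["Gateway Chokepoint", "Tripwire Mesh", "Output Filtering"],
    "ai_scheming":        ["Honeypot Injection", "Consistency Triangulation", "Capability Probing"],
    "cascading_failures": ["Bulkhead Isolation", "Blast Radius Containment", "Graceful Degradation"],
    "collusion_capture":  ["Rotating Validator", "Adversarial Collaboration", "Separation of Powers"],
}

def suggest_patterns(concern):
    # No specific concern matched: start with the Least-X Principles.
    return PATTERNS_BY_CONCERN.get(concern, ["Least-X Principles"])

print(suggest_patterns("cascading_failures"))
```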

When assessing a delegation, check for these harm modes (a rough grouping into accident and defection modes is sketched after the list):

  • Capability failure — System can’t do the task correctly
  • Edge case error — Works normally, fails on unusual inputs
  • Hallucination — Generates confident but incorrect outputs
  • Resource exhaustion — Runs out of time/compute/memory
  • Goal misalignment — Optimizes for wrong objective
  • Deception — Deliberately hides information or lies
  • Scheming — Takes unauthorized actions for future benefit
  • Collusion — Coordinates with other components against principal
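One way to tie these modes back to the accident/defection split in the Risk Decomposition row is a simple grouping. The mapping below is an assumption for illustration, not a canonical classification: the first four modes read as accident-type failures, the last four as defection-type.

```python
# Assumed grouping of the harm modes into the accident/defection split;
# this classification is illustrative, not part of the framework itself.
ACCIDENT_MODES = {"capability_failure", "edge_case_error", "hallucination", "resource_exhaustion"}
DEFECTION_MODES = {"goal_misalignment", "deception", "scheming", "collusion"}

def risk_bucket(mode):
    if mode in ACCIDENT_MODES:
        return "accident"
    if mode in DEFECTION_MODES:
        return "defection"
    raise ValueError(f"unknown harm mode: {mode}")
```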

| Situation | Threshold | Action |
|---|---|---|
| Risk budget exceeded | Risk > Budget | Block action, escalate |
| Low confidence output | P(correct) < 0.8 | Add verification layer |
| High-stakes decision | Damage > $10K | Require human review |
| Novel input detected | Outside training distribution | Flag for review |
| Pattern failure rate | > 3 failures/week | Review pattern selection |
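Wired together, the thresholds become a pre-action guard. A sketch, assuming the caller already has estimates for each field (the thresholds are the table's; the function shape is illustrative):

```python
# Illustrative pre-action guard built from the threshold table above.
def delegation_checks(risk, budget, p_correct, damage, novel_input, weekly_failures):
    actions = []
    if risk > budget:
        actions.append("Block action, escalate")       # risk budget exceeded
    if p_correct < 0.8:
        actions.append("Add verification layer")       # low confidence output
    if damage > 10_000:
        actions.append("Require human review")         # high-stakes decision
    if novel_input:
        actions.append("Flag for review")              # outside training distribution
    if weekly_failures > 3:
        actions.append("Review pattern selection")     # pattern failure rate
    return actions

print(delegation_checks(risk=120, budget=100, p_correct=0.7,
                        damage=25_000, novel_input=False, weekly_failures=1))
```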

If you can only implement 5 patterns:

  1. Least-X Principles — Minimize privilege, capability, context, autonomy
  2. Escalation Ladder — Route high-stakes decisions to humans
  3. Defense in Depth — Multiple verification layers
  4. Sandboxing — Contain blast radius
  5. Audit Logging — Record everything for forensics

| System Type | Suggested Annual Risk Budget | Rationale |
|---|---|---|
| Internal tool | $50K-100K | Low exposure, recoverable |
| Customer-facing | $10K-50K | Reputation matters |
| Financial | $1K-10K | Direct monetary loss |
| Safety-critical | $100-1K | Human welfare at stake |
| Catastrophic potential | $0-100 | Existential concern |
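In configuration form, the same table might look like the sketch below; the ranges are the table's, the structure and names are illustrative.

```python
# Illustrative risk-budget config keyed by system type (annual USD ranges from the table).
ANNUAL_RISK_BUDGET_USD = {
    "internal_tool":          (50_000, 100_000),
    "customer_facing":        (10_000, 50_000),
    "financial":              (1_000, 10_000),
    "safety_critical":        (100, 1_000),
    "catastrophic_potential": (0, 100),
}

def budget_for(system_type, conservative=True):
    # Use the low end of the range when being conservative.
    low, high = ANNUAL_RISK_BUDGET_USD[system_type]
    return low if conservative else high
```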

Expected harm from a single component:

Component Risk = P(accident) × Damage_accident + P(defection) × Damage_defection

Pipeline risk (independent components):

Pipeline Risk = 1 - Π(1 - Component_Risk_i)

Entangled pipeline risk:

Entangled Risk = Pipeline_Risk × Entanglement_Tax

Mitigation effectiveness (serial layers):

Residual Risk = Base_Risk × Π(1 - Catch_Rate_i)
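A worked sketch chaining the four formulas, assuming the component risks fed to the pipeline formula are expressed as probabilities in [0, 1] so the composition is well defined; all numbers are made up purely to show the arithmetic.

```python
from math import prod

def component_risk(p_accident, dmg_accident, p_defection, dmg_defection):
    # Component Risk = P(accident) × Damage_accident + P(defection) × Damage_defection
    return p_accident * dmg_accident + p_defection * dmg_defection

def pipeline_risk(component_risks):
    # Pipeline Risk = 1 - Π(1 - r_i); assumes r_i are probabilities of independent components.
    return 1 - prod(1 - r for r in component_risks)

def entangled_risk(pipeline, entanglement_tax):
    # Entangled Risk = Pipeline Risk × Entanglement Tax
    return pipeline * entanglement_tax

def residual_risk(base_risk, catch_rates):
    # Residual Risk = Base Risk × Π(1 - Catch_Rate_i) for serial mitigation layers.
    return base_risk * prod(1 - c for c in catch_rates)

print(component_risk(0.01, 5_000, 0.001, 50_000))    # expected damage: 50 + 50 = 100
p = pipeline_risk([0.02, 0.01, 0.03])                # ≈ 0.059
e = entangled_risk(p, entanglement_tax=1.5)          # ≈ 0.088
print(residual_risk(e, catch_rates=[0.9, 0.9]))      # ≈ 0.00088
```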