Quick Reference
A one-page summary for practitioners who know the framework and need fast lookup.
Core Formulas
| Concept | Formula | Description |
|---|---|---|
| Delegation Risk | Σ P(harm) × Damage | Expected cost of delegation |
| Risk Decomposition | Accident Risk + Defection Risk | Two sources of harm |
| Effective Capability | Power × Agency | What the system can accomplish |
| RACAP | Capability / Risk | Risk-adjusted capability ratio |
| Authority | Power ∩ Permission | Sanctioned power |
| Entanglement Tax | Actual Risk / Independent Risk | How much worse than expected |
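As a rough illustration of how the table's formulas combine, the sketch below computes Delegation Risk, RACAP, and Entanglement Tax in Python. The `HarmMode` class, the function names, and all numeric values are hypothetical and chosen only for the example. Capability is treated as a plain number in arbitrary units, which is an assumption rather than something the framework prescribes.

```python
from dataclasses import dataclass


@dataclass
class HarmMode:
    """One way a delegation can go wrong (hypothetical helper type)."""
    name: str
    probability: float  # P(harm) over the period of interest
    damage: float       # cost in dollars if the harm occurs


def delegation_risk(modes):
    """Delegation Risk = sum of P(harm) x Damage over all harm modes."""
    return sum(m.probability * m.damage for m in modes)


def racap(capability, risk):
    """RACAP = Capability / Risk (capability in arbitrary, assumed units)."""
    if risk <= 0:
        raise ValueError("risk must be positive to form a ratio")
    return capability / risk


def entanglement_tax(actual_risk, independent_risk):
    """Entanglement Tax = Actual Risk / Independent Risk."""
    if independent_risk <= 0:
        raise ValueError("independent_risk must be positive")
    return actual_risk / independent_risk


# Illustrative numbers only: one accident mode, one defection mode.
modes = [
    HarmMode("hallucination", probability=0.05, damage=2_000.0),
    HarmMode("deception", probability=0.01, damage=10_000.0),
]
risk = delegation_risk(modes)
print(f"Delegation risk: ${risk:,.2f}")
print(f"RACAP: {racap(1_000.0, risk):.2f}")
print(f"Entanglement tax: {entanglement_tax(300.0, risk):.2f}")
```

Summing over accident and defection modes in one list mirrors the Risk Decomposition row: the total is the accident contribution plus the defection contribution.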
Pattern Selection Flowchart
flowchart TD
Start[What's your primary concern?] --> Q1{Capability errors?}
Q1 -->|Yes| P1[Escalation Ladder<br/>Ghost Checker<br/>Checkpoint-Rollback]
Q1 -->|No| Q2{Adversarial inputs?}
Q2 -->|Yes| P2[Gateway Chokepoint<br/>Tripwire Mesh<br/>Output Filtering]
Q2 -->|No| Q3{AI scheming?}
Q3 -->|Yes| P3[Honeypot Injection<br/>Consistency Triangulation<br/>Capability Probing]
Q3 -->|No| Q4{Cascading failures?}
Q4 -->|Yes| P4[Bulkhead Isolation<br/>Blast Radius Containment<br/>Graceful Degradation]
Q4 -->|No| Q5{Collusion/Capture?}
Q5 -->|Yes| P5[Rotating Validator<br/>Adversarial Collaboration<br/>Separation of Powers]
Q5 -->|No| P6[Start with<br/>Least-X Principles]
style Start fill:#e3f2fd
style P1 fill:#e8f5e9
style P2 fill:#e8f5e9
style P3 fill:#e8f5e9
style P4 fill:#e8f5e9
style P5 fill:#e8f5e9
style P6 fill:#fff3e0
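The flowchart is an ordered sequence of yes/no questions, so it can be sketched as a lookup that returns the patterns for the first concern that applies. The concern keys (`capability_errors` and so on) and the function name are hypothetical labels invented for this example; the pattern names come from the flowchart.

```python
# Ordered to match the flowchart: the first matching concern wins.
PATTERN_FLOWCHART = [
    ("capability_errors",
     ["Escalation Ladder", "Ghost Checker", "Checkpoint-Rollback"]),
    ("adversarial_inputs",
     ["Gateway Chokepoint", "Tripwire Mesh", "Output Filtering"]),
    ("ai_scheming",
     ["Honeypot Injection", "Consistency Triangulation", "Capability Probing"]),
    ("cascading_failures",
     ["Bulkhead Isolation", "Blast Radius Containment", "Graceful Degradation"]),
    ("collusion_or_capture",
     ["Rotating Validator", "Adversarial Collaboration", "Separation of Powers"]),
]

# Final "No" branch of the flowchart.
FALLBACK_PATTERNS = ["Least-X Principles"]


def select_patterns(concerns):
    """Return the pattern group for the first flowchart question answered 'yes'."""
    for concern, patterns in PATTERN_FLOWCHART:
        if concern in concerns:
            return patterns
    return FALLBACK_PATTERNS


print(select_patterns({"ai_scheming"}))
print(select_patterns(set()))
```

Like the flowchart, this returns only one group even when several concerns apply. A system with multiple concerns would likely need to walk the list once per concern.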
Harm Mode Checklist
When assessing a delegation, check for these harm modes:
Accident Modes
- Capability failure — System can’t do the task correctly
- Edge case error — Works normally, fails on unusual inputs
- Hallucination — Generates confident but incorrect outputs
- Resource exhaustion — Runs out of time/compute/memory
Defection Modes
- Goal misalignment — Optimizes for wrong objective
- Deception — Deliberately hides information or lies
- Scheming — Takes unauthorized actions for future benefit
- Collusion — Coordinates with other components against principal
Key Thresholds
| Situation | Threshold | Action |
|---|---|---|
| Risk budget exceeded | Risk > Budget | Block action, escalate |
| Low confidence output | P(correct) < 0.8 | Add verification layer |
| High-stakes decision | Damage > $10K | Require human review |
| Novel input detected | Outside training distribution | Flag for review |
| Pattern failure rate | > 3 failures/week | Review pattern selection |
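The threshold table reads naturally as a set of independent checks, each mapping to one action. A minimal sketch follows, assuming a hypothetical `Assessment` record whose field names are invented here. How a system detects out-of-distribution input or estimates P(correct) is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Inputs for the threshold checks (hypothetical field names)."""
    risk: float               # estimated delegation risk, dollars
    budget: float             # remaining risk budget, dollars
    p_correct: float          # estimated probability the output is correct
    damage_usd: float         # worst-case damage of the decision, dollars
    in_distribution: bool     # False if the input looks novel
    failures_this_week: int   # observed failures of the current pattern


def required_actions(a):
    """Return every action from the Key Thresholds table that is triggered."""
    actions = []
    if a.risk > a.budget:
        actions.append("Block action, escalate")
    if a.p_correct < 0.8:
        actions.append("Add verification layer")
    if a.damage_usd > 10_000:
        actions.append("Require human review")
    if not a.in_distribution:
        actions.append("Flag for review")
    if a.failures_this_week > 3:
        actions.append("Review pattern selection")
    return actions


example = Assessment(
    risk=120.0, budget=100.0, p_correct=0.75,
    damage_usd=25_000.0, in_distribution=True, failures_this_week=1,
)
for action in required_actions(example):
    print(action)
```

The checks are not mutually exclusive, so the function returns a list rather than a single action.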
Essential Patterns (Top 5)
If you can implement only five patterns, start with these:
- Least-X Principles — Minimize privilege, capability, context, autonomy
- Escalation Ladder — Route high-stakes decisions to humans
- Defense in Depth — Multiple verification layers
- Sandboxing — Contain blast radius
- Audit Logging — Record everything for forensics
Risk Budget Rules of Thumb
| System Type | Suggested Annual Budget | Rationale |
|---|---|---|
| Internal tool | $50K-100K | Low exposure, recoverable |
| Customer-facing | $10K-50K | Reputation matters |
| Financial | $1K-10K | Direct monetary loss |
| Safety-critical | $100-1K | Human welfare at stake |
| Catastrophic potential | $0-100 | Existential concern |
Quick Formulas
Expected harm from a single component:
Component Risk = P(accident) × Damage_accident + P(defection) × Damage_defection
Pipeline risk (independent components, with each Component_Risk_i expressed as a probability of harm):
Pipeline Risk = 1 - Π(1 - Component_Risk_i)
Entangled pipeline risk:
Entangled Risk = Pipeline_Risk × Entanglement_Tax
Mitigation effectiveness (serial layers):
Residual Risk = Base_Risk × Π(1 - Catch_Rate_i)
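A short worked sketch of the four quick formulas in Python. Function names and input values are hypothetical. The pipeline formula has the form 1 - Π(1 - p), so this example assumes each component risk passed to it is a probability between 0 and 1, not a dollar amount, and that the mitigation layers catch failures independently.

```python
import math


def component_risk(p_accident, damage_accident, p_defection, damage_defection):
    """Expected harm from one component: accident term plus defection term."""
    return p_accident * damage_accident + p_defection * damage_defection


def pipeline_risk(component_probs):
    """Probability that at least one independent component causes harm.

    Assumes each entry is a probability in [0, 1].
    """
    return 1.0 - math.prod(1.0 - p for p in component_probs)


def entangled_risk(pipeline, entanglement_tax):
    """Scale the independent-pipeline estimate by the entanglement tax."""
    return pipeline * entanglement_tax


def residual_risk(base_risk, catch_rates):
    """Risk remaining after serial mitigation layers with the given catch rates."""
    return base_risk * math.prod(1.0 - c for c in catch_rates)


# Illustrative numbers only.
expected_harm = component_risk(0.02, 5_000.0, 0.001, 100_000.0)
independent = pipeline_risk([0.01, 0.02, 0.03])
entangled = entangled_risk(independent, 1.5)
remaining = residual_risk(expected_harm, [0.9, 0.5])

print(f"Component risk: ${expected_harm:,.2f}")
print(f"Pipeline risk (independent): {independent:.4f}")
print(f"Entangled pipeline risk: {entangled:.4f}")
print(f"Residual risk after two layers: ${remaining:,.2f}")
```

With these inputs, two layers that catch 90% and 50% of failures leave 5% of the base risk, since 0.1 × 0.5 = 0.05.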
See Also
- Core Concepts — Full conceptual explanation
- Minimal Framework — 80/20 version
- Design Patterns Index — All 45 patterns
- Glossary — Term definitions