Risk Formalization

This section formalizes the negative side of the optimization problem: what we’re trying to minimize.

$$\text{Delegation Risk} = \sum P(\text{harm mode}) \times \text{Damage}(\text{harm mode})$$

| Page | Question |
| --- | --- |
| Delegation Risk Overview | How do we quantify delegation risk? |
| A Walk-Through | What does this look like in practice? |
| Risk Decomposition | What types of harm can occur? |
| Delegation Accounting | How do we track risk like finances? |
| Exposure Cascade | How does risk flow through chains? |
| The Insurer’s Dilemma | How do we price and bound exposure? |

  • Delegation Risk: Expected cost of delegation = Σ P(harm) × Damage
  • Accident Risk: Non-goal-directed failures (bugs, errors)
  • Defection Risk: Goal-directed failures (scheming, deception)
  • Risk Budget: Maximum acceptable delegation risk
Delegation Risk = Accident Risk + Defection Risk
= Σ P(accident) × Damage + Σ P(defection) × Damage
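
The decomposition above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own implementation: the `HarmMode` class, the function names, the example harm modes, and every probability and damage figure are hypothetical values chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmMode:
    """One way a delegation can go wrong (hypothetical structure)."""
    name: str
    kind: str           # "accident" or "defection"
    probability: float  # P(harm mode), assumed to be per delegation
    damage: float       # cost if the harm occurs, in arbitrary units


def delegation_risk(modes):
    """Expected cost of delegation: sum of P(harm) * Damage over all modes."""
    return sum(m.probability * m.damage for m in modes)


def risk_by_kind(modes):
    """Split the total into accident risk and defection risk."""
    totals = {"accident": 0.0, "defection": 0.0}
    for m in modes:
        totals[m.kind] += m.probability * m.damage
    return totals


def within_budget(modes, risk_budget):
    """True if total delegation risk does not exceed the risk budget."""
    return delegation_risk(modes) <= risk_budget


# Illustrative numbers only; none of these come from the source.
modes = [
    HarmMode("buggy deploy", "accident", 0.05, 2000.0),
    HarmMode("misread instruction", "accident", 0.10, 500.0),
    HarmMode("deceptive report", "defection", 0.01, 10000.0),
]

total = delegation_risk(modes)
parts = risk_by_kind(modes)
print(f"total={total:.2f} accident={parts['accident']:.2f} "
      f"defection={parts['defection']:.2f}")
print("within budget of 300:", within_budget(modes, 300.0))
```

With these made-up inputs, the accident and defection terms add up to the total, which is the identity stated above, and the total is then compared against a risk budget.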

Start with Delegation Risk Overview for the full formalization.