Historical Failures in Quantified Risk Management

Long-Term Capital Management (1998) collapsed despite sophisticated VaR models because:

  • Models used only 2-3 years of history, missing regime changes
  • Leverage of roughly 25:1 on the balance sheet (and over 100:1 including derivatives) amplified model errors

The 2008 financial crisis exposed VaR’s limitations more broadly:

  • UBS acknowledged “shortcuts” that excluded certain risks from its calculations
  • Models assumed normal return distributions when reality had fat tails (see the sketch after this list)
  • Correlations assumed to be stable converged toward 1.0 during the crisis
  • Liquidity risk went unmodeled
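
To make the fat-tail point concrete, here is a minimal simulation sketch. The 1% daily volatility, 99% confidence level, and Student-t(3) distribution are assumptions chosen for illustration, not parameters from any bank’s actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: 1% daily volatility, 99% confidence, simulated days
sigma = 0.01
confidence = 0.99
n_days = 1_000_000

# World the model assumes: normally distributed daily returns
normal_returns = rng.normal(0.0, sigma, n_days)

# World closer to reality: Student-t with 3 degrees of freedom,
# rescaled so its standard deviation also equals sigma
df = 3
fat_returns = rng.standard_t(df, n_days) * sigma / np.sqrt(df / (df - 2))

# 99% VaR: the loss exceeded on only 1% of days
var_normal = -np.quantile(normal_returns, 1.0 - confidence)
var_fat = -np.quantile(fat_returns, 1.0 - confidence)

# Expected shortfall: average loss on those worst 1% of days
es_normal = -normal_returns[normal_returns < -var_normal].mean()
es_fat = -fat_returns[fat_returns < -var_fat].mean()

print(f"99% VaR  normal: {var_normal:.4f}  fat-tailed: {var_fat:.4f}")
print(f"99% ES   normal: {es_normal:.4f}  fat-tailed: {es_fat:.4f}")

# How often the fat-tailed world breaches the normal model's VaR
breach_rate = np.mean(fat_returns < -var_normal)
print(f"Breach rate of the normal-model VaR: {breach_rate:.2%} (model expects 1.00%)")
```

The two distributions have the same volatility; the difference shows up only in the tail, which is exactly where VaR and expected shortfall are measured.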

March 2020 risk parity drawdowns (13-43% across funds) demonstrated vulnerability to simultaneous stock-bond declines that violated historical correlation assumptions; the two-asset sketch below shows why a correlation flip matters.
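
A two-asset variance calculation makes the mechanism visible. The weights and volatilities below are assumptions chosen to resemble a generic levered stock-bond book, not figures from any specific fund:

```python
import numpy as np

def portfolio_vol(weights, vols, corr):
    """Annualized volatility of a two-asset portfolio for a given correlation."""
    w, s = np.array(weights), np.array(vols)
    cov = np.array([[s[0] ** 2,          corr * s[0] * s[1]],
                    [corr * s[0] * s[1], s[1] ** 2]])
    return float(np.sqrt(w @ cov @ w))

# Illustrative assumptions: a levered bond sleeve sized so each asset
# contributes roughly equal risk when the correlation is negative
weights = [0.5, 1.5]   # stocks, bonds (bonds levered, as risk parity typically runs)
vols = [0.18, 0.06]    # assumed 18% stock volatility, 6% bond volatility

for corr in (-0.3, 0.0, 0.6):
    print(f"stock-bond correlation {corr:+.1f} -> portfolio vol {portfolio_vol(weights, vols, corr):.1%}")
```

With these assumed numbers, a flip from -0.3 to +0.6 raises portfolio volatility by roughly half without any change in positions.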

Three Mile Island (1979) revealed inadequate human reliability modeling:

  • Poor human-machine interfaces
  • Operator errors compounded the initial equipment malfunctions

Fukushima (2011) demonstrated that beyond-design-basis external events can create common-cause failures:

  • Defense-in-depth was defeated when all barriers failed together
  • “Regulatory capture” undermined safety culture

Challenger (1986) showed:

  • Organizational pressure can override engineering judgment
  • “Normalization of deviance” erodes safety margins

EU ETS Phase 1 failed because:

  • Over-allocation based on poor emissions data
  • Allowance prices collapsed to zero
  • Initial budgets set using industry self-reports that systematically overstated emissions

Quantified risk frameworks fail in recurring ways:

| Failure Mode | Examples | AI Safety Implication |
| --- | --- | --- |
| Poor data quality | EU ETS Phase 1, LTCM | Capability evaluations may be incomplete |
| Missing regime changes | 2008 crisis | AI capabilities may jump discontinuously |
| Fat tails underestimated | VaR models | Catastrophic AI failures may be more likely than models suggest |
| Leverage amplifies errors | LTCM | AI systems may have multiplicative failure modes |
| Inadequate incentives | EU ETS self-reporting | Teams may underreport AI risks |
| Independence assumptions fail | Fukushima common-cause | AI components may fail together |
| Organizational culture | Challenger, regulatory capture | Safety culture may degrade under competitive pressure |
| Normalization of deviance | Challenger | Small safety violations accumulate |

These failure modes translate into working principles:

  1. Data quality: Risk budgeting is only as good as the underlying measurements
  2. Regime changes: Historical behavior may not predict future behavior
  3. Fat tails: Plan for worst cases, not just expected cases
  4. Leverage: Understand how errors multiply through the system
  5. Incentives: Design for truthful reporting, not just compliance
  6. Independence: Test whether components actually fail independently (a quick check is sketched after this list)
  7. Culture: Maintain safety culture under commercial pressure
  8. Deviance: Don’t normalize small safety violations
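
The independence lesson (item 6) can be checked directly on failure data. The sketch below uses synthetic data; the component failure probabilities and the common-cause trigger are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 1_000_000

# Assumed per-trial failure probabilities of two nominally redundant components
p_a, p_b = 0.01, 0.01

# Assumed common-cause event (e.g., shared power or a shared upstream model)
# that takes out both components at once
p_common = 0.002
common = rng.random(n_trials) < p_common

fail_a = (rng.random(n_trials) < p_a) | common
fail_b = (rng.random(n_trials) < p_b) | common

observed_joint = np.mean(fail_a & fail_b)
predicted_if_independent = np.mean(fail_a) * np.mean(fail_b)

print(f"Observed joint failure rate:  {observed_joint:.5f}")
print(f"Predicted under independence: {predicted_if_independent:.5f}")
print(f"Ratio: {observed_joint / predicted_if_independent:.0f}x")
```

A ratio well above 1 means the redundancy is much weaker than the independence assumption implies, which is the Fukushima pattern in the table above.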

AI safety must:

  • Acknowledge model limitations explicitly
  • Build in margins for unknown unknowns
  • Test assumptions empirically
  • Update frameworks as failures occur
  • Maintain humility about quantification precision

Four takeaways stand out:

  1. Every framework fails eventually — No risk model is complete
  2. Failure modes are predictable — Data quality, regime changes, fat tails, leverage, incentives
  3. Culture matters as much as math — Organizational pressure can override engineering judgment
  4. Small violations accumulate — Normalization of deviance leads to catastrophe