Estimates Registry

This section provides a centralized registry of all quantitative estimates used in the Delegation Risk Framework. Each estimate includes:

  • Point estimate: A single representative value
  • Distribution: Uncertainty quantification in Squiggle notation
  • Confidence level: How certain we are about this estimate
  • Source: Where the estimate comes from
  • Last updated: When the estimate was last reviewed
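
Concretely, a single registry entry carrying these fields could be represented as a small record like the one below. This is a minimal sketch; the field names and example values are illustrative placeholders, not part of the framework or the registry.

```python
# Minimal sketch of one registry entry; field names and values are
# illustrative placeholders, not estimates from the registry itself.
from dataclasses import dataclass

@dataclass
class Estimate:
    name: str            # what is being estimated
    point: float         # point estimate: single representative value
    distribution: str    # uncertainty quantification in Squiggle notation
    confidence: str      # High / Medium / Low / Speculative
    source: str          # where the estimate comes from
    last_updated: str    # when the estimate was last reviewed

example = Estimate(
    name="llm_agent_failure_prob_per_task",
    point=0.02,
    distribution="beta(2, 98)",
    confidence="Low",
    source="placeholder for discussion",
    last_updated="2024-01-01",
)
```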

Most risk frameworks use point estimates like “probability = 2%” or “damage = $5,000”. This understates uncertainty and can lead to:

  1. Overconfidence in risk calculations
  2. Poor decisions when parameters are near critical thresholds
  3. Inability to identify which uncertainties matter most
  4. No path to improvement through better calibration

By expressing estimates as probability distributions, we can:

  • Calculate confidence intervals on total risk
  • Perform sensitivity analysis to find critical parameters
  • Update estimates as we gather more data
  • Compare architectures under parameter uncertainty
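
As a rough illustration of the first two points, here is a minimal Monte Carlo sketch in Python using numpy. The simple loss model (exposures × failure probability × damage) and the parameter distributions are stand-ins for illustration, not the framework's risk model or registry values.

```python
# Minimal Monte Carlo sketch: propagate parameter uncertainty into total
# risk, report a confidence interval, and do a crude sensitivity check.
# The loss model and distributions are illustrative, not registry values.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def total_risk(p_fail, damage, exposures_per_year=1_000):
    # Expected annual loss: exposures * failure probability * damage per failure.
    return exposures_per_year * p_fail * damage

# Uncertain parameters expressed as distributions (Squiggle-style, sampled here).
p_fail = rng.beta(2, 98, n)                    # ~2% mean failure probability
damage = rng.lognormal(np.log(1000), 0.5, n)   # heavy-tailed damage in dollars

risk = total_risk(p_fail, damage)
lo, hi = np.percentile(risk, [5, 95])
print(f"expected annual loss: ${risk.mean():,.0f} (90% CI ${lo:,.0f} - ${hi:,.0f})")

# One-at-a-time sensitivity: hold each parameter at its mean and see how much
# of the spread in total risk disappears; the bigger the drop, the more that
# parameter's uncertainty matters.
for name, fixed in [("p_fail", total_risk(p_fail.mean(), damage)),
                    ("damage", total_risk(p_fail, damage.mean()))]:
    print(f"std of risk with {name} fixed: {np.std(fixed):,.0f} "
          f"(vs {np.std(risk):,.0f} with both uncertain)")
```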

The registry covers the following areas (see the sketch after this list):

  • Default failure probabilities: baseline failure rates by component type (deterministic code, narrow ML, general LLM, RL agents). These are the priors before observing any track record.
  • Damage magnitudes: damage distributions by category (infrastructure, data, reputation, regulatory, catastrophic), using heavy-tailed distributions for rare-but-severe events.
  • Risk reduction factors: how much risk reduction interventions such as verifiers, sandboxing, rate limiting, human review, and formal verification provide.
  • Trust updating: how to update trust based on track record, with Bayesian updating formulas and calibrated priors.
  • Cross-domain benchmarks: reference points from nuclear safety, aviation, finance, and other domains with mature risk quantification.
  • Expert elicitation: how to gather calibrated probability and damage estimates from domain experts when historical data is unavailable.
  • Incident data: how to ground your risk estimates in real-world AI incident data from public databases.
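
To make the priors and trust-updating items concrete, here is a minimal sketch assuming a Beta prior on a component's per-task failure probability and a Binomial track record. The prior parameters, the observed counts, and the risk-reduction multiplier are illustrative placeholders, not values from the registry.

```python
# Minimal sketch (illustrative values, not registry estimates): a Beta prior
# on a component's per-task failure probability, updated with an observed
# track record, then scaled by a placeholder risk-reduction factor.
from scipy import stats

# Hypothetical prior for a "general LLM" component: beta(2, 98), ~2% mean.
prior_alpha, prior_beta = 2, 98

# Observed track record: 500 delegated tasks, 4 failures.
tasks, failures = 500, 4

# Conjugate Bayesian update: Beta prior + Binomial likelihood -> Beta posterior.
post_alpha = prior_alpha + failures
post_beta = prior_beta + (tasks - failures)
posterior = stats.beta(post_alpha, post_beta)

print(f"posterior mean failure prob: {posterior.mean():.3%}")
print(f"90% credible interval: "
      f"({posterior.ppf(0.05):.3%}, {posterior.ppf(0.95):.3%})")

# Placeholder risk-reduction factor for adding a verifier: residual failures
# are assumed to be 30% of the unmitigated rate in this illustration.
verifier_reduction = 0.3
residual_mean = posterior.mean() * verifier_reduction
print(f"residual failure prob with verifier: {residual_mean:.3%}")
```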

Use these as starting points for your own risk assessment. Adjust based on:

  • Your specific domain (healthcare vs. internal tools)
  • Your organization’s risk appetite
  • Your observed track record
  • Expert judgment from your team

These estimates are hypotheses to be tested. Help improve them by:

  • Analyzing AI incident databases
  • Conducting expert elicitation studies
  • Publishing calibration results
  • Proposing better distribution families

We use Squiggle notation for distributions:

  • beta(a, b): Beta distribution with shape parameters a and b. Example: beta(10, 100) → ~9% mean
  • lognormal(mu, sigma): Lognormal with log-mean μ and log-std σ. Example: lognormal(log(1000), 0.5)
  • normal(mean, std): Normal/Gaussian distribution. Example: normal(0.1, 0.02)
  • uniform(low, high): Uniform between bounds. Example: uniform(0.01, 0.1)
  • mixture([d1, d2], [w1, w2]): Weighted mixture of distributions. Example: mixture([normal(0, 1), normal(5, 1)], [0.8, 0.2])
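
If you want to sample these distributions in Python rather than Squiggle, an approximate mapping using numpy looks like the following sketch. Note that numpy's lognormal also takes the log-mean and log-std, and the mixture is sampled by picking a component for each draw.

```python
# Rough numpy equivalents of the Squiggle notation above (sampling-based sketch).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

beta_s    = rng.beta(10, 100, n)                 # beta(10, 100): mean = 10/110 ~ 9%
lognorm_s = rng.lognormal(np.log(1000), 0.5, n)  # lognormal(log(1000), 0.5): median ~ 1000
normal_s  = rng.normal(0.1, 0.02, n)             # normal(0.1, 0.02)
uniform_s = rng.uniform(0.01, 0.1, n)            # uniform(0.01, 0.1)

# mixture([normal(0, 1), normal(5, 1)], [0.8, 0.2]): pick a component per draw.
component = rng.choice(2, size=n, p=[0.8, 0.2])
mixture_s = np.where(component == 0, rng.normal(0, 1, n), rng.normal(5, 1, n))

print(f"beta(10, 100) sample mean: {beta_s.mean():.3f}")   # ~0.091, i.e. ~9%
print(f"lognormal median: {np.median(lognorm_s):,.0f}")    # ~1000
print(f"mixture mean: {mixture_s.mean():.2f}")             # ~0.8*0 + 0.2*5 = 1.0
```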

Confidence levels indicate how settled each estimate is:

  • High: Based on substantial data or expert consensus; unlikely to change significantly
  • Medium: Reasonable estimate with some uncertainty; may change with new data
  • Low: Educated guess or extrapolation; significant revision possible
  • Speculative: Placeholder for discussion; not yet suitable for decisions

We welcome contributions to improve these estimates:

  1. Data-driven updates: If you have incident data or calibration studies
  2. Domain expertise: If you can provide better priors for specific domains
  3. Methodological improvements: Better distribution families, aggregation methods
  4. Error corrections: If you find mistakes or inconsistencies

See our contribution guidelines for how to submit improvements.