Risk Calculator
This tool calculates your expected monthly delegation risk using Monte Carlo simulation with probabilistic inputs.
How It Works
- Define Harm Modes: Each potential failure mode has a probability of occurring and an associated damage distribution
- Probabilistic Modeling: Probabilities use beta distributions; damages use lognormal distributions to capture uncertainty
- Monte Carlo Simulation: 10,000 scenarios are sampled to build the risk distribution
- Budget Analysis: Compare your monthly risk budget against the simulated distribution to estimate the probability of staying within it
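The steps above can be sketched in plain Python using only the standard library. This is a minimal sketch, not the calculator's actual implementation: it hard-codes the three harm modes from the generated Squiggle model, samples with `random.betavariate` and `random.lognormvariate`, and the helper names (`HARM_MODES`, `simulate_total_risk`, `summarize`) are hypothetical. Like the Squiggle model, each sample multiplies a sampled probability by a sampled damage rather than simulating whether an incident actually happens.

```python
import math
import random
import statistics

# Default harm modes from the generated Squiggle model (hypothetical structure):
# (name, beta alpha, beta beta, median damage in $, damage log-std)
HARM_MODES = [
    ("Data processing error", 5, 95, 500, 0.6),
    ("Security vulnerability", 1, 99, 5000, 1.0),
    ("Incorrect recommendation", 8, 92, 1000, 0.8),
]


def simulate_total_risk(harm_modes, n_samples=10_000, seed=0):
    """Sample total monthly risk as the sum of prob_i * damage_i over harm modes."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        total = 0.0
        for _name, a, b, median_damage, log_std in harm_modes:
            prob = rng.betavariate(a, b)
            # lognormvariate(mu, sigma) has median exp(mu), so mu = log(median)
            damage = rng.lognormvariate(math.log(median_damage), log_std)
            total += prob * damage
        samples.append(total)
    return samples


def summarize(samples, budget):
    """Summary statistics mirroring the calculator's output fields."""
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "expected": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p05": cuts[0],
        "p95": cuts[-1],
        # Share of simulated months whose total risk is within the budget
        "budget_confidence": sum(x <= budget for x in samples) / len(samples),
    }


if __name__ == "__main__":
    results = summarize(simulate_total_risk(HARM_MODES), budget=500)
    for key, value in results.items():
        print(f"{key:>17}: {value:,.3f}")
```

Fixing the seed makes runs reproducible; with 10,000 samples the estimates should land close to the calculator's figures, but not exactly on them, since the random draws differ.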
Sample results for a $500/month budget:
- Budget Confidence: 93.4% probability of staying within budget
- Expected Risk: $222/month
- Median: $166/month
- 5th Percentile: $58/month
- 95th Percentile: $557/month
Figure: Risk Distribution (10,000 samples). Green = under budget, yellow = at budget, red = over budget.
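Harm Modes
| Harm Mode | Probability (mean) | Damage (median) | Damage (log-std) | Approx. Risk |
|---|---|---|---|---|
| Data processing error | 5.0% | $500 | 0.6 | ~$25/mo |
| Security vulnerability | 1.0% | $5,000 | 1.0 | ~$50/mo |
| Incorrect recommendation | 8.0% | $1,000 | 0.8 | ~$80/mo |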
The calculator inputs expressed as a Squiggle model:

```squiggle
// Delegation Risk Model
// Generated from calculator inputs

// Data processing error
prob_0 = beta(5, 95)
damage_0 = lognormal(log(500), 0.6)
risk_0 = prob_0 * damage_0

// Security vulnerability
prob_1 = beta(1, 99)
damage_1 = lognormal(log(5000), 1)
risk_1 = prob_1 * damage_1

// Incorrect recommendation
prob_2 = beta(8, 92)
damage_2 = lognormal(log(1000), 0.8)
risk_2 = prob_2 * damage_2

// Total monthly delegation risk
totalRisk = risk_0 + risk_1 + risk_2

// Budget analysis
budget = 500
budgetConfidence = cdf(totalRisk, budget)

// Summary statistics (named, so the program is a valid series of definitions)
expectedRisk = mean(totalRisk) // Expected risk
medianRisk = quantile(totalRisk, 0.5) // Median
p5Risk = quantile(totalRisk, 0.05) // 5th percentile
p95Risk = quantile(totalRisk, 0.95) // 95th percentile
```
Understanding the Inputs
Probability (mean)
The expected probability that this harm mode occurs each month. For example:
- 0.05 = 5% chance of occurring
- 0.01 = 1% chance of occurring
Probability (uncertainty)
How uncertain you are about the probability estimate. Higher values create a wider distribution around the mean.
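One common way to turn a mean and an uncertainty setting into beta parameters is the mean/concentration form: α = μ·k and β = (1 − μ)·k, where a smaller concentration k gives a wider distribution. The calculator's exact mapping from its uncertainty input is not documented here, so treat `beta_from_mean` and `concentration` as hypothetical; the only grounding is that the generated Squiggle priors beta(5, 95), beta(1, 99) and beta(8, 92) all fit k = 100.

```python
import math


def beta_from_mean(mean, concentration):
    """Hypothetical mapping: mean in (0, 1), concentration k = alpha + beta > 0."""
    if not 0 < mean < 1 or concentration <= 0:
        raise ValueError("need 0 < mean < 1 and concentration > 0")
    return mean * concentration, (1 - mean) * concentration


def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))


# Same 5% mean with decreasing concentration, i.e. increasing uncertainty
for k in (400, 100, 25):
    a, b = beta_from_mean(0.05, k)
    print(f"k={k:>3}: beta({a:g}, {b:g}), std = {beta_std(a, b):.4f}")
```

The mean stays at 5% in every row; only the spread changes, which is what an uncertainty input should control.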
Damage (median $)
The typical (median) cost when this harm mode occurs. Damages follow a lognormal distribution, so the mean damage is higher than the median and individual incidents can cost far more than the typical value.
Damage (log-std)
Controls the “tail thickness” of the damage distribution (it is the standard deviation of log-damage):
- 0.5 = relatively predictable damages
- 1.0 = moderate variation
- 1.5 = high variation (rare events could be much larger)
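Because the median of a lognormal is e^μ, the log-std σ alone determines how far the mean and the upper tail sit above the median: mean = median × e^(σ²/2), and 95th percentile = median × e^(1.645σ), where 1.645 is the standard normal 95th-percentile z-score. A short sketch (the helper `damage_multipliers` is hypothetical) tabulating both ratios for the three settings above:

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # about 1.645, the standard normal 95th percentile


def damage_multipliers(log_std):
    """Return (mean / median, p95 / median) for a lognormal damage distribution."""
    mean_ratio = math.exp(log_std ** 2 / 2)
    p95_ratio = math.exp(Z95 * log_std)
    return mean_ratio, p95_ratio


for sigma in (0.5, 1.0, 1.5):
    mean_ratio, p95_ratio = damage_multipliers(sigma)
    print(f"log-std {sigma}: mean = {mean_ratio:.2f}x median, p95 = {p95_ratio:.2f}x median")
```

With σ = 1.5, a 1-in-20 incident costs roughly 12× the median, versus about 2.3× at σ = 0.5.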
Understanding the Outputs
Budget Confidence
The probability that your actual monthly risk stays within budget. Target levels:
- 90%+: Conservative, suitable for risk-averse organizations
- 70-90%: Moderate, typical for most applications
- Below 70%: Aggressive, may need risk reduction measures
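Budget confidence is the empirical CDF of the simulated risk evaluated at your budget, so the question can be inverted: the smallest budget that reaches a target confidence is the matching quantile of the samples (a 90% target needs a budget at or above the 90th percentile). A minimal sketch with hypothetical helper names and toy data, not calculator output:

```python
def budget_confidence(samples, budget):
    """Fraction of simulated months with total risk <= budget."""
    return sum(x <= budget for x in samples) / len(samples)


def budget_for_confidence(samples, target):
    """Smallest sampled value whose empirical budget confidence reaches target."""
    ordered = sorted(samples)
    n = len(ordered)
    for i, value in enumerate(ordered):
        if (i + 1) / n >= target:
            return value
    return ordered[-1]


def confidence_band(confidence):
    """Map a confidence level to the target bands listed above."""
    if confidence >= 0.90:
        return "conservative"
    if confidence >= 0.70:
        return "moderate"
    return "aggressive"


# Toy data: 20 simulated monthly risk totals in $ (illustrative only)
toy = [60, 80, 95, 110, 120, 130, 140, 150, 160, 170,
       180, 195, 210, 230, 250, 280, 320, 380, 520, 900]
needed = budget_for_confidence(toy, 0.90)
print(needed, budget_confidence(toy, needed), confidence_band(budget_confidence(toy, 500)))
```

Moving from a 90% to a 95% target can require a much larger budget when the upper tail is long, as it is in the toy data.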
Expected Risk vs. Median
- Expected (mean): Average of all simulations; pulled upward by tail events.
- Median: Middle value; more representative of a “typical” month.
If the expected value is much greater than the median, you have significant tail risk.
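A toy illustration (invented numbers, not calculator output) of how a single tail event separates the two: one expensive month out of twenty more than doubles the mean while leaving the median untouched.

```python
from statistics import mean, median

# Toy data: 19 quiet months at $100 and one $3,000 month
monthly_risk = [100] * 19 + [3000]

expected = mean(monthly_risk)   # (19 * 100 + 3000) / 20 = 245
typical = median(monthly_risk)  # 100
print(f"expected ${expected:,.0f} vs median ${typical:,.0f} (ratio {expected / typical:.2f})")
```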
5th and 95th Percentiles
- 5th percentile: Optimistic but realistic scenario (only 1 in 20 months comes in lower)
- 95th percentile: Pessimistic but realistic scenario (1 in 20 months exceeds it)
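“1 in 20 months” adds up over a year. Assuming months are independent and the model is correct (both simplifying assumptions), the chance that at least one of 12 months exceeds the 95th percentile is 1 − 0.95^12, roughly 46%. The helper name below is hypothetical:

```python
def prob_any_exceedance(tail_prob, months):
    """P(at least one of `months` independent months lands beyond a percentile with tail_prob)."""
    return 1 - (1 - tail_prob) ** months


print(f"{prob_any_exceedance(0.05, 12):.1%}")  # 95th percentile over one year
```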
Recommended Priors
Use these starting points based on component type:
| Component Type | Probability Prior | Damage Prior |
|---|---|---|
| Deterministic code | beta(1, 100) ~1% | lognormal(log(100), 0.5) |
| Narrow ML | beta(3, 100) ~3% | lognormal(log(500), 0.7) |
| General LLM | beta(8, 80) ~9% | lognormal(log(1000), 0.9) |
| RL/Agentic | beta(15, 60) ~20% | lognormal(log(2000), 1.1) |
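To see what each prior implies for a single harm mode: the mean of beta(α, β) is α/(α + β), the mean of lognormal(log(m), σ) is m·e^(σ²/2), and, assuming probability and damage are independent (as in the generated Squiggle model, where risk = prob × damage), their product is the expected monthly risk. A short sketch with hypothetical names:

```python
import math

# Priors from the table above: (alpha, beta, median damage in $, log-std)
PRIORS = {
    "Deterministic code": (1, 100, 100, 0.5),
    "Narrow ML": (3, 100, 500, 0.7),
    "General LLM": (8, 80, 1000, 0.9),
    "RL/Agentic": (15, 60, 2000, 1.1),
}


def expected_monthly_risk(alpha, beta, median, log_std):
    """E[prob * damage] for independent Beta probability and lognormal damage."""
    mean_prob = alpha / (alpha + beta)
    mean_damage = median * math.exp(log_std ** 2 / 2)
    return mean_prob * mean_damage


for name, params in PRIORS.items():
    a, b = params[:2]
    print(f"{name:<20} mean prob {a / (a + b):5.1%}   E[risk] ${expected_monthly_risk(*params):,.2f}/month")
```

Each step up the table raises both the probability and the damage tail, so the expected risk per harm mode grows by more than an order of magnitude between neighbouring rows at the low end.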
See Probability Priors for detailed distributions.
Tips for Better Estimates
- Start conservative: Underestimating risk is worse than overestimating
- Use reference classes: How did similar systems perform historically?
- Decompose carefully: More specific harm modes are easier to estimate
- Update with data: Use the Trust Updater as you build up a track record
- Sensitivity check: Which inputs matter most? See the Sensitivity Dashboard
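For updating a probability estimate with data, one standard approach (not necessarily what the Trust Updater does) is the conjugate beta-binomial update: observing k incidents in n months turns beta(α, β) into beta(α + k, β + n − k). A sketch, assuming each month is one independent trial for a given harm mode; `update_beta` is a hypothetical helper:

```python
def update_beta(alpha, beta, incidents, months):
    """Conjugate update of a Beta prior after `incidents` occurrences in `months` trials."""
    if not 0 <= incidents <= months:
        raise ValueError("incidents must be between 0 and months")
    return alpha + incidents, beta + (months - incidents)


# General LLM prior from the table, then 12 months with 1 observed incident
alpha, beta = update_beta(8, 80, incidents=1, months=12)
print(f"posterior beta({alpha}, {beta}), mean {alpha / (alpha + beta):.1%}")
```

Each incident-free month adds 1 to β, so a clean track record pulls the mean down gradually rather than all at once.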