Trust Updater
This tool helps you convert observed track records into calibrated trust estimates using Bayesian updating.
Why Bayesian Updating?
Simple success rate calculations ignore:
- Prior knowledge: What you believed before observations
- Sample size: 9/10 is not the same as 900/1000
- Uncertainty: how confident the estimate should be, which grows with the amount of data
Bayesian updating properly weighs prior beliefs against observed evidence.
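For instance, 9/10 and 900/1000 give the same point estimate but very different certainty. A minimal Squiggle sketch of the contrast, assuming the general-LLM default prior beta(9, 1) and the built-in quantile function:

```squiggle
// Same 90% observed success rate, very different amounts of evidence
posteriorSmall = beta(9 + 9, 1 + 1)     // 9 successes out of 10
posteriorLarge = beta(9 + 900, 1 + 100) // 900 successes out of 1000

// Both means are ~0.90, but the 90% credible intervals differ sharply
widthSmall = quantile(posteriorSmall, 0.95) - quantile(posteriorSmall, 0.05)
widthLarge = quantile(posteriorLarge, 0.95) - quantile(posteriorLarge, 0.05)
```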
The tool reports the result (here, adjustedRisk = baseRisk × 0.96) along with the underlying Squiggle model:
```squiggle
// Trust Calibration Model

// Prior belief about component reliability
prior = beta(9, 1) // Mean: 90.0%, 90% CI: [72%, 99%]

// Observations
successes = 50
failures = 3
observedRate = 0.943

// Bayesian update
posterior = beta(59, 4) // Mean: 93.7%, 90% CI: [88%, 98%]

// Track record modifier for risk adjustment
priorMean = mean(prior)         // 0.900
posteriorMean = mean(posterior) // 0.937
trackRecordModifier = priorMean / posteriorMean // 0.961

// Apply to risk calculation
baseRisk = ... // your base risk estimate
adjustedRisk = baseRisk * trackRecordModifier
```
How It Works
1. Prior Selection
Your prior belief encodes what you expect before seeing any data. The framework provides defaults:
| Component Type | Prior | Mean Reliability |
|---|---|---|
| Deterministic Code | beta(99, 1) | 99% |
| Narrow ML | beta(19, 1) | 95% |
| General LLM | beta(9, 1) | 90% |
| RL/Agentic | beta(4, 1) | 80% |
| Skeptical | beta(5, 5) | 50% |
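These defaults translate directly into Squiggle. A sketch of the table above (the variable names are illustrative):

```squiggle
// Default priors by component type (alpha "successes" and beta "failures" of prior evidence)
deterministicCodePrior = beta(99, 1) // mean 99%
narrowMlPrior = beta(19, 1)          // mean 95%
generalLlmPrior = beta(9, 1)         // mean 90%
rlAgenticPrior = beta(4, 1)          // mean 80%
skepticalPrior = beta(5, 5)          // mean 50%
```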
2. Observations
Enter the number of successes and failures you've observed. The more observations you add, the further the posterior moves away from the prior and toward the observed rate.
3. Posterior Belief
The updated belief after combining the prior with observations:
```squiggle
posterior = beta(prior_alpha + successes, prior_beta + failures)
```
4. Track Record Modifier
The modifier adjusts your risk estimate based on performance vs. expectations:
```squiggle
modifier = prior_mean / posterior_mean
```
- < 1.0: Performance better than expected, reduce risk
- = 1.0: Performance matches expectation, no change
- > 1.0: Performance worse than expected, increase risk
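Steps 1-4 combine into a single calculation. A minimal Squiggle sketch of it as a reusable function (the function name is illustrative, not part of the tool):

```squiggle
// Prior beta(priorAlpha, priorBeta); the mean of beta(a, b) is a / (a + b)
trackRecordModifier(priorAlpha, priorBeta, successes, failures) = {
  priorMean = priorAlpha / (priorAlpha + priorBeta)
  posteriorMean = (priorAlpha + successes) / (priorAlpha + priorBeta + successes + failures)
  priorMean / posteriorMean
}

// Example: general LLM prior beta(9, 1), 50 successes, 3 failures
modifier = trackRecordModifier(9, 1, 50, 3) // 0.900 / 0.937 = ~0.961
```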
Interpretation Guide
Confidence Interval Width
The 90% CI width indicates your uncertainty:
| CI Width | Interpretation |
|---|---|
| < 5% | High confidence, stable estimate |
| 5-15% | Moderate confidence |
| 15-30% | Low confidence, need more data |
| > 30% | Very uncertain, prior dominates |
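The width can be read directly off the posterior. A sketch for the worked example above, assuming Squiggle's built-in quantile function:

```squiggle
// Worked example's posterior
posterior = beta(59, 4)

// 90% credible interval and its width
lower = quantile(posterior, 0.05) // ~0.88
upper = quantile(posterior, 0.95) // ~0.98
ciWidth = upper - lower // ~0.10, in the "moderate confidence" band of the table above
```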
Data Requirements
Roughly how much data is needed to shift the posterior meaningfully away from the prior:
| Prior Strength | Observations Needed |
|---|---|
| Weak (alpha + beta ≈ 10) | 20-50 |
| Moderate (alpha + beta ≈ 50) | 100-200 |
| Strong (alpha + beta ≈ 200) | 500+ |
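To see why, compare how far the same evidence moves a weak and a strong prior with the same 90% mean (illustrative numbers):

```squiggle
// Same 90% prior mean, very different strength; add 20 successes and 0 failures to each
weakPosteriorMean = mean(beta(9 + 20, 1 + 0))      // weak beta(9, 1): ~0.90 -> ~0.967, moves noticeably
strongPosteriorMean = mean(beta(180 + 20, 20 + 0)) // strong beta(180, 20): ~0.90 -> ~0.909, barely moves
```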
Best Practices
Starting a New Component
- Start with the appropriate component-type prior
- Use a weak prior strength to allow quick updates
- Closely monitor first 50-100 tasks
- Increase prior strength as track record accumulates
Established Components
- Use a moderate prior based on historical volume
- Apply time-weighting (recent observations count more; see the sketch after this list)
- Weight complex tasks higher than simple ones
- Review calibration quarterly
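One simple way to implement time-weighting is to discount each observation's count by its age before the update. A sketch under that assumption; the decay factor and counts are illustrative, not tool defaults:

```squiggle
// Each quarter of age multiplies an observation's weight by `decay`
decay = 0.8
oldWeight = decay * decay // two quarters old -> 0.64

// Example: 40 successes / 2 failures this quarter, 30 / 4 from two quarters ago
effectiveSuccesses = 40 + 30 * oldWeight // 59.2
effectiveFailures = 2 + 4 * oldWeight    // 4.56

// Weighted counts feed the same conjugate update (general LLM prior)
posterior = beta(9 + effectiveSuccesses, 1 + effectiveFailures)
```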
Critical Components
- Use a skeptical prior (trust must be earned)
- Weight adversarial failures heavily (3× normal failures; see the sketch after this list)
- Focus on worst-case bounds, not just mean
- Require statistical significance before increasing trust
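The 3× weighting can be applied by inflating the failure count before the update. A sketch under that reading, with illustrative counts:

```squiggle
// Skeptical prior: trust must be earned
prior = beta(5, 5)

successes = 120
normalFailures = 4
adversarialFailures = 2

// Adversarial failures count 3x toward the failure total
effectiveFailures = normalFailures + 3 * adversarialFailures // 10

posterior = beta(5 + successes, 5 + effectiveFailures) // beta(125, 15)
posteriorMean = mean(posterior) // 125 / 140 = ~0.893
```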
Integration with Risk Calculator
The track record modifier feeds directly into the Risk Calculator:
```squiggle
// Base risk from component type
baseRisk = probability * damage

// Adjusted for track record
adjustedRisk = baseRisk * trackRecordModifier
```
Good track records reduce risk; poor track records increase it.
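A worked example with purely illustrative numbers (2% failure probability, 100,000 damage) and the 0.961 modifier from earlier:

```squiggle
probability = 0.02          // illustrative failure probability
damage = 100000             // illustrative damage estimate
trackRecordModifier = 0.961 // from the worked example above

baseRisk = probability * damage               // 2000
adjustedRisk = baseRisk * trackRecordModifier // 1922
```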