
Trust Updater

This tool helps you convert observed track records into calibrated trust estimates using Bayesian updating.

Simple success rate calculations ignore:

  • Prior knowledge: What you believed before observations
  • Sample size: 9/10 is not the same evidence as 900/1000
  • Uncertainty: a raw rate carries no measure of confidence, which should grow with more data

Bayesian updating properly weighs prior beliefs against observed evidence.
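
For example, 9/10 and 900/1000 imply nearly the same observed rate but very different certainty. A minimal Squiggle sketch, assuming a flat beta(1, 1) prior (not one of the tool's defaults):

// Same ~90% observed rate, very different certainty
small = beta(1 + 9, 1 + 1)        // 9 successes, 1 failure
large = beta(1 + 900, 1 + 100)    // 900 successes, 100 failures
meanSmall = mean(small)     // ~0.833
meanLarge = mean(large)     // ~0.899
stdevSmall = stdev(small)   // ~0.10: wide, still uncertain
stdevLarge = stdev(large)   // ~0.01: narrow, high confidence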


1. Select Prior Belief
Prior: beta(9, 1) = 90.0% reliability, 90% CI: [72%, 99%]
2. Enter Observations
Observed Rate
94.3%
3. Updated Belief
[Chart: prior and posterior beta distributions plotted over 0-100% reliability]
Prior
90.0%
beta(9, 1), 90% CI: [72%, 99%]
Posterior
93.7%
beta(59, 4), 90% CI: [88%, 98%]
Track Record Modifier
0.96×
Performance slightly better than prior expectation
Scale: 0.5× (much better) · 1× (as expected) · 1.5× (much worse)
Usage: Multiply your base risk estimate by 0.96 to get the track-record-adjusted risk.
adjustedRisk = baseRisk × 0.96
Squiggle code:
// Trust Calibration Model
// Prior belief about component reliability
prior = beta(9, 1)
// Mean: 90.0%, 90% CI: [72%, 99%]

// Observations
successes = 50
failures = 3
observedRate = 0.943

// Bayesian update
posterior = beta(59, 4)
// Mean: 93.7%, 90% CI: [88%, 98%]

// Track record modifier for risk adjustment
priorMean = mean(prior)      // 0.900
posteriorMean = mean(posterior)  // 0.937
trackRecordModifier = priorMean / posteriorMean  // 0.961

// Apply to risk calculation
baseRisk = ... // your base risk estimate
adjustedRisk = baseRisk * trackRecordModifier

Your prior belief encodes what you expect before seeing any data. The framework provides defaults:

Component Type       Prior         Mean Reliability
Deterministic Code   beta(99, 1)   99%
Narrow ML            beta(19, 1)   95%
General LLM          beta(9, 1)    90%
RL/Agentic           beta(4, 1)    80%
Skeptical            beta(5, 5)    50%
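
Each mean in the table is alpha / (alpha + beta), which you can verify directly:

// Mean of beta(alpha, beta) is alpha / (alpha + beta)
rlAgenticMean = mean(beta(4, 1))   // 4 / 5 = 0.80, the RL/Agentic default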

Enter the number of successes and failures you’ve observed. More observations will move the posterior further from the prior.

The updated belief after combining prior with observations:

posterior = beta(prior_alpha + successes, prior_beta + failures)
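
Applied to the example above (a beta(9, 1) prior with 50 successes and 3 failures):

posterior = beta(9 + 50, 1 + 3)   // beta(59, 4)
posteriorMean = mean(posterior)   // 59 / 63 ≈ 0.937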

The modifier adjusts your risk estimate based on performance vs. expectations:

modifier = prior_mean / posterior_mean
  • < 1.0: Performance better than expected, reduce risk
  • = 1.0: Performance matches expectation, no change
  • > 1.0: Performance worse than expected, increase risk
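
For the > 1.0 case, here is a sketch with hypothetical counts (a beta(9, 1) prior that observes 40 successes and 10 failures):

// Hypothetical worse-than-expected track record
posterior = beta(9 + 40, 1 + 10)                // beta(49, 11)
modifier = mean(beta(9, 1)) / mean(posterior)   // 0.900 / 0.817 ≈ 1.10 → increase risk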

The 90% CI width indicates your uncertainty:

CI Width   Interpretation
< 5%       High confidence, stable estimate
5-15%      Moderate confidence
15-30%     Low confidence, need more data
> 30%      Very uncertain, prior dominates

For example, the posterior above has a 90% CI of [88%, 98%], a width of 10 percentage points: moderate confidence.

How much data you need before observations significantly shift the posterior away from the prior:

Prior Strength                 Observations Needed
Weak (alpha + beta ≈ 10)       20-50
Moderate (alpha + beta ≈ 50)   100-200
Strong (alpha + beta ≈ 200)    500+
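
These thresholds follow from the prior acting as roughly alpha + beta pseudo-observations: once real observations rival that count, the data starts to dominate. An illustration with the weak beta(9, 1) prior (10 pseudo-observations):

// 30 real observations at an 80% rate vs. 10 pseudo-observations at 90%
posterior = beta(9 + 24, 1 + 6)   // beta(33, 7)
posteriorMean = mean(posterior)   // 33 / 40 = 0.825: the data already dominates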

New components:

  1. Start with the appropriate component-type prior
  2. Use a weak prior strength to allow quick updates
  3. Closely monitor the first 50-100 tasks
  4. Increase prior strength as the track record accumulates

Established components:

  1. Use a moderate prior based on historical volume
  2. Apply time-weighting so recent observations count more (see the sketch after this list)
  3. Weight complex tasks higher than simple ones
  4. Review calibration quarterly
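
One hypothetical way to implement time-weighting is to decay each period's counts before adding them to the prior; the decay factor and counts below are illustrative, not part of the tool:

// Hypothetical exponential time-weighting, decay factor 0.9 per period
decay = 0.9
recentSuccesses = 10 * decay           // last period: 9.0 effective successes
olderSuccesses = 20 * decay * decay    // two periods ago: 16.2 effective successes
effectiveSuccesses = recentSuccesses + olderSuccesses   // 25.2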

Adversarial settings:

  1. Use a skeptical prior (trust must be earned)
  2. Weight adversarial failures heavily, e.g. 3× normal failures (see the sketch after this list)
  3. Focus on worst-case bounds, not just the mean
  4. Require statistical significance before increasing trust
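
A sketch of the 3× adversarial weighting combined with the skeptical prior (all counts are hypothetical):

// 40 successes, 2 normal failures, 1 adversarial failure
weightedFailures = 2 + 3 * 1                      // each adversarial failure counts as 3
posterior = beta(5 + 40, 5 + weightedFailures)    // skeptical beta(5, 5) prior
posteriorMean = mean(posterior)                   // 45 / 55 ≈ 0.818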

The track record modifier feeds directly into the Risk Calculator:

// Base risk from component type
baseRisk = probability * damage
// Adjusted for track record
adjustedRisk = baseRisk * trackRecordModifier
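
For instance, with the 0.96× modifier from above and a hypothetical base risk of 2%:

baseRisk = 0.02                  // hypothetical
adjustedRisk = baseRisk * 0.96   // 0.0192, i.e. 1.92%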

Good track records reduce risk; poor track records increase it.