
Trust Updater

This tool helps you convert observed track records into calibrated trust estimates using Bayesian updating.

Simple success rate calculations ignore:

  • Prior knowledge: What you believed before observations
  • Sample size: 9/10 is not the same evidence as 900/1000
  • Uncertainty: a raw rate carries no measure of confidence, which should grow with more data

Bayesian updating properly weighs prior beliefs against observed evidence.
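
For example, 9/10 and 900/1000 imply nearly the same observed rate but very different certainty. A minimal Squiggle sketch, assuming a flat beta(1, 1) prior (not one of the tool's defaults):

// Same ~90% observed rate, very different certainty
small = beta(1 + 9, 1 + 1)        // 9 successes, 1 failure
large = beta(1 + 900, 1 + 100)    // 900 successes, 100 failures
meanSmall = mean(small)     // ~0.833
meanLarge = mean(large)     // ~0.899
stdevSmall = stdev(small)   // ~0.10: wide, still uncertain
stdevLarge = stdev(large)   // ~0.01: narrow, high confidence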


1. Select Prior Belief
Prior: beta(9, 1) = 90.0% reliability, 90% CI: [72%, 99%]
2. Enter Observations
Observed Rate
94.3%
3. Updated Belief
[Chart: prior and posterior beta distributions plotted over 0-100% reliability]
Prior
90.0%
beta(9, 1), 90% CI: [72%, 99%]
Posterior
93.7%
beta(59, 4), 90% CI: [88%, 98%]
Track Record Modifier
0.96×
Performance slightly better than prior expectation
Scale: 0.5× (much better) · 1× (as expected) · 1.5× (much worse)
Usage: Multiply your base risk estimate by 0.96 to get the track-record-adjusted risk.
adjustedRisk = baseRisk × 0.96
Squiggle code:
// Trust Calibration Model
// Prior belief about component reliability
prior = beta(9, 1)
// Mean: 90.0%, 90% CI: [72%, 99%]

// Observations
successes = 50
failures = 3
observedRate = 0.943

// Bayesian update
posterior = beta(59, 4)
// Mean: 93.7%, 90% CI: [88%, 98%]

// Track record modifier for risk adjustment
priorMean = mean(prior)      // 0.900
posteriorMean = mean(posterior)  // 0.937
trackRecordModifier = priorMean / posteriorMean  // 0.961

// Apply to risk calculation
baseRisk = ... // your base risk estimate
adjustedRisk = baseRisk * trackRecordModifier

Your prior belief encodes what you expect before seeing any data. The framework provides defaults:

Component Type       Prior         Mean Reliability
Deterministic Code   beta(99, 1)   99%
Narrow ML            beta(19, 1)   95%
General LLM          beta(9, 1)    90%
RL/Agentic           beta(4, 1)    80%
Skeptical            beta(5, 5)    50%
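
Each mean in the table is alpha / (alpha + beta), which you can verify directly:

// Mean of beta(alpha, beta) is alpha / (alpha + beta)
rlAgenticMean = mean(beta(4, 1))   // 4 / 5 = 0.80, the RL/Agentic default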

Enter the number of successes and failures you’ve observed. More observations will move the posterior further from the prior.

The updated belief after combining prior with observations:

posterior = beta(prior_alpha + successes, prior_beta + failures)
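
Applied to the example above (a beta(9, 1) prior with 50 successes and 3 failures):

posterior = beta(9 + 50, 1 + 3)   // beta(59, 4)
posteriorMean = mean(posterior)   // 59 / 63 ≈ 0.937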

The modifier adjusts your risk estimate based on performance vs. expectations:

modifier = prior_mean / posterior_mean
  • < 1.0: Performance better than expected, reduce risk
  • = 1.0: Performance matches expectation, no change
  • > 1.0: Performance worse than expected, increase risk
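
For the > 1.0 case, here is a sketch with hypothetical counts (a beta(9, 1) prior that observes 40 successes and 10 failures):

// Hypothetical worse-than-expected track record
posterior = beta(9 + 40, 1 + 10)                // beta(49, 11)
modifier = mean(beta(9, 1)) / mean(posterior)   // 0.900 / 0.817 ≈ 1.10 → increase risk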

The 90% CI width indicates your uncertainty:

CI Width   Interpretation
< 5%       High confidence, stable estimate
5-15%      Moderate confidence
15-30%     Low confidence, need more data
> 30%      Very uncertain, prior dominates

For example, the posterior above has a 90% CI of [88%, 98%], a width of 10 percentage points: moderate confidence.

How much data you need before observations significantly shift the posterior away from the prior:

Prior Strength                 Observations Needed
Weak (alpha + beta ≈ 10)       20-50
Moderate (alpha + beta ≈ 50)   100-200
Strong (alpha + beta ≈ 200)    500+
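
These thresholds follow from the prior acting as roughly alpha + beta pseudo-observations: once real observations rival that count, the data starts to dominate. An illustration with the weak beta(9, 1) prior (10 pseudo-observations):

// 30 real observations at an 80% rate vs. 10 pseudo-observations at 90%
posterior = beta(9 + 24, 1 + 6)   // beta(33, 7)
posteriorMean = mean(posterior)   // 33 / 40 = 0.825: the data already dominates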

New components:

  1. Start with the appropriate component-type prior
  2. Use a weak prior strength to allow quick updates
  3. Closely monitor the first 50-100 tasks
  4. Increase prior strength as the track record accumulates

Established components:

  1. Use a moderate prior based on historical volume
  2. Apply time-weighting so recent observations count more (see the sketch after this list)
  3. Weight complex tasks higher than simple ones
  4. Review calibration quarterly
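
One hypothetical way to implement time-weighting is to decay each period's counts before adding them to the prior; the decay factor and counts below are illustrative, not part of the tool:

// Hypothetical exponential time-weighting, decay factor 0.9 per period
decay = 0.9
recentSuccesses = 10 * decay           // last period: 9.0 effective successes
olderSuccesses = 20 * decay * decay    // two periods ago: 16.2 effective successes
effectiveSuccesses = recentSuccesses + olderSuccesses   // 25.2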

Adversarial settings:

  1. Use a skeptical prior (trust must be earned)
  2. Weight adversarial failures heavily, e.g. 3× normal failures (see the sketch after this list)
  3. Focus on worst-case bounds, not just the mean
  4. Require statistical significance before increasing trust
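
A sketch of the 3× adversarial weighting combined with the skeptical prior (all counts are hypothetical):

// 40 successes, 2 normal failures, 1 adversarial failure
weightedFailures = 2 + 3 * 1                      // each adversarial failure counts as 3
posterior = beta(5 + 40, 5 + weightedFailures)    // skeptical beta(5, 5) prior
posteriorMean = mean(posterior)                   // 45 / 55 ≈ 0.818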

The track record modifier feeds directly into the Risk Calculator:

// Base risk from component type
baseRisk = probability * damage
// Adjusted for track record
adjustedRisk = baseRisk * trackRecordModifier
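
For instance, with the 0.96× modifier from above and a hypothetical base risk of 2%:

baseRisk = 0.02                  // hypothetical
adjustedRisk = baseRisk * 0.96   // 0.0192, i.e. 1.92%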

Good track records reduce risk; poor track records increase it.