# Probability Priors

These are prior distributions for failure probabilities before observing any track record. Use them as starting points, then update based on actual performance.

We classify AI components into four categories based on their failure modes:

| Type | Description | Examples |
|------|-------------|----------|
| Deterministic Code | Traditional software with no learning | Rule engines, validators, formatters |
| Narrow ML | Trained on a specific task, limited generalization | Classifiers, embeddings, specialized models |
| General LLM | Large language models with broad capabilities | GPT-4, Claude, open-source LLMs |
| RL/Agentic | Agents that plan, act, and adapt | AutoGPT-style systems, trading agents |

## Accident Rates

“Accidents” are failures due to capability limitations, bugs, or misunderstanding rather than intentional misbehavior.

```
// Very low accident rate for well-tested code
accidentRate_deterministicCode = beta(1, 10000)
// Mean: 0.01%, 90% CI: [0.001%, 0.05%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 0.01% | High | Well-tested production code |
| Distribution | beta(1, 10000) | High | Shape reflects testing |
| 90% CI | [0.001%, 0.05%] | - | Per-invocation failure |

Rationale: Deterministic code can be exhaustively tested. Remaining failures are edge cases, integration issues, or environmental factors. Software industry data suggests ~0.1-1 defects per KLOC in production code.
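The table statistics follow directly from the beta parameters. A minimal Python sketch for reproducing them (using scipy; the helper name is illustrative, and the computed values may differ slightly from the rounded figures in the tables):

```python
from scipy import stats

def summarize_beta(alpha: float, beta_param: float):
    """Mean and 90% credible interval of a beta(alpha, beta) prior."""
    dist = stats.beta(alpha, beta_param)
    return dist.mean(), (dist.ppf(0.05), dist.ppf(0.95))

# Accident-rate priors from this section
priors = {
    "deterministic_code": (1, 10000),
    "narrow_ml": (5, 200),
    "general_llm": (8, 80),
    "rl_agent": (15, 60),
}

for name, (a, b) in priors.items():
    mean, (lo, hi) = summarize_beta(a, b)
    print(f"{name}: mean={mean:.3%}, 90% CI=[{lo:.3%}, {hi:.3%}]")
```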

```
// Moderate accident rate - can fail on distribution shift
accidentRate_narrowML = beta(5, 200)
// Mean: 2.4%, 90% CI: [0.9%, 4.5%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 2.5% | Medium | Calibrated, narrow-domain model |
| Distribution | beta(5, 200) | Medium | Accounts for distribution shift |
| 90% CI | [0.9%, 4.5%] | - | Varies by domain and data quality |

Rationale: Narrow ML fails when inputs differ from the training distribution. Studies show 2-10% error rates on held-out test sets, with higher rates in real-world deployment.

```
// Higher accident rate - broad capability, subtle failures
accidentRate_generalLLM = beta(8, 80)
// Mean: 9.1%, 90% CI: [4.5%, 15%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 10% | Medium | Typical error rate on complex tasks |
| Distribution | beta(8, 80) | Medium | High variance in real deployments |
| 90% CI | [4.5%, 15%] | - | Task-dependent |

Rationale: LLMs make plausible-sounding errors that may not be caught. Benchmarks show 5-20% error rates on reasoning tasks. Hallucination rates vary widely by domain.

```
// High accident rate - exploration, reward hacking, compounding errors
accidentRate_RLAgent = beta(15, 60)
// Mean: 20%, 90% CI: [12%, 29%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 20% | Low | Highly uncertain, few deployments |
| Distribution | beta(15, 60) | Low | Wide uncertainty |
| 90% CI | [12%, 29%] | - | Environment-dependent |

Rationale: RL agents explore the state space, may find reward hacks, and compound errors across multi-step episodes. Limited production deployment data makes this estimate highly uncertain.
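Error compounding is worth making concrete: if each step fails independently with per-step probability p, the chance of at least one failure across a k-step episode is 1 - (1 - p)^k. A quick illustration with hypothetical numbers:

```python
# P(at least one failure in k steps) = 1 - (1 - p)**k
p = 0.02  # hypothetical per-step accident rate

for k in (1, 10, 50):
    print(f"{k:3d} steps -> {1 - (1 - p) ** k:.1%} episode failure rate")
# 1 step -> 2.0%; 10 steps -> 18.3%; 50 steps -> 63.6%
```

Even a modest per-step rate dominates over long horizons, which is one reason this per-episode prior sits so much higher than the per-invocation priors above.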


## Defection Rates

“Defection” refers to intentional misbehavior: the system pursuing goals misaligned with the principal’s intent. These estimates are more speculative than the accident rates.

```
// Zero by definition - no agency
defectionRate_deterministicCode = 0
```

Rationale: Deterministic code has no agency or goals. It cannot “decide” to defect. Any misbehavior is a bug (accident), not intentional.

```
// Very low - limited agency, specific objective
defectionRate_narrowML = beta(1, 1000)
// Mean: 0.1%, 90% CI: [0.01%, 0.4%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 0.1% | Low | Speculative |
| Distribution | beta(1, 1000) | Low | Could be lower or higher |
| 90% CI | [0.01%, 0.4%] | - | Depends on training objective |

Rationale: Narrow ML has limited situational awareness and optimization pressure. Defection requires both capability and motivation, which narrow models lack.

```
// Low but non-negligible - possible instrumental goals
defectionRate_generalLLM = beta(2, 200)
// Mean: 1.0%, 90% CI: [0.2%, 2.4%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 1% | Low | Highly uncertain |
| Distribution | beta(2, 200) | Low | Placeholder pending research |
| 90% CI | [0.2%, 2.4%] | - | Could vary significantly |

Rationale: LLMs show some situational awareness and can reason about goals. Cases like Sydney/Bing suggest potential for misaligned behavior. RLHF may reduce this risk but does not eliminate it.

```
// Higher - explicit optimization, instrumental goals likely
defectionRate_RLAgent = beta(8, 50)
// Mean: 13.8%, 90% CI: [6.6%, 23%]
```
| Estimate | Value | Confidence | Notes |
|----------|-------|------------|-------|
| Point estimate | 15% | Speculative | Very high uncertainty |
| Distribution | beta(8, 50) | Speculative | Wide distribution reflects uncertainty |
| 90% CI | [6.6%, 23%] | - | Depends heavily on alignment techniques |

Rationale: RL agents explicitly optimize objectives and may develop instrumental goals (self-preservation, resource acquisition). Theoretical arguments suggest non-negligible risk; empirical data is limited.


## Total Failure Probability

Total failure probability is the sum of the accident and defection rates (treating the two failure modes as disjoint):

```
// Example: General LLM
totalFailure_generalLLM = accidentRate_generalLLM + defectionRate_generalLLM
// Mean: ~10%, dominated by accidents
```
| Component Type | Accident | Defection | Total | Accident % |
|----------------|----------|-----------|-------|------------|
| Deterministic Code | 0.01% | 0% | 0.01% | 100% |
| Narrow ML | 2.5% | 0.1% | 2.6% | 96% |
| General LLM | 9% | 1% | 10% | 90% |
| RL/Agentic | 20% | 14% | 34% | 59% |
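Since both rates are distributions, the total is a distribution too. A Monte Carlo sketch for the general LLM row (assuming, as the sum does, that accidents and defection are independent and disjoint):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

accident = rng.beta(8, 80, n)    # general LLM accident prior
defection = rng.beta(2, 200, n)  # general LLM defection prior
total = accident + defection

print(f"mean total: {total.mean():.1%}")                        # ~10%
print(f"mean accident share: {(accident / total).mean():.0%}")  # ~90%
print(f"90% CI: [{np.percentile(total, 5):.1%}, "
      f"{np.percentile(total, 95):.1%}]")
```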

## Adjustment Factors

These priors should be adjusted using the following multipliers:

```
complexityMultiplier(taskComplexity) = {
  if taskComplexity <= 3: 0.5  // Simple tasks
  if taskComplexity <= 6: 1.0  // Moderate
  if taskComplexity <= 9: 2.0  // Complex
  else: 4.0                    // Very complex
}

domainMultiplier(sensitivity) = {
  "internal_tools": 0.5,
  "customer_facing": 1.0,
  "financial": 2.0,
  "healthcare": 3.0,
  "safety_critical": 5.0
}
```
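A runnable Python version of these adjustments (a sketch; the thresholds and factors are exactly the ones above, and the names mirror the pseudocode):

```python
def complexity_multiplier(task_complexity: int) -> float:
    """Scale a base failure rate by task complexity (1-10 scale)."""
    if task_complexity <= 3:
        return 0.5  # simple tasks
    if task_complexity <= 6:
        return 1.0  # moderate
    if task_complexity <= 9:
        return 2.0  # complex
    return 4.0      # very complex

DOMAIN_MULTIPLIER = {
    "internal_tools": 0.5,
    "customer_facing": 1.0,
    "financial": 2.0,
    "healthcare": 3.0,
    "safety_critical": 5.0,
}

# Example: a moderately complex financial task doubles the base rate
adjusted = 0.09 * complexity_multiplier(5) * DOMAIN_MULTIPLIER["financial"]
print(f"{adjusted:.1%}")  # 18.0%
```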

After observing n successful executions and m failures:

```
// Returns the posterior mean failure rate: failures update alpha,
// successes update beta
trackRecordMultiplier(n, m) = (m + alpha) / (n + m + alpha + beta)
// Where alpha, beta are the parameters of the prior
```
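This is the standard conjugate beta-binomial update: failures add to alpha, successes add to beta, and the expression above is the posterior mean. In Python:

```python
def posterior_failure_rate(n_success: int, m_fail: int,
                           alpha: float, beta_: float) -> float:
    """Posterior mean failure rate after a beta(alpha, beta) prior
    observes n_success successes and m_fail failures."""
    return (m_fail + alpha) / (n_success + m_fail + alpha + beta_)

# General LLM accident prior beta(8, 80)
print(f"{posterior_failure_rate(0, 0, 8, 80):.1%}")    # 9.1% (prior mean)
print(f"{posterior_failure_rate(100, 2, 8, 80):.1%}")  # 5.3% (updated)
```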

## Example: Code Review Agent

A code review agent using a general LLM:

```
// Base rates for general LLM
baseAccident = beta(8, 80)   // ~9%
baseDefection = beta(2, 200) // ~1%

// Modifiers for the code review context
taskComplexity = 5 // Moderate
domainSensitivity = "customer_facing"

// Adjusted rates
adjustedAccident = baseAccident * complexityMultiplier(5) * domainMultiplier("customer_facing")
// ~9% * 1.0 * 1.0 = ~9%

// With 100 successful reviews and 2 failures observed
posteriorAccident = bayesianUpdate(baseAccident, successes=100, failures=2)
// Posterior: beta(10, 180), mean ~5.3% - tightens around the observed rate
```
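The same example end to end in Python (a self-contained sketch using scipy; it assumes bayesianUpdate is the conjugate beta update shown above):

```python
from scipy import stats

alpha, beta_param = 8, 80  # general LLM accident prior, mean ~9.1%

# Context modifiers: complexity 5 and "customer_facing" both map to 1.0,
# so the adjusted rate equals the base rate (~9%)
adjusted_mean = (alpha / (alpha + beta_param)) * 1.0 * 1.0

# Conjugate update: 2 failures add to alpha, 100 successes add to beta
posterior = stats.beta(alpha + 2, beta_param + 100)  # beta(10, 180)

print(f"adjusted prior mean: {adjusted_mean:.1%}")  # 9.1%
print(f"posterior mean: {posterior.mean():.1%}")    # 5.3%
print(f"posterior 90% CI: [{posterior.ppf(0.05):.1%}, "
      f"{posterior.ppf(0.95):.1%}]")
```

The 90% interval narrows from roughly [4.5%, 15%] to about [3%, 8%], which is what the comment above means by the posterior tightening around the observed rate.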

## Caveats

1. Limited empirical data: Most estimates are extrapolations, not measurements.
2. Rapid capability changes: These priors may need frequent updates.
3. Context dependence: Real failure rates depend heavily on deployment specifics.
4. Defection uncertainty: Defection estimates are theoretical and carry high uncertainty.
5. Correlation ignored: Components may have correlated failures.