Nuclear & Aerospace PRA: Deep Dive

Deep research into how nuclear and aerospace industries decompose system-level safety targets to components, and what AI safety can learn from their successes and failures.

OR Gate (any failure causes system failure):

  • P(output) ≈ Σ P(inputs) for rare events
  • Full formula (independent inputs): P(A∪B) = P(A) + P(B) - P(A)·P(B)

AND Gate (all must fail):

  • P(output) = Π P(inputs), assuming independent failures
  • This multiplication is key to redundancy benefits
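
A minimal sketch of the gate arithmetic, assuming independent basic events (the function names and example probabilities are illustrative):

```python
from math import prod

def or_gate(probs):
    """OR gate: the output fails if ANY input fails.
    Exact form for independent inputs: 1 - product of (1 - p)."""
    survive = prod(1.0 - p for p in probs)
    return 1.0 - survive

def and_gate(probs):
    """AND gate: the output fails only if ALL inputs fail (independent inputs)."""
    return prod(probs)

# Rare-event check: for two 1e-4 events, OR ≈ sum (2e-4), AND = product (1e-8).
print(or_gate([1e-4, 1e-4]))   # ~1.9999e-04, close to the 2e-4 rare-event sum
print(and_gate([1e-4, 1e-4]))  # 1e-08, the redundancy benefit
```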

Aerospace 10⁻⁹ per flight hour:

  • ~100 catastrophic failure conditions per aircraft
  • Aggregate budget: ~10⁻⁷ per flight hour total
  • Each condition: ~10⁻⁹ allocation
  • If 2 redundant channels (AND): each can be ~3.2×10⁻⁵ (since 3.2×10⁻⁵ × 3.2×10⁻⁵ ≈ 10⁻⁹)
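
The allocation arithmetic behind those numbers, as a sketch (the 100-condition count and the 10⁻⁷ aggregate are the figures quoted above):

```python
import math

# Aggregate catastrophic budget per flight hour, spread over ~100 failure conditions.
aggregate_budget = 1e-7
n_conditions = 100
per_condition = aggregate_budget / n_conditions      # 1e-9 per condition

# If a condition is protected by two independent channels in an AND relationship,
# each channel's budget is the square root of the per-condition target.
per_channel = math.sqrt(per_condition)               # ≈ 3.16e-5
print(f"per condition: {per_condition:.1e}, per channel: {per_channel:.2e}")
```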

Nuclear 10⁻⁴ CDF:

  • Target: < 10⁻⁴ Core Damage Frequency per reactor-year
  • Achieved: ~10⁻⁵ (10× better than target)
  • LERF target: < 10⁻⁵, achieved ~10⁻⁶
  • Budget split: internal events typically contribute 60-80% of CDF, with the remainder from external events (earthquakes, floods, fires)

Importance measures:

| Measure | Formula | Use Case |
|---|---|---|
| Fussell-Vesely | F(cut sets containing component) / F(total) | % contribution to total risk |
| Risk Achievement Worth (RAW) | Q(component failed) / Q(baseline) | Risk increase if component fails; RAW > 2 triggers NRC action |
| Birnbaum | ∂Q_system / ∂q_component | Sensitivity: where improvements have the largest impact |
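
A sketch of how the three measures behave on a toy fault tree (TOP = A OR (B AND C)); the component unavailabilities are made-up illustrative values:

```python
# Toy fault tree: TOP = A OR (B AND C), independent basic events.
q = {"A": 1e-3, "B": 1e-2, "C": 5e-3}   # illustrative unavailabilities

def top(q):
    # Exact probability of A ∪ (B ∩ C) for independent events
    return q["A"] + q["B"] * q["C"] - q["A"] * q["B"] * q["C"]

baseline = top(q)
for comp in q:
    failed  = top({**q, comp: 1.0})        # component assumed failed
    perfect = top({**q, comp: 0.0})        # component assumed perfect
    fv  = (baseline - perfect) / baseline  # Fussell-Vesely: share of risk involving comp
    raw = failed / baseline                # Risk Achievement Worth
    bir = failed - perfect                 # Birnbaum: dQ_system / dq_component
    print(f"{comp}: FV={fv:.3f}  RAW={raw:.0f}  Birnbaum={bir:.2e}")
```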

IEC 61508 Safety Integrity Levels:

| SIL | PFDavg Range | Risk Reduction Factor |
|---|---|---|
| SIL 1 | 10⁻² to 10⁻¹ | 10 to 100 |
| SIL 2 | 10⁻³ to 10⁻² | 100 to 1,000 |
| SIL 3 | 10⁻⁴ to 10⁻³ | 1,000 to 10,000 |
| SIL 4 | 10⁻⁵ to 10⁻⁴ | 10,000 to 100,000 |

For a Sensor → Logic → Actuator chain targeting SIL 3 (PFDavg < 10⁻³):

  • Sensor: 3×10⁻⁴
  • Logic: 2×10⁻⁴
  • Actuator: 5×10⁻⁴
  • Total: 1.0×10⁻³, which sits exactly on the SIL 3 upper bound, so each allocation must be met or slightly bettered in practice (see the sketch below)
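
A minimal check of the series allocation against the IEC 61508 low-demand bands (the band boundaries follow the table above; the helper function is illustrative):

```python
# Series chain: PFDavg contributions simply add (rare-event approximation).
allocation = {"sensor": 3e-4, "logic": 2e-4, "actuator": 5e-4}
total = sum(allocation.values())

def sil_band(pfd_avg):
    """Map a PFDavg to its IEC 61508 low-demand SIL band (per the table above)."""
    if 1e-5 <= pfd_avg < 1e-4:
        return "SIL 4"
    if 1e-4 <= pfd_avg < 1e-3:
        return "SIL 3"
    if 1e-3 <= pfd_avg < 1e-2:
        return "SIL 2"
    if 1e-2 <= pfd_avg < 1e-1:
        return "SIL 1"
    return "outside SIL bands"

# Totals 1.0e-3, i.e. right at the SIL 3 / SIL 2 boundary, so the individual
# allocations must actually be met or slightly bettered.
print(total, sil_band(total))
```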

Key insight: a high-integrity requirement can be split across multiple lower-integrity elements, provided they are sufficiently independent.

ASIL decomposition options (ISO 26262):

| Original | Decomposition Options |
|---|---|
| ASIL D | D(D), or C(D) + A(D), or B(D) + B(D) |
| ASIL C | C(C), or B(C) + A(C) |
| ASIL B | B(B), or A(B) + A(B) |
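
A small lookup that mirrors the table, for checking a proposed split (the function name and encoding are illustrative; ISO 26262 additionally requires the independence evidence listed below):

```python
# Allowed ASIL decompositions from the table above; "D": [("C", "A"), ...] means
# an ASIL D requirement may be split into an ASIL C and an ASIL A element.
DECOMPOSITIONS = {
    "D": [("D",), ("C", "A"), ("B", "B")],
    "C": [("C",), ("B", "A")],
    "B": [("B",), ("A", "A")],
}

def is_allowed_split(original, parts):
    """True if `parts` is one of the tabulated options (independence still required)."""
    normalized = tuple(sorted(parts, reverse=True))
    return any(normalized == tuple(sorted(opt, reverse=True))
               for opt in DECOMPOSITIONS[original])

print(is_allowed_split("D", ["B", "B"]))  # True: ASIL D as two independent ASIL B elements
print(is_allowed_split("D", ["B", "A"]))  # False: not a tabulated option for ASIL D
```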

Critical requirements:

  1. Sufficient independence - no common cause failures
  2. Freedom from interference - cascading failures prevented
  3. Dependent Failure Analysis mandatory
  4. Diversity: different hardware, software, teams, locations, power supplies
ASIL hardware architectural metrics:

| ASIL | SPFM (Single Point Fault Metric) | LFM (Latent Fault Metric) | Max Failure Rate |
|---|---|---|---|
| ASIL D | ≥ 99% | ≥ 90% | ≤ 10 FIT |
| ASIL C | ≥ 97% | ≥ 80% | ≤ 100 FIT |
| ASIL B | ≥ 90% | ≥ 60% | ≤ 100 FIT |
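
A rough sketch of how the two metrics fall out of an element's failure-rate budget; the FIT splits are invented illustrative numbers, and the formulas follow the usual ISO 26262-5 definitions (SPFM = 1 minus the undetected single-point/residual share, LFM = 1 minus the latent share of the remainder):

```python
# Illustrative failure-rate budget for one safety-related element, in FIT
# (failures per 1e9 device-hours). All numbers below are made up for the example.
lambda_total  = 10.0    # total failure rate of the element
lambda_spf_rf = 0.05    # single-point + residual faults (undetected, violate the goal)
lambda_latent = 0.5     # multiple-point faults that remain latent (undetected)

spfm = 1.0 - lambda_spf_rf / lambda_total                    # single point fault metric
lfm  = 1.0 - lambda_latent / (lambda_total - lambda_spf_rf)  # latent fault metric

print(f"SPFM = {spfm:.1%} (ASIL D target >= 99%)")   # 99.5%
print(f"LFM  = {lfm:.1%} (ASIL D target >= 90%)")    # ~95.0%
```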

Severity vs Probability Targets (ARP 4761)

| Severity | Effect | Target | Qualitative |
|---|---|---|---|
| Catastrophic | Multiple fatalities | < 10⁻⁹/FH | Extremely Improbable |
| Hazardous | Serious injuries | < 10⁻⁷/FH | Extremely Remote |
| Major | Minor injuries | < 10⁻⁵/FH | Remote |
| Minor | Discomfort | < 10⁻³/FH | Probable |

DO-178C Design Assurance Levels:

| DAL | Failure Condition | Verification Objectives |
|---|---|---|
| DAL A | Catastrophic | 71 objectives, MC/DC coverage |
| DAL B | Hazardous | 69 objectives |
| DAL C | Major | 62 objectives |
| DAL D | Minor | 26 objectives |
| DAL E | No Effect | Minimal |

DAL A requirements:

  • 100% statement coverage
  • 100% decision coverage
  • Modified Condition/Decision Coverage (MC/DC)
  • Independence in verification (separate teams)
  • Full traceability: requirements → design → code → tests
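
To make the MC/DC requirement concrete, here is a toy decision (not from any standard text) with a test set that achieves MC/DC: each condition is shown to independently flip the outcome while the others are held fixed.

```python
# Hypothetical decision with three conditions: outcome = a and (b or c)
def decision(a, b, c):
    return a and (b or c)

# Decision coverage needs only one True and one False outcome overall.
# MC/DC additionally needs, for each condition, a pair of tests differing only
# in that condition where the outcome flips. Four tests suffice here:
mcdc_tests = [
    (True,  True,  False),  # baseline -> True
    (False, True,  False),  # only a changed vs baseline -> False  (a shown to matter)
    (True,  False, False),  # only b changed vs baseline -> False  (b shown to matter)
    (True,  False, True),   # only c changed vs previous -> True   (c shown to matter)
]
for case in mcdc_tests:
    print(case, decision(*case))
```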

What PRA missed:

  • Human factors completely underestimated
  • Control room design flawed (the PORV indicator showed the solenoid command, not actual valve position)
  • 11 previous PORV failures ignored
  • Same sequence occurred 18 months earlier at Davis-Besse

Lessons:

  • Full-scope simulators at every plant
  • Symptom-based Emergency Operating Procedures
  • Human factors engineering mandatory
  • Resident inspectors at every plant

What PRA missed:

  • Beyond-design-basis external event (14m tsunami vs 5.7m design basis)
  • Multi-unit common-cause failure not modeled (unit-by-unit PRA)
  • “Independent” power sources all in same flood zone
  • Station blackout scenarios assumed neighboring unit power available

Lessons:

  • Beyond-design-basis events must be analyzed
  • Multi-unit analysis required
  • Physical separation of redundant systems
  • Passive safety systems not requiring AC power

Challenger (1986) - Normalization of Deviance


Key concept: Gradual process where unacceptable practice becomes acceptable through repetition without catastrophe.

  • O-ring damage observed on multiple previous launches
  • Each instance redefined as “acceptable risk”
  • No rules broken - decision followed NASA protocol
  • Columbia 2003: same pattern repeated (“problems never fixed”)

Lessons for AI safety:

  • Process compliance ≠ safety
  • Financial/schedule pressure corrupts judgment
  • Historical success ≠ future safety
  • Organizational culture > technical systems

Regulatory capture indicators (Boeing 737 MAX):

  • Self-certification through ODA program
  • Engineer conflicts: employer vs safety responsibility
  • Engineering-driven → finance-driven culture shift

Failure rate data uncertainty issues:

  • Databases contain legacy component data
  • New technologies lack historical statistics
  • MIL-HDBK-217 methods are widely criticized as models that “do not produce meaningful or accurate quantitative predictions”
  • Best use: relative comparison between designs, not absolute values

Beta Factor (Common Cause Failure rate):

| System Quality | Beta Factor |
|---|---|
| Well-engineered | 0.001 to 0.05 (0.1% to 5% CCF) |
| Typical redundant | 0.05 to 0.25 (5% to 25% CCF) |
| Poor engineering | up to 0.25 (25% CCF) |
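
A sketch of why the beta factor dominates redundant-system math; the per-channel PFD reuses the ~3.2×10⁻⁵ aerospace figure from earlier, and the beta values span the table above:

```python
# Beta-factor model for a 1-out-of-2 redundant pair: a fraction `beta` of each
# channel's failure probability is common-cause and defeats both channels at once;
# the remaining (1 - beta) fraction fails independently.
pfd_channel = 3.2e-5
for beta in (0.0, 0.01, 0.05, 0.25):
    independent  = ((1.0 - beta) * pfd_channel) ** 2   # both channels fail on their own
    common_cause = beta * pfd_channel                  # one shared cause fails both
    print(f"beta={beta:.2f}: system PFD ≈ {independent + common_cause:.1e}")
# With beta = 0 the pair reaches ~1e-9; at a "typical" 5% beta the common-cause
# term (~1.6e-6) dominates and the 1e-9 claim is lost.
```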

Violations of independence:

  • Physical: Common location, shared power, shared cooling
  • Design: Same component type, same manufacturer, common software bugs
  • Operational: Common maintenance errors, same calibration mistakes
Typical structural safety factors by domain:

| Domain | Typical Factor | Notes |
|---|---|---|
| Buildings | 2.0 | Well-understood loads |
| Pressure vessels | 3.5 to 4.0 | |
| Aerospace structures | 1.2 to 1.5 | Weight critical; compensated with QC and inspection |
| Pressurized fuselage | 2.0 | |
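
How a safety factor is applied in practice, with invented numbers (margin of safety here uses the common convention MS = factor − 1):

```python
# Illustrative structural sizing check; the loads are made-up example values.
limit_load   = 50_000.0   # N, maximum load expected in service
failure_load = 75_000.0   # N, load at which the part is predicted to fail

safety_factor    = failure_load / limit_load   # 1.5, typical for aerospace structure
margin_of_safety = safety_factor - 1.0         # 0.5, i.e. 50% margin over limit load
print(safety_factor, margin_of_safety)
```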

Key insight: “High value cannot guarantee no failures.” Safety factors don’t cover systematic errors, poor design, or unknown unknowns; they are better described as “ignorance factors.”

Organizational safeguards:

  1. Structural: Independent safety committees, cooling-off periods, whistleblower protections
  2. Cultural: Psychological safety, “speak up” culture, consequences for violations
  3. Regulatory: Defense-in-depth, independent verification, adversarial stance
  4. Learning: Historical lessons processes, cross-industry sharing, precursor analysis

flowchart TB
    subgraph "Safety Targets"
        Aero[Aerospace<br/>10⁻⁹/flight hour]
        Nuc[Nuclear CDF<br/>10⁻⁴/reactor-year]
        SIL[SIL 4<br/>10⁻⁵ to 10⁻⁴ PFD]
    end
    subgraph "Decomposition"
        Series["Series: Σ PFD"]
        Para["Parallel: Π PFD"]
    end
    Aero --> Series
    Nuc --> Para

  • Catastrophic aerospace: 10⁻⁹ per flight hour
  • Nuclear CDF: 10⁻⁴ per reactor-year (achieved 10⁻⁵)
  • SIL 4: 10⁻⁵ to 10⁻⁴ PFD
  • Series: PFD_total = Σ PFD_components
  • Parallel (AND): PFD_total = Π PFD_components
  • ASIL D = ASIL B + ASIL B (with independence)
  • Well-designed systems: 5% beta factor typical
  • Independence requires: different hardware, software, teams, locations, power
  • Aerospace: 1.2-1.5× (weight critical)
  • Nuclear: defense-in-depth with multiple barriers
  • Pharmacology: 100-1000× uncertainty factors

What transfers to AI safety:

  1. Fault tree decomposition - AND/OR gate math works for any system
  2. Importance measures - identify where to focus safety investment
  3. SIL/ASIL decomposition - splitting requirements across redundant components
  4. Hardware metrics - SPFM, LFM concepts applicable to AI monitoring coverage

What does not transfer:

  1. Failure rate databases - AI lacks historical failure statistics
  2. Random failure assumptions - AI failures are systematic, not random
  3. Independence assumptions - AI components may have correlated failures from training
  4. Physical separation - doesn’t prevent correlated AI behavior

What AI safety needs beyond PRA:

  1. Systematic failure modeling - not random hardware faults
  2. Capability-based targets - not just failure probability
  3. Behavioral independence - different training, architectures, not just physical separation
  4. Emergent capability handling - unknown-unknowns more significant than in nuclear/aerospace
  5. Organizational factors - normalization of deviance, regulatory capture equally relevant

  • NRC NUREG documents (NUREG-0492, NUREG/CR-4639, NUREG/CR-5485)
  • IEC 61508, ISO 26262 standards
  • ARP 4761, DO-178C, DO-254 aerospace standards
  • Accident investigation reports (TMI, Fukushima, Challenger, 737 MAX)
  • Academic literature on PRA, importance measures, common cause failures