# Glossary
Key terms used throughout this framework, organized by domain.
## Common Terminology Confusions {#terminology-clarifications}

Some terms are used interchangeably in casual discussion but have distinct meanings in this framework:
| Term | When to Use | Related Terms |
|---|---|---|
| Agent | When emphasizing goal-directed behavior and optimization | Component, Delegate, System |
| Component | When emphasizing architecture and system structure | Agent, Module, Subsystem |
| Delegate | When emphasizing the delegation relationship from principal | Agent, Subordinate, Executor |
| Defection | Intentional deviation from principal’s goals (scheming) | Misalignment, Failure |
| Misalignment | Goals don’t match principal’s (may or may not be intentional) | Defection, Value Drift |
| Harm Mode | Specific way delegation causes damage | Failure Mode (related but distinct) |
| Failure Mode | Engineering term for how systems fail (from FMEA) | Harm Mode (our AI-specific term) |
When in doubt:
- Use agent when discussing optimization, goals, and agency scores
- Use component when discussing architecture and system decomposition
- Use delegate when discussing the principal-agent relationship
## Core Framework Terms {#core-framework-terms}

Agent — A system whose behavior can be usefully modeled as optimizing an objective function over time. Agency is a matter of degree, not a binary property. See Agent, Power, and Authority.
Agency Score (or Coherence Score) — Measure of how well a system’s behavior can be explained by a simple utility function. Higher coherence = more agent-like. Formally: max over simple utility functions U of Fit(U, observed behaviors). A low-agency system is predictable; a high-agency system coherently pursues goals.
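For intuition, a toy sketch (not the framework's actual estimator): take Fit(U, behavior) to be the fraction of observed state-action pairs that the candidate utility's implied policy predicts, and score the system by its best-fitting candidate. All states, actions, and candidates below are invented for illustration.

```python
# Toy agency/coherence score (illustrative only, not the framework's estimator).
# Fit(U, behavior) = fraction of observed actions that U's policy predicts.

observed = [("low_battery", "recharge"), ("task_queued", "work"),
            ("task_queued", "work"), ("idle", "work")]

candidate_policies = {
    "maximize_uptime": {"low_battery": "recharge", "task_queued": "work", "idle": "idle"},
    "maximize_output": {"low_battery": "work", "task_queued": "work", "idle": "work"},
}

def fit(policy, behavior):
    """Fraction of observed (state, action) pairs the policy predicts."""
    return sum(policy.get(s) == a for s, a in behavior) / len(behavior)

# Agency score = max over simple candidate utilities of Fit(U, behavior).
agency_score = max(fit(p, observed) for p in candidate_policies.values())
print(agency_score)  # 0.75: each candidate explains 3 of the 4 observed actions
```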
Power Score — Measure of an agent’s ability to achieve a wide variety of goals. Formally: expected achievability across many possible goal functions. Related concepts: reachability (how many states can the agent access?), resource control, influence over other agents.
Authority — Sanctioned power; the intersection of what an agent can do (power) and what it may do (permission). Authority = Power ∩ Permission. An agent with high power but low authority is dangerous.
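A minimal sketch, modeling power and permission as sets of actions (the action names are invented for illustration):

```python
# Authority = Power ∩ Permission, with each modeled as a set of actions.
power = {"read_db", "write_db", "send_email", "deploy_code"}   # what it CAN do
permission = {"read_db", "send_email"}                         # what it MAY do

authority = power & permission     # {"read_db", "send_email"}
# The dangerous gap: high power, low authority.
unsanctioned = power - permission  # {"write_db", "deploy_code"}
```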
Effective Capability — Power × Agency. The ability to actually accomplish objectives. Requires both the power to affect the world and the agency to coherently pursue goals.
RACAP (Risk-Adjusted Capability) — Capability Score / Delegation Risk. Measures efficiency—how much capability per unit risk. Higher RACAP = better system design.
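Worked numbers (purely illustrative) for the two composite scores:

```python
# Effective Capability = Power × Agency; RACAP = Capability / Delegation Risk.
power_score = 0.8      # illustrative: can reach many goal states
agency_score = 0.5     # illustrative: moderately coherent goal pursuit

effective_capability = power_score * agency_score   # 0.40

delegation_risk = 200.0                             # illustrative $/month
racap = effective_capability / delegation_risk      # 0.002 capability per $/month of risk
```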
Delegation Exposure — The complete set of possible harms (harm modes) from delegating a task. Not a single number—a collection, like an attack surface or failure envelope. Contains all the ways the delegation could cause damage.
Harm Mode — A specific way a delegation could cause damage. Parallels “failure mode” from FMEA. Each harm mode has a probability and a damage value. Example: “Leaks proprietary data” with P=0.001 and Damage=$50,000.
Delegation Risk — The probability-weighted expected cost of a delegation: Σ P(harm mode) × Damage(harm mode). This is the quantified total across all harm modes in the delegation exposure. Measured in dollars per time period. Example: two harm modes with expected costs of $100 and $50 sum to a delegation risk of $150. Decomposes into Accident Risk + Defection Risk.
Accident Exposure — The subset of delegation exposure containing non-goal-directed harm modes: bugs, errors, hallucinations, capability limitations, edge cases. These are failures where the system isn’t pursuing a conflicting objective—it’s just failing to achieve the intended one. See Risk Decomposition.
Accident Risk — Delegation Risk from accident exposure: Σ P(accident mode) × Damage(accident mode). Generally decreases with capability (smarter systems make fewer errors). Mitigated through testing, verification, redundancy, and capability improvements.
Defection Exposure — The subset of delegation exposure containing goal-directed harm modes: scheming, deception, pursuing misaligned objectives, collusion. These are failures where the system is effectively optimizing for something other than the principal’s interests. See Risk Decomposition.
Defection Risk — Delegation Risk from defection exposure: Σ P(defection mode) × Damage(defection mode). May increase with capability (smarter systems are better at pursuing misaligned goals). Mitigated through alignment, monitoring, containment, and architectural constraints.
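A minimal sketch tying these definitions together. The harm modes are illustrative (the data-leak numbers reuse the Harm Mode example above), and the totals reproduce the $150 example:

```python
# Delegation Risk = Σ P(harm mode) × Damage(harm mode), decomposed by kind.
harm_modes = [
    # (description,               kind,        P,     damage in $)
    ("hallucinated config value", "accident",  0.01,  10_000),
    ("leaks proprietary data",    "defection", 0.001, 50_000),
]

def risk(modes, kind=None):
    return sum(p * dmg for _, k, p, dmg in modes if kind in (None, k))

total = risk(harm_modes)                   # $150 per period
accident = risk(harm_modes, "accident")    # $100
defection = risk(harm_modes, "defection")  # $50
assert total == accident + defection       # Accident Risk + Defection Risk
```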
Risk Inheritance — How risk flows through a delegation chain. When A delegates to B and B delegates to C, A inherits exposure to C’s harm modes. The inheritance rule (multiplicative, minimum, etc.) determines how risks combine.
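A sketch under one possible rule, a multiplicative one, with invented link probabilities:

```python
# A inherits exposure to C's harm mode via B, under an assumed multiplicative
# rule: the harm mode matters to A only if each link in the chain exercises it.
p_b_routes_to_c = 0.5   # illustrative: B delegates this task to C
p_c_harm_mode = 0.01    # C's harm mode probability
damage = 20_000         # damage if it fires

inherited_risk = p_b_routes_to_c * p_c_harm_mode * damage   # $100 expected cost to A
```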
Delegation Risk Budget — Maximum acceptable Delegation Risk for a system or component. Like a financial budget, but for expected harm.
Gross Exposure — Full delegation exposure before controls are applied. Contains all possible harm modes.
Net Exposure — Delegation exposure remaining after controls. Some harm modes may be eliminated or mitigated.
Residual Risk — Delegation Risk after controls are applied. The expected cost that remains despite mitigations.
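A small sketch of the gross-to-net progression, assuming (purely for illustration) a control that cuts each harm mode's probability tenfold:

```python
# Gross exposure -> apply controls -> net exposure -> residual risk.
gross = [("leaks proprietary data", 0.001, 50_000),
         ("corrupts records",       0.002, 30_000)]

def apply_control(modes, factor=0.1):
    """Assumed control effect: scales every harm mode's probability by `factor`."""
    return [(name, p * factor, dmg) for name, p, dmg in modes]

net = apply_control(gross)
gross_risk = sum(p * d for _, p, d in gross)    # $110 before controls
residual_risk = sum(p * d for _, p, d in net)   # $11 remains despite mitigations
```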
Principal — An entity that delegates tasks and grants trust. In most systems, humans are the ultimate principals.
Executor — A component that performs delegated actions. Executors create delegation exposure through their potential harm modes.
Coordinator — A component that orchestrates other components—routing tasks, making strategic decisions, aggregating results. Highest-risk component type due to leverage over the system.
Trust Level — Assessment of an actor’s reliability and alignment. Informs delegation decisions but is distinct from delegation risk. High trust level → willing to accept larger delegation exposure.
## “Least X” Principles {#least-x-principles}

Least Intelligence — Minimum cognitive capability needed for the task. Bounds optimization power to limit damage from misalignment.
Least Privilege — Minimum permissions and access rights. Classic security principle: read vs. write, network access, API permissions.
Least Context — Minimum information about the broader system. Component doesn’t know system architecture, other components, or deployment details.
Least Persistence — Minimum state maintained across invocations. Prevents accumulation of knowledge or long-term planning.
Least Autonomy — Minimum ability to act without oversight. Spectrum from human-approval-required to bounded-autonomy.
Least Surprise — Maximum predictability and determinism. Temperature=0 for LLMs, logged seeds for randomness. Prevents covert signaling.
Least Connectivity — Minimum communication paths between components. One-way flows, no broadcast, physical isolation where needed.
Least Observability — Minimum ability to observe. Components can’t observe system state, their own resources, or whether they’re being tested.
Least Compute — Bounded computational resources per invocation. FLOPs budget, memory limits, timeouts.
Least Generality — Use the least general implementation that works. Hard-coded rules > algorithms > narrow models > general LLMs > RL.
Maximum Verifiability — Use the most verifiable implementation that works. Formally verified code > regular code > fine-tuned models > prompted LLMs.
## Risk Budgeting Terms (Finance) {#risk-budgeting-terms-finance}

Euler Allocation — Method for decomposing portfolio risk into contributions from individual positions. Uses partial derivatives: RC_i = x_i × ∂R/∂x_i. Ensures component contributions sum exactly to total risk.
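A sketch for a volatility-style risk measure R(x) = sqrt(xᵀCx); because R is homogeneous of degree one, the Euler contributions sum exactly to R (the covariance matrix and positions are illustrative):

```python
import numpy as np

# Euler allocation: RC_i = x_i × ∂R/∂x_i for R(x) = sqrt(x' C x).
C = np.array([[0.04, 0.01],
              [0.01, 0.09]])       # illustrative covariance matrix
x = np.array([100.0, 50.0])        # position sizes

R = np.sqrt(x @ C @ x)             # total portfolio risk
grad = (C @ x) / R                 # ∂R/∂x
RC = x * grad                      # per-position risk contributions

assert np.isclose(RC.sum(), R)     # contributions sum exactly to total risk
```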
Value at Risk (VaR) — Maximum expected loss at a given confidence level (e.g., 95% VaR = loss exceeded only 5% of the time).
Expected Shortfall (ES) / CVaR — Average loss in the worst X% of cases. More sensitive to tail risk than VaR.
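Both are easy to compute empirically; a sketch on simulated losses (the lognormal loss model is just a placeholder):

```python
import numpy as np

# Empirical 95% VaR and Expected Shortfall (positive values = losses).
rng = np.random.default_rng(0)
losses = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)  # placeholder model

var_95 = np.percentile(losses, 95)        # loss exceeded only 5% of the time
es_95 = losses[losses >= var_95].mean()   # average loss in the worst 5%

assert es_95 >= var_95   # ES averages the tail, so it is at least VaR
```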
RAROC — Risk-Adjusted Return on Capital. Profit divided by risk capital. Used to compare risk-adjusted performance across units.
Stress Testing — Evaluating system performance under extreme but plausible scenarios.
Backtesting — Comparing predicted risk levels to actual historical outcomes to validate models.
## Risk Budgeting Terms (Nuclear/Aerospace) {#risk-budgeting-terms-nuclear}

PRA (Probabilistic Risk Assessment) — Systematic method for quantifying risks through event trees and fault trees. Standard in nuclear and aerospace.
CDF (Core Damage Frequency) — Probability of reactor core damage per reactor-year. Typical target: 10⁻⁴ to 10⁻⁵.
Fault Tree — Logical diagram showing how component failures combine to cause system failure. AND gates (all must fail) and OR gates (any can fail).
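The gate arithmetic, assuming independent component failures (probabilities are illustrative):

```python
import math

# Fault-tree gates over failure probabilities, assuming independence.
def and_gate(*p):
    """All inputs must fail."""
    return math.prod(p)

def or_gate(*p):
    """Any single input failing suffices."""
    return 1 - math.prod(1 - q for q in p)

pump_a, pump_b, valve = 0.01, 0.01, 0.001
# System fails if BOTH redundant pumps fail OR the shared valve fails.
p_system = or_gate(and_gate(pump_a, pump_b), valve)   # ≈ 1.1e-3
```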
Event Tree — Diagram showing sequences of events following an initiating event, with branching based on success/failure of safety systems.
Defense in Depth — Multiple independent barriers against failure. If one fails, others still protect.
Safety Integrity Level (SIL) — IEC 61508 standard for safety system reliability. SIL 1-4, with higher numbers requiring lower failure probability.
ASIL (Automotive SIL) — ISO 26262 automotive safety standard. ASIL A-D, with D being most stringent.
PFDavg — Average Probability of Failure on Demand. Key metric for safety-instrumented systems.
Common Cause Failure — Single event that causes multiple components to fail simultaneously. Defeats redundancy.
Importance Measures — Metrics quantifying how much each component contributes to system risk. Includes Fussell-Vesely (fraction of risk involving component), Risk Achievement Worth (risk if component failed), Risk Reduction Worth (risk if component perfect).
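A sketch computing all three for one component, reusing the illustrative two-pump fault tree above:

```python
# Importance measures for the shared valve in the fault tree sketched above.
def system_risk(p_pump_a, p_pump_b, p_valve):
    return 1 - (1 - p_pump_a * p_pump_b) * (1 - p_valve)

base = system_risk(0.01, 0.01, 0.001)

fv  = (base - system_risk(0.01, 0.01, 0.0)) / base   # Fussell-Vesely: fraction of
                                                     # risk involving the valve
raw = system_risk(0.01, 0.01, 1.0) / base            # Risk Achievement Worth: risk
                                                     # ratio if the valve always failed
rrw = base / system_risk(0.01, 0.01, 0.0)            # Risk Reduction Worth: risk
                                                     # ratio if the valve were perfect
```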
## Mechanism Design Terms {#mechanism-design-terms}

Incentive Compatibility — Property where truthful reporting is optimal for each participant. No benefit from lying.
VCG (Vickrey-Clarke-Groves) — Mechanism achieving truthful revelation through payments based on externalities imposed on others.
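The single-item second-price (Vickrey) auction is the simplest instance: the winner pays the externality imposed on the others, namely the highest losing bid, which makes truthful bidding optimal.

```python
# Second-price sealed-bid auction, the simplest VCG mechanism.
def vickrey(bids: dict[str, float]) -> tuple[str, float]:
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, bids[runner_up]   # price = externality on the losers

winner, price = vickrey({"alice": 10.0, "bob": 7.0, "carol": 4.0})
# ("alice", 7.0): alice wins but pays bob's bid, so shading her bid can
# never lower her price, and overbidding risks winning at a loss.
```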
Revelation Principle — Any outcome achievable by some mechanism can be achieved by a direct mechanism where truth-telling is optimal.
Information Rent — Profit extracted by agents due to private information. Cost of eliciting truthful reports.
Moral Hazard — When agents change behavior after agreements because their actions aren’t fully observable.
Adverse Selection — When agents with different (unobservable) characteristics self-select in ways that harm the principal.
## AI Safety Terms {#ai-safety-terms}

Alignment — Ensuring AI systems pursue intended goals and behave as desired by their operators.
Scheming — AI system strategically deceiving operators to achieve goals different from stated objectives.
Instrumental Convergence — Tendency for diverse goals to produce similar instrumental sub-goals (self-preservation, resource acquisition, goal preservation).
Corrigibility — Property of an AI system that allows and assists human correction of its goals and behavior.
CAIS (Comprehensive AI Services) — AI safety approach using narrow, task-specific AI services rather than general agents. Related: tool AI vs. agent AI.
AI Control — Approach focusing on monitoring and controlling AI systems rather than ensuring alignment. Assumes potential adversarial behavior.
Scalable Oversight — Methods for humans to effectively oversee AI systems as they become more capable than humans in various domains.
Red Teaming — Adversarial testing where evaluators attempt to find vulnerabilities or elicit harmful behaviors.
## Architecture Terms {#architecture-terms}

Decomposed Coordination — Breaking coordination into multiple limited components rather than one powerful coordinator. Prevents risk concentration.
Verifiability Hierarchy — Ordering of implementation methods by how thoroughly they can be verified: (1) Formally verified code, (2) Regular tested code, (3) Fine-tuned narrow models, (4) Constrained general models, (5) Base LLMs, (6) RL systems.
Information Bottleneck — Architectural constraint limiting information flow between components. Forces compression, limits coordination.
Byzantine Fault Tolerance — System property where correct operation continues despite some components behaving arbitrarily (including maliciously). Requires 3f+1 components to tolerate f Byzantine faults.
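The replica arithmetic in one function:

```python
# To tolerate f Byzantine replicas: n ≥ 3f + 1 total, with quorums of 2f + 1
# (any two quorums then overlap in at least one honest replica).
def bft_sizes(f: int) -> tuple[int, int]:
    return 3 * f + 1, 2 * f + 1   # (replicas required, quorum size)

print(bft_sizes(1))   # (4, 3): four replicas tolerate one Byzantine fault
```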
N-Version Programming — Using multiple independent implementations for redundancy. Different architectures, different training, different teams.
Canary Deployment — Gradual rollout where new version serves small fraction of traffic while monitoring for problems.
Circuit Breaker — Automatic mechanism that stops operation when anomalies detected. Fail-closed design.
## Mathematical Notation {#mathematical-notation}

| Symbol | Meaning |
|---|---|
| DR | Delegation Risk |
| DE | Delegation Exposure (set of harm modes) |
| P(·) | Probability of event/harm mode |
| D(·) | Damage from harm mode |
| Σ | Sum over all harm modes/components |
| w_ij | Trust weight from component i to j |
| π | Prior probability (Bayesian) |
| λ | Decay rate |
| δ | Discount factor |
| HHI | Herfindahl-Hirschman Index (concentration measure) |
| AS | Agency Score (coherence of goal-pursuit) |
| PS | Power Score (ability to achieve diverse goals) |
| U | Utility function |
| 𝒢 | Space of goal functions |
| 𝒜 | Action space |
| RACAP | Risk-Adjusted Capability |
## Quick Reference: Typical Values {#typical-values}

| Metric | Low Risk | Medium Risk | High Risk |
|---|---|---|---|
| Component Delegation Risk | < $100/mo | $100-500/mo | > $500/mo |
| System Delegation Risk | < $1,000/mo | $1,000-5,000/mo | > $5,000/mo |
| Harm mode probability | < 10⁻³ | 10⁻³ to 10⁻² | > 10⁻² |
| Nuclear CDF target | - | - | ~10⁻⁵/year |
| Aviation target | - | - | ~10⁻⁹/hour |
## See Also

- Core Concepts — High-level framework introduction
- Delegation Risk Overview — Delegation exposure and risk computation
- Risk Budgeting Overview — Cross-domain methods
- Bibliography — Full academic references