Delegation Risk: Overview

A mature delegation risk framework could provide mathematical and operational foundations for managing AI systems more safely at scale.

A complete framework would have seven components:

flowchart TB
    subgraph "Delegation Risk Components"
        Q[1. Quantification<br/>Exposure, Risk, trust levels]
        C[2. Composition<br/>inheritance rules]
        O[3. Optimization<br/>minimize risk]
        D[4. Dynamics<br/>evolution over time]
        P[5. Protocols<br/>handshakes, revocation]
        T[6. Tools<br/>simulation, monitoring]
        S[7. Standards<br/>industry/regulatory]
    end
    Q --> C --> O --> D
    P --> T --> S

1. Quantification: Every delegation has a delegation exposure (set of harm modes) and delegation risk (expected cost)

2. Composition: Rules for combining risks through delegation chains (multiplication, minimum, etc.)

3. Optimization: Algorithms for minimizing delegation risk given constraints

4. Dynamics: Models for how trust and risk evolve over time

5. Protocols: Standard procedures for delegation handshakes, revocation, etc.

6. Tools: Software for risk analysis, simulation, monitoring

7. Standards: Industry/regulatory standards for risk levels and verification

Why does this matter now? Five trends make delegation risk increasingly urgent:

1. AI systems are becoming more capable: Higher capabilities = larger delegation exposure.

2. AI systems are becoming more autonomous: Less human oversight = risk management must be structural.

3. AI systems are being deployed in high-stakes domains: Healthcare, finance, infrastructure = harm mode realization is catastrophic.

4. AI systems are becoming more interconnected: Agent-to-agent delegation = risk inheritance matters.

5. We’re building systems we don’t fully understand: Unknown capabilities = unknown harm modes.

Delegation Exposure is the complete set of possible harms (harm modes) from delegating a task. It’s not a single number—it’s a collection, like an attack surface or failure envelope.

Delegation Risk = Σ P(harm mode) × Damage(harm mode)

For each component, sum over all harm modes: probability times damage. This gives a single number representing “what is the expected cost of delegating to this component?”
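The formula translates directly into code. Below is a minimal sketch assuming nothing beyond the definitions above; HarmMode and delegation_risk are illustrative names, not an established API:

```python
from dataclasses import dataclass

@dataclass
class HarmMode:
    """One entry in a delegation exposure: a possible harm with its odds and cost."""
    name: str
    probability: float  # P(harm mode occurs) over the delegation period
    damage: float       # cost in dollars if it occurs

def delegation_risk(exposure: list[HarmMode]) -> float:
    """Delegation Risk = sum over all harm modes of probability x damage."""
    return sum(h.probability * h.damage for h in exposure)
```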

Worked Example: Research Assistant Delegation Risk

Consider a decomposed research assistant with three components:

flowchart LR
    H[Human Principal] -->|"trust: 0.95"| C[Coordinator]
    C -->|"trust: 0.90"| S[Summarizer]
    C -->|"trust: 0.85"| D[Code Deployer]

Summarizer Delegation Exposure:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Misrepresents paper findings | 0.02 | $5,000 (wrong research direction) | $100 |
| Leaks proprietary data | 0.001 | $50,000 (IP loss) | $50 |

Delegation Risk (Summarizer) = $100 + $50 = $150

Code Deployer Delegation Exposure:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Deploys buggy code | 0.05 | $20,000 (downtime) | $1,000 |
| Deploys malicious code | 0.0001 | $1,000,000 (breach) | $100 |

Delegation Risk (Code Deployer) = $1,000 + $100 = $1,100

Coordinator Delegation Exposure:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Misroutes task | 0.01 | $2,000 (wasted effort) | $20 |
| Grants excessive permissions | 0.005 | $100,000 (escalation) | $500 |

Delegation Risk (Coordinator) = $20 + $500 = $520

System Total Delegation Risk: $1,770
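Reusing HarmMode and delegation_risk from the sketch above, the three exposures reproduce the table values:

```python
summarizer = [
    HarmMode("Misrepresents paper findings", 0.02, 5_000),
    HarmMode("Leaks proprietary data", 0.001, 50_000),
]
code_deployer = [
    HarmMode("Deploys buggy code", 0.05, 20_000),
    HarmMode("Deploys malicious code", 0.0001, 1_000_000),
]
coordinator = [
    HarmMode("Misroutes task", 0.01, 2_000),
    HarmMode("Grants excessive permissions", 0.005, 100_000),
]

# 150.0, 1100.0, 520.0, and a system total of 1770.0
for exposure in (summarizer, code_deployer, coordinator):
    print(delegation_risk(exposure))
print(sum(delegation_risk(e) for e in (summarizer, code_deployer, coordinator)))
```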

What risk does the Human inherit from the Code Deployer through the delegation chain?

Using the multiplicative rule, the trust the Human can place in the Code Deployer is the product of the trust levels along the chain:

Trust(Human → Deployer) = Trust(H→C) × Trust(C→D)
                        = 0.95 × 0.85
                        = 0.8075

The untrusted remainder is 1 − 0.8075 ≈ 0.19: roughly 19% of the potential damage from the Code Deployer’s harm modes propagates back to the Human as risk they’re accepting.
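In code, the multiplicative rule and the ~19% reading look like this (a sketch consistent with the numbers above; the function names are ours):

```python
from math import prod

def chain_trust(trust_levels: list[float]) -> float:
    """Composed trust along a delegation chain (multiplicative rule)."""
    return prod(trust_levels)

def inherited_fraction(trust_levels: list[float]) -> float:
    """Fraction of the delegatee's potential damage the principal accepts."""
    return 1.0 - chain_trust(trust_levels)

print(chain_trust([0.95, 0.85]))         # 0.8075
print(inherited_fraction([0.95, 0.85]))  # 0.1925, i.e. ~19%
```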

Suppose the organization’s total acceptable Delegation Risk Budget is $2,000/month.

Current allocation:

  • Summarizer: $150 (7.5% of budget)
  • Code Deployer: $1,100 (55% of budget) ⚠️
  • Coordinator: $520 (26% of budget)

Finding: Code Deployer consumes most of the risk budget. Options:

  1. Add verification layer before deployment (reduce harm mode probability)
  2. Reduce deployer’s permissions (reduce harm mode damage)
  3. Replace with human-in-the-loop (shift delegation exposure to human)
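A budget check like this is easy to automate. A minimal sketch, assuming the flat $2,000/month budget above and a 50% single-component warning threshold (the threshold is our choice, not prescribed by the framework):

```python
BUDGET = 2_000.0
WARN_SHARE = 0.50  # flag any component consuming over half the budget

allocation = {
    "Summarizer": 150.0,
    "Code Deployer": 1_100.0,
    "Coordinator": 520.0,
}

for name, risk in allocation.items():
    share = risk / BUDGET
    flag = "  <-- over threshold" if share > WARN_SHARE else ""
    print(f"{name}: ${risk:,.0f} ({share:.1%} of budget){flag}")

print(f"Headroom: ${BUDGET - sum(allocation.values()):,.0f}")  # $230
```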

Suppose we take option 1 and add a deployment verifier (deterministic code, formally verified):

flowchart LR
    H[Human] -->|"0.95"| C[Coordinator]
    C -->|"0.85"| D[Deployer]
    D -->|"0.99"| V[Verifier]
    V --> Prod[Production]

| Harm Mode | Probability | Damage | New Risk Contribution |
| --- | --- | --- | --- |
| Deploys malicious code | 0.0001 × 0.01 | $1,000,000 | $1 |

Verifier catches 99% of malicious deployments, cutting that harm mode’s contribution from $100 to $1; the formally verified checks are also taken to screen out buggy deployments, so the $1,000 buggy-code contribution drops away. New system Delegation Risk: $150 + $520 + $1 = $671.
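The verifier’s effect composes multiplicatively with the harm mode’s probability. Reusing the HarmMode sketch from earlier:

```python
CATCH_RATE = 0.99  # verifier catches 99% of malicious deployments

before = HarmMode("Deploys malicious code", 0.0001, 1_000_000)
after = HarmMode(before.name, before.probability * (1 - CATCH_RATE), before.damage)

print(before.probability * before.damage)  # 100.0 -> old risk contribution
print(after.probability * after.damage)    # ~1.0  -> new risk contribution
```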

Nuclear plants target ~10⁻⁶ core damage frequency per reactor-year. For a $10B damage potential:

  • Target Delegation Risk = 10⁻⁶ × $10B = **$10,000/year**

Our research assistant’s $671/month ≈ $8,000/year is roughly comparable, which suggests either:

  1. We’re being appropriately cautious, or
  2. Nuclear plants manage much higher absolute stakes with similar relative risk

This kind of cross-domain comparison helps calibrate whether AI safety investments are proportionate.
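The calibration itself is one line of arithmetic each way:

```python
nuclear_target = 1e-6 * 10e9   # 10^-6 core damage frequency x $10B = $10,000/year
assistant_annual = 671 * 12    # $671/month annualized = $8,052, ~ $8,000/year

print(nuclear_target, assistant_annual)  # 10000.0 8052
```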


To dive deeper into specific topics:

  • Risk Inheritance — Algorithms for computing how risk flows through delegation networks
  • Risk Dynamics — How trust evolves, decays, and rebuilds over time

  • Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance. — Axiomatic foundations
  • Tasche, D. (2008). Capital Allocation to Business Units: the Euler Principle. arXiv — Risk decomposition
  • Fritz, T. (2020). A Synthetic Approach to Markov Kernels. arXiv — Compositional probability

See the full bibliography for comprehensive references.