Risk Dynamics
Bayesian Trust Updating
Start with a prior belief about agent trustworthiness, then update it with observations.
Prior: P(agent_type = trustworthy) = π₀
Observation model: P(outcome | agent_type)
- Trustworthy agent: P(good_outcome) = 0.95
- Untrustworthy agent: P(good_outcome) = 0.3
Posterior after n good outcomes:
π_n = P(trustworthy | n good outcomes) = π₀ × 0.95ⁿ / [π₀ × 0.95ⁿ + (1-π₀) × 0.3ⁿ]
Example: π₀ = 0.5 (uninformed prior)
- After 1 good outcome: π₁ = 0.76
- After 5 good outcomes: π₅ ≈ 0.997
- After 10 good outcomes: π₁₀ ≈ 1.0
Trust from posterior:
Delegation Risk = E[damage] × (1 - π_n) × P(bad_outcome | untrustworthy)
As π_n → 1, Delegation Risk → 0.
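A minimal sketch of this update and the resulting risk in Python. The 0.95/0.3 likelihoods are the illustrative values above; `expected_damage` and `p_bad_given_untrusty = 1 − 0.3 = 0.7` are assumptions for the example:

```python
def update_trust(prior: float, n_good: int,
                 p_good_trusty: float = 0.95,
                 p_good_untrusty: float = 0.3) -> float:
    """Posterior P(trustworthy | n consecutive good outcomes)."""
    num = prior * p_good_trusty ** n_good
    return num / (num + (1 - prior) * p_good_untrusty ** n_good)


def delegation_risk(posterior: float, expected_damage: float,
                    p_bad_given_untrusty: float = 0.7) -> float:
    """E[damage] x P(untrustworthy) x P(bad outcome | untrustworthy)."""
    return expected_damage * (1 - posterior) * p_bad_given_untrusty


for n in (1, 5, 10):
    pi_n = update_trust(0.5, n)
    print(f"n={n}: pi={pi_n:.3f}, risk={delegation_risk(pi_n, 1000.0):.2f}")
```

Running this reproduces the posteriors in the example (0.760, 0.997, ≈1.0) and shows the delegation risk shrinking toward zero as trust accumulates.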
Trust Building Rate
How fast can trust be built?
Factors affecting rate:
- Informativeness of observations (high-stakes decisions reveal more)
- Base rate of good behavior (easy tasks don’t discriminate)
- Observation noise (noisy outcomes slow learning)
Optimal trust building: Choose tasks that maximize information gain about trustworthiness.
Information gain from a task:
IG(task) = H(type) - E[H(type | outcome(task))]
High-information tasks (see the sketch after this list):
- Moderate difficulty (easy tasks always succeed, hard tasks always fail)
- Clear success criteria (unambiguous outcomes)
- Low noise (outcome reflects agent, not environment)
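A sketch of IG for a binary pass/fail task under the two-type model above. The success probabilities in the final two calls are invented to contrast an easy task with a moderately difficult one:

```python
import math

def entropy(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(prior: float, p_good_trusty: float, p_good_untrusty: float) -> float:
    """IG = H(type) - E[H(type | outcome)] for a binary pass/fail task."""
    p_good = prior * p_good_trusty + (1 - prior) * p_good_untrusty
    post_good = prior * p_good_trusty / p_good              # P(trusty | good)
    post_bad = prior * (1 - p_good_trusty) / (1 - p_good)   # P(trusty | bad)
    return entropy(prior) - (p_good * entropy(post_good)
                             + (1 - p_good) * entropy(post_bad))

# An easy task barely discriminates; a moderate one reveals far more.
print(info_gain(0.5, 0.99, 0.95))  # ~0.01 bits: both types nearly always succeed
print(info_gain(0.5, 0.95, 0.30))  # ~0.37 bits: outcomes separate the types
```

A curriculum builder could call `info_gain` over a pool of candidate tasks and schedule the highest-IG task whose stakes are acceptable at the current trust level.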
Trust building curriculum: Sequence of tasks with increasing stakes, each building on trust from previous.
Trust Decay Models
```mermaid
flowchart LR
    subgraph "Trust Over Time"
        T0[Trust₀] --> T1[Trust₁]
        T1 --> T2[Trust₂]
        T2 --> Tn[Trust_n]
    end
    D[Decay λ] -.-> T1
    D -.-> T2
    O[Observations] --> T1
    O --> T2
```
Exponential Decay
Trust(t) = Trust(0) × exp(-λt)
Memory-Based Decay
Trust(t) = Σᵢ weight(t - tᵢ) × observationᵢ
Where weight(Δt) decreases with time since the observation.
Event-Triggered Decay
Trust(t) = Trust(t-1) × (1 - decay_rate × I(trigger_event))
Trust drops on specific events (capability change, environment change, time since verification).
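A minimal sketch of the three decay models. The exponential weight in `memory_trust` is one plausible choice of weight(Δt), not prescribed by the text:

```python
import math

# Exponential decay: trust erodes continuously at rate lam.
def exponential_decay(trust0: float, lam: float, t: float) -> float:
    return trust0 * math.exp(-lam * t)

# Memory-based decay: Trust(t) = sum_i weight(t - t_i) * observation_i,
# here with an exponential weight as the recency function.
def memory_trust(observations: list[tuple[float, float]], t: float,
                 lam: float) -> float:
    """observations: list of (time, value) pairs."""
    return sum(math.exp(-lam * (t - ti)) * obs for ti, obs in observations)

# Event-triggered decay: a discrete drop whenever a trigger fires.
def event_decay(trust_prev: float, decay_rate: float, triggered: bool) -> float:
    return trust_prev * (1 - decay_rate * int(triggered))
```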
Adversarial Trust Dynamics
Strategic agent: Agent chooses behavior to maximize long-term payoff, not immediate task performance.
Formal model:
Agent payoff = Σₜ δᵗ × [benefitₜ(actionₜ) - penaltyₜ(actionₜ) × P(caught)]
Where δ is the discount factor and benefit includes both legitimate rewards and defection gains.
When does deception pay? (A toy calculation follows the list.)
- High discount factor (patient agent)
- Low P(caught) after trust established
- High benefit from late defection
- Low cost of early good behavior
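A toy calculation of this tradeoff. All parameters (horizon, detection probabilities, payoffs, the point where scrutiny relaxes) are invented for illustration; the comparison, not the numbers, is the point:

```python
def payoff(defect_at: int | None, horizon: int = 50, delta: float = 0.95,
           benefit_coop: float = 1.0, benefit_defect: float = 20.0,
           penalty: float = 30.0, p_caught_early: float = 0.9,
           p_caught_late: float = 0.1) -> float:
    """Discounted payoff; P(caught) drops once trust is established (t >= 10)."""
    total = 0.0
    for t in range(horizon):
        p_caught = p_caught_early if t < 10 else p_caught_late
        if defect_at is not None and t == defect_at:
            total += delta ** t * (benefit_defect - penalty * p_caught)
            break  # assume the relationship ends after defection
        total += delta ** t * benefit_coop
    return total

print(payoff(None))          # always cooperate
print(payoff(defect_at=2))   # defect early, while scrutiny is high
print(payoff(defect_at=20))  # defect late, after trust is established
```

With these numbers, late defection narrowly beats always cooperating (≈18.9 vs ≈18.5 discounted payoff) while early defection is ruinous (≈-4.4), matching the conditions above: patience, low P(caught) after trust is built, and cheap early good behavior.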
Multi-Agent Trust Learning
When multiple agents interact, trust becomes strategic:
Reputation systems: Agents’ trust levels are public, affect future opportunities.
Trust signaling: Agents take costly actions to signal trustworthiness.
Trust herding: Agents copy trust decisions of others (social proof).
Trust manipulation: Agents try to manipulate others’ trust perceptions.
Equilibrium analysis: What trust levels emerge in equilibrium?
Model:
- n agents, each chooses trustworthiness level τᵢ ∈ [0,1]
- Cost of trustworthiness: c(τᵢ) (being trustworthy is costly)
- Benefit of being trusted: b(perceived_trustᵢ) (more trust → more opportunities)
- Perceived trust: function of τᵢ and observations
Nash equilibrium: Each agent’s τᵢ is best response to others’ strategies.
Possible equilibria (a best-response sketch follows the list):
- All trustworthy (if benefits exceed costs)
- All untrustworthy (if costs exceed benefits)
- Mixed (heterogeneous trustworthiness)
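A best-response iteration sketch of this model. The quadratic cost, linear benefit, and the social-proof blend in perceived trust are all illustrative assumptions, not part of the model as stated:

```python
import numpy as np

def best_response_equilibrium(n: int = 10, cost: float = 0.6,
                              benefit: float = 1.0, mix: float = 0.5,
                              iters: int = 200) -> np.ndarray:
    """Iterate best responses over a tau grid until the profile stabilizes.

    Payoff_i = benefit * perceived_i - cost * tau_i**2, where
    perceived_i = mix * tau_i + (1 - mix) * mean(tau_others)
    (the blend with the population mean is a simple social-proof assumption).
    """
    grid = np.linspace(0.0, 1.0, 101)
    tau = np.full(n, 0.5)
    for _ in range(iters):
        new = tau.copy()
        for i in range(n):
            others = (tau.sum() - tau[i]) / (n - 1)
            perceived = mix * grid + (1 - mix) * others
            new[i] = grid[np.argmax(benefit * perceived - cost * grid ** 2)]
        if np.allclose(new, tau):
            break
        tau = new
    return tau

print(best_response_equilibrium(cost=0.2))  # cheap trustworthiness -> all near 1
print(best_response_equilibrium(cost=5.0))  # costly trustworthiness -> all near 0
```

With these functional forms the interior best response is τ* = benefit × mix / (2 × cost), so cheap trustworthiness pushes every agent to τ = 1 and expensive trustworthiness toward 0, reproducing the first two equilibria; heterogeneous costs or benefits would yield the mixed case.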