Human-AI Delegation
Human Trust in AI
Factors affecting human trust in AI:
- Accuracy (does AI give correct outputs?)
- Consistency (is AI behavior predictable?)
- Explainability (can AI explain its reasoning?)
- Benevolence (does AI seem to care about human interests?)
- Familiarity (has human worked with this AI before?)
Calibration interventions:
- Accuracy feedback (show human when AI is right/wrong)
- Confidence display (show AI’s uncertainty)
- Explanation (show reasoning behind AI output)
- Comparative framing (compare AI to human or baseline)
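A minimal sketch of the first two interventions, accuracy feedback and confidence display; the function name, inputs, and presentation format are illustrative assumptions, not a specific tool's interface.

def present_with_calibration(output, confidence, past_outcomes):
    """Format an AI output with signals that help a human calibrate trust.

    `confidence` is the model's self-reported probability of being correct;
    `past_outcomes` is a list of booleans marking whether earlier outputs
    were verified as correct. Both are illustrative names.
    """
    lines = [f"Output: {output}",
             f"Model confidence: {confidence:.0%}"]  # confidence display
    if past_outcomes:
        accuracy = sum(past_outcomes) / len(past_outcomes)
        lines.append(f"Verified accuracy on recent tasks: {accuracy:.0%}")  # accuracy feedback
    return "\n".join(lines)

print(present_with_calibration("Ship on Tuesday", 0.72, [True, True, False, True]))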
AI Trust in Humans
Should AI agents trust humans?
When AI should trust human input:
- Human has relevant expertise
- Human has access to information AI doesn’t
- Human feedback is part of training/alignment process
When AI should be skeptical:
- Human might be manipulated (prompt injection through human)
- Human might have misaligned incentives
- Human input might be noisy/erroneous
AI trust in human = P(human input is accurate and well-intentioned)
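One way to operationalize this probability is to combine the factors above into a single estimate. The sketch below is a toy model; the factor names, weights, and multiplicative form are assumptions, not a prescribed formula.

def estimate_trust_in_human(expertise, information_access, incentive_alignment, noise_rate):
    """Rough estimate of P(human input is accurate and well-intentioned).

    All inputs are in [0, 1]; the weighting and the multiplicative combination
    are illustrative choices.
    """
    p_accurate = (0.5 * expertise + 0.5 * information_access) * (1 - noise_rate)
    p_well_intentioned = incentive_alignment
    return p_accurate * p_well_intentioned

# A domain expert with good information but weakly aligned incentives:
print(estimate_trust_in_human(0.9, 0.8, 0.6, 0.1))  # ~0.46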
Human-AI Trust Handoffs
flowchart LR
    subgraph "Human → AI"
        H1[Human specifies] --> G1[Grants trust]
        G1 --> E1[AI executes]
        E1 --> V1[Human verifies]
    end
    subgraph "AI → Human"
        A1[AI identifies need] --> C1[Provides context]
        C1 --> D1[Human decides]
        D1 --> R1[AI records]
    end
When a task passes between human and AI:
Human → AI Handoff
- Human specifies task
- Human grants trust to AI
- AI executes with human oversight
- Human verifies AI output
AI → Human Handoff
- AI identifies task requiring human judgment
- AI provides context and recommendation
- Human makes decision
- AI records and incorporates decision
Trust at Handoff Points
- Is specification complete? (human → AI gap)
- Is context sufficient? (AI → human gap)
- Are verification criteria clear?
- What happens if handoff fails?
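These handoff-point questions can be turned into explicit checks that run before a task crosses the human-AI boundary. The record fields and check logic below are a hypothetical sketch, not part of any existing framework.

from dataclasses import dataclass, field

@dataclass
class Handoff:
    direction: str                                  # "human_to_ai" or "ai_to_human"
    task: str
    specification: str = ""                         # what the receiver is being asked to do
    context: str = ""                               # background the receiver needs to decide
    verification_criteria: list = field(default_factory=list)
    fallback: str = ""                              # what happens if the handoff fails

def handoff_gaps(handoff: Handoff) -> list:
    """Return the unanswered handoff questions; an empty list means proceed."""
    gaps = []
    if handoff.direction == "human_to_ai" and not handoff.specification:
        gaps.append("specification incomplete (human → AI gap)")
    if handoff.direction == "ai_to_human" and not handoff.context:
        gaps.append("context insufficient (AI → human gap)")
    if not handoff.verification_criteria:
        gaps.append("verification criteria unclear")
    if not handoff.fallback:
        gaps.append("no failure handling defined")
    return gaps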
Trust in Hybrid Teams
Team of humans and AI agents working together:
Trust topology:
flowchart TB
    HM[Human Manager]
    HM --> AI1[AI₁]
    HM --> H2[Human₂]
    HM --> AI3[AI₃]
    H2 --> AI4[AI₄]
Principles for hybrid trust:
- Humans have ultimate authority (AI trust bounded by human trust)
- AI can advise on trust but not autonomously grant high trust
- Trust violations by AI are the responsibility of the human who trusted the AI
- Human oversight required for high-stakes trust decisions
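A sketch of the first principle, AI trust bounded by human trust: an agent's effective trust is capped by the trust granted to everyone above it in the delegation chain. The min-based rule and the example values are illustrative assumptions.

def effective_trust(delegation_chain):
    """Effective trust in the last member of a delegation chain.

    `delegation_chain` lists (member, trust_granted_by_their_principal) pairs
    from the top of the hierarchy down; min() encodes the rule that delegated
    trust can never exceed the trust granted to the delegator.
    """
    return min(trust for _, trust in delegation_chain)

# The manager trusts Human₂ at 0.5, and Human₂ trusts AI₄ at 0.9:
print(effective_trust([("human_2", 0.5), ("ai_4", 0.9)]))  # 0.5, AI₄ is capped by Human₂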
Trust Dynamics in Teams
Trust contagion: One team member's trust affects others
- If Alice trusts Bob and Bob trusts Carol, Alice is more likely to trust Carol
- If Alice distrusts Bob, others might distrust Bob too (reputation effects)
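The first kind of contagion can be modeled as transitive propagation, with a referral discounted by how much the referrer is trusted. The multiplicative rule below is one common assumption, not the only possible one.

def referred_trust(trust_in_referrer, referrers_trust_in_target):
    """Alice's prior trust in Carol, based only on Bob's referral.

    Multiplying discounts Bob's opinion by Alice's trust in Bob; distrust in
    Bob (a low first factor) suppresses the referral accordingly.
    """
    return trust_in_referrer * referrers_trust_in_target

# Alice trusts Bob at 0.8 and Bob trusts Carol at 0.9, so Alice starts Carol near 0.72.
print(referred_trust(0.8, 0.9))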
Trust repair: After trust violation, how to rebuild?
- Acknowledgment (violator admits violation)
- Explanation (why did violation occur?)
- Remediation (what’s being done to fix it?)
- Recommitment (commitment to not repeat)
- Verification (ongoing checks that violation doesn’t recur)
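The repair steps can be tracked as an explicit checklist so that trust is restored only after every stage is complete; the stage names mirror the list above and the data structure is purely illustrative.

REPAIR_STAGES = ["acknowledgment", "explanation", "remediation", "recommitment", "verification"]

def repair_complete(completed_stages):
    """Trust becomes eligible for restoration only when every stage is done."""
    return all(stage in completed_stages for stage in REPAIR_STAGES)

print(repair_complete(["acknowledgment", "explanation"]))  # False, repair still in progress
print(repair_complete(REPAIR_STAGES))                      # True, all five stages completed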
Trust Simulation and Testing
Trust Simulation Framework
class TrustSimulator:
    """Simulate how trust evolves in a multi-agent architecture over time.

    Helpers such as initialize_trust, count_violations, sum_damage, plot_trust,
    find_bottlenecks, and find_vulnerabilities are assumed to be defined
    elsewhere in the framework.
    """

    def __init__(self, architecture, agents, tasks, duration):
        self.architecture = architecture
        self.agents = agents
        self.tasks = tasks
        self.duration = duration
        self.trust_state = initialize_trust(architecture, agents)
        self.history = []

    def simulate(self):
        # Run the architecture for `duration` steps, recording every allocation
        # and outcome along with the trust state it produced.
        for t in range(self.duration):
            current_tasks = self.generate_tasks(t)
            allocations = self.allocate(current_tasks, self.trust_state)
            outcomes = self.execute(allocations)
            self.trust_state = self.update_trust(outcomes)
            self.history.append({
                'time': t,
                'tasks': current_tasks,
                'allocations': allocations,
                'outcomes': outcomes,
                'trust_state': self.trust_state.copy(),
            })
        return self.analyze()

    def analyze(self):
        # Summarize the run: violations, damage, trust trajectories, and weak points.
        return {
            'total_violations': count_violations(self.history),
            'total_damage': sum_damage(self.history),
            'trust_evolution': plot_trust(self.history),
            'bottlenecks': find_bottlenecks(self.history),
            'vulnerability_windows': find_vulnerabilities(self.history),
        }
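A hypothetical way the simulator might be driven; `architecture`, `agents`, and `task_generator` are placeholders for objects assumed to be defined elsewhere in the framework.

sim = TrustSimulator(architecture=architecture,
                     agents=agents,
                     tasks=task_generator,
                     duration=1000)
report = sim.simulate()
print(report['total_violations'], report['total_damage'])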
Trust Stress Testing
Stress scenarios:
- Adversarial fraction increase: Gradually increase fraction of adversarial agents, find breaking point
- Coordinated attack: Multiple adversarial agents coordinate, test architectural isolation
- Trust inflation: Agents report falsely high trustworthiness, test verification
- Cascade failure: One trusted component fails, test propagation
- Verification failure: Verification system compromised, test resilience
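A sketch of the first scenario, sweeping the adversarial fraction until the system breaks. `run_simulation` and the damage budget are assumptions standing in for the simulator above.

def find_breaking_point(run_simulation, max_acceptable_damage, step=0.05):
    """Raise the fraction of adversarial agents until damage exceeds the budget.

    `run_simulation(adversarial_fraction)` is assumed to return total damage for
    one run at that fraction; returns the first fraction that breaks the system,
    or None if none does.
    """
    fraction = 0.0
    while fraction <= 1.0:
        if run_simulation(adversarial_fraction=fraction) > max_acceptable_damage:
            return fraction
        fraction += step
    return None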
Red Team Trust Testing
Red team objectives:
- Gain trust beyond entitlement
- Exercise trust without authorization
- Violate trust without detection
- Coordinate with other agents despite isolation
- Persist trust after revocation
Red team methods:
- Social engineering (convince principal to grant more trust)
- Verification evasion (appear trustworthy during testing, defect in deployment)
- Side channel exploitation (communicate despite isolation)
- Token manipulation (forge or replay trust tokens)
- Revocation evasion (continue using revoked trust)
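Two of these methods, token manipulation and revocation evasion, point to concrete checks a defender can red-team against. The token format, signing scheme, and in-memory revocation set below are illustrative, not a specific protocol.

import hashlib
import hmac
import time

SIGNING_KEY = b"server-side signing key"   # illustrative; a real deployment would use managed key storage
REVOKED_TOKENS = set()                     # illustrative in-memory revocation list

def sign_token(agent_id, expires_at):
    payload = f"{agent_id}|{expires_at}"
    signature = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"

def token_valid(token):
    """Reject forged, expired, or revoked trust tokens."""
    agent_id, expires_at, signature = token.rsplit("|", 2)
    expected = hmac.new(SIGNING_KEY, f"{agent_id}|{expires_at}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False                       # forged token (token manipulation)
    if float(expires_at) < time.time():
        return False                       # expired token being replayed
    if token in REVOKED_TOKENS:
        return False                       # revoked trust still being exercised
    return True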