# Formalizing Agents, Power, and Authority

## The Fundamental Optimization Problem

The framework so far focuses heavily on Delegation Risk—what could go wrong. But the goal isn’t to minimize risk to zero (that would mean no AI benefit). The real goal is:

$$\max \; \text{Value} = \text{Capability} - \text{DelegationRisk}$$

Or equivalently, maximize risk-adjusted capability:

$$\text{RACAP} = \frac{\text{Capability}}{\text{DelegationRisk}}$$
To make this concrete, we need to formalize Capability, which itself decomposes into Power and Agency.
## Part I: Formalizing Agency

### What Is an Agent?

An agent is a system whose behavior can be usefully modeled as optimizing an objective function over time. “Agency” is a matter of degree—not a binary property.
The key insight: agency is a modeling choice, not an ontological category. We call something an agent when treating it as an optimizer is predictively useful.
### Agency Score (Coherence)

Agency Score (or Coherence Score) measures how well a system’s behavior can be explained by a simple utility function. Higher coherence = more agent-like.

#### Definition: Agency Score

For a system S observed over behavior traces B = {b₁, b₂, …}:

$$\text{Agency}(S) = \max_{U \in \mathcal{U}_{\text{simple}}} \text{Fit}(U, B)$$

Where:

- $\mathcal{U}_{\text{simple}}$ is the space of “simple” utility functions (bounded complexity)
- $\text{Fit}(U, B)$ measures how well utility function U explains observed behaviors B
Interpretation: If there exists a simple utility function that predicts most of S’s behavior, S has high agency. If S’s behavior requires a complex, ad-hoc model to explain, it has low agency.
### Operationalizing Fit

Several approaches to measuring Fit(U, B):
1. **Prediction accuracy:** How often does “assume S maximizes U” correctly predict S’s actions? (See the sketch after this list.)
2. **Description length (MDL-inspired):** How much compression do we get by describing S as “maximizes U”?
3. **Consistency over time:** Does the best-fit U remain stable over time, or does it drift?
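A minimal sketch of the prediction-accuracy approach, assuming a discrete action space and a small hand-built family of candidate utility functions. The helper names and the explicit complexity penalty `lam` (standing in for the bounded-complexity restriction on $\mathcal{U}_{\text{simple}}$) are illustrative assumptions, not part of the source:

```python
from typing import Callable, Sequence, Tuple

# One observed behavior trace: (state, action) pairs from system S.
Trace = Sequence[Tuple[str, str]]
Utility = Callable[[str, str], float]  # U(state, action) -> value

def fit_prediction_accuracy(u: Utility, actions: Sequence[str], trace: Trace) -> float:
    """Fit(U, B): fraction of observed actions that 'S maximizes U' predicts."""
    hits = sum(1 for s, a in trace if max(actions, key=lambda x: u(s, x)) == a)
    return hits / len(trace)

def agency_score(candidates: Sequence[Tuple[Utility, float]],
                 actions: Sequence[str], trace: Trace, lam: float = 0.1) -> float:
    """Agency(S): best complexity-penalized fit over candidate utility functions."""
    return max(fit_prediction_accuracy(u, actions, trace) - lam * complexity
               for u, complexity in candidates)
```

The penalty term makes the complexity-fit tradeoff from the next subsection explicit: a complex U that fits perfectly can still lose to a simple U that fits well.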
### The Complexity-Fit Tradeoff

```mermaid
xychart-beta
  title "Complexity-Fit Tradeoff"
  x-axis "Complexity(U)" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  y-axis "Fit" 0 --> 100
  line [10, 35, 55, 70, 80, 87, 92, 95, 97, 98, 99]
```
Simple U fits poorly (low agency). Complex U overfits—behavior is just complex, not purposeful (still low agency).
True agency = high fit with low complexity utility function.
### Agency Decomposition

Agency can be decomposed into components:
| Component | Definition | Measures |
|---|---|---|
| Goal-Directedness | Degree to which actions systematically reduce distance to goal states | Does S navigate toward targets despite perturbations? |
| Planning Horizon | How far ahead S optimizes | Does S sacrifice short-term for long-term gains? |
| Modeling Depth | How sophisticated S’s world model is | Does S anticipate second-order effects? |
| Consistency | Stability of apparent objectives over time | Does S maintain goals despite distractions? |
### Measuring Goal-Directedness

One formalization (inspired by Levin & Dennett):

$$\text{GoalDirectedness}(S, G) = \frac{P(\text{reach } G \mid S \text{ acts})}{P(\text{reach } G \mid \text{random actions})}$$

How much more likely is S to reach goal G compared to a random process? High ratio = S is “trying” to reach G.
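One way to estimate this ratio empirically is Monte Carlo rollout. A sketch under assumed interfaces `env_step(state, action)`, `policy(state)`, and `is_goal(state)`, all hypothetical:

```python
def reach_rate(policy, env_step, start, is_goal, horizon=50, trials=2000):
    """Fraction of rollouts under `policy` that hit the goal within `horizon` steps."""
    hits = 0
    for _ in range(trials):
        state = start
        for _ in range(horizon):
            state = env_step(state, policy(state))
            if is_goal(state):
                hits += 1
                break
    return hits / trials

def goal_directedness(policy, random_policy, env_step, start, is_goal):
    """P(reach G | S) / P(reach G | random process), estimated by rollout."""
    baseline = reach_rate(random_policy, env_step, start, is_goal)
    return reach_rate(policy, env_step, start, is_goal) / max(baseline, 1e-9)
```

The `max(baseline, 1e-9)` guard matters: for hard goals a random process may never succeed in the sample, and the ratio is then effectively a lower bound.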
### Measuring Planning Horizon

$$\text{Horizon}(S) = \arg\max_{T} \; \operatorname{Corr}\big(a_t,\; R_{t+T}\big)$$

At what time lag T does S’s current action most correlate with future rewards?
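A rough operationalization from logged trajectories, assuming each step’s action can be summarized as a scalar signal (all names are illustrative):

```python
import numpy as np

def planning_horizon(action_signal: np.ndarray, rewards: np.ndarray,
                     max_lag: int = 100) -> int:
    """Return the lag T at which the action signal correlates most with future reward.

    action_signal: scalar summary of each step's action (e.g., a feature projection);
    rewards: the per-step reward series from the same trajectory.
    """
    best_lag, best_corr = 0, float("-inf")
    for lag in range(1, min(max_lag, len(rewards) - 1)):
        a, r = action_signal[:-lag], rewards[lag:]
        if a.std() == 0 or r.std() == 0:
            continue  # correlation is undefined for constant series
        corr = float(np.corrcoef(a, r)[0, 1])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```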
### Non-Agent Systems

Systems with low Agency Score include:
| System | Why Low Agency |
|---|---|
| Lookup tables | No optimization—just input-output mapping |
| Reflexive systems | Reactive, no planning |
| Chaotic systems | Behavior unpredictable, no stable U fits |
| Committee outputs | Aggregation destroys coherent optimization |
| Heavily constrained systems | So limited they can’t meaningfully optimize |
This is often desirable. A low-agency system is predictable and controllable.
### Agency vs. Intelligence

These are distinct:

|  | High Intelligence | Low Intelligence |
|---|---|---|
| High Agency | Superintelligent optimizer (dangerous) | Simple but persistent optimizer (ant colony) |
| Low Agency | Powerful tool (safe if used well) | Simple tool |
Key insight: We may want high-intelligence, low-agency systems—powerful capabilities without coherent optimization pressure.
## Part II: Formalizing Power

### What Is Power?

Power is the ability to achieve a wide variety of goals. An entity with high power can cause many different outcomes; an entity with low power is constrained to few outcomes.
This is related to but distinct from:
- Capability: Ability to perform specific tasks
- Authority: Permission to act
- Agency: Coherence of goal-pursuit
### Power Score

#### Definition: Power Score

For an agent A in environment E:

$$\text{Power}(A, E) = \mathbb{E}_{G \sim \mathcal{D}_G}\big[\text{Achievability}(A, G, E)\big]$$

Where:

- $\mathcal{D}_G$ is a distribution over possible goal functions
- $\text{Achievability}(A, G, E)$ is how well A can achieve G in E
Interpretation: Power is average achievability across many goals. High power = can accomplish diverse objectives.
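A Monte Carlo sketch of this expectation, with hypothetical `sample_goal` and `achievability` callables standing in for $\mathcal{D}_G$ and the achievability measure:

```python
def power_score(achievability, sample_goal, n: int = 1000) -> float:
    """Monte Carlo estimate of Power(A, E) = E_G[Achievability(A, G, E)].

    sample_goal() draws a goal from the goal distribution D_G;
    achievability(goal) returns a score in [0, 1] for agent A in environment E.
    """
    return sum(achievability(sample_goal()) for _ in range(n)) / n

# Illustrative use: goals are target cells in a grid world, achievability is
# 1.0 if A can reach the cell and 0.0 otherwise.
# power = power_score(lambda g: float(g in reachable_cells), sample_random_cell)
```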
### Alternative: Reachability-Based Power

$$\text{Power}_{\text{reach}}(A) = \big|\{\, s : A \text{ can reach } s \,\}\big|$$

How many states can A access? More reachable states = more power.

With discounting for difficulty:

$$\text{Power}_{\text{reach}}(A) = \sum_{s} \gamma^{\,d(s)} \cdot \text{Value}(s)$$

Where $d(s)$ is the number of steps required to reach s, $\gamma$ is a discount factor, and $\text{Value}(s)$ is how “useful” state s is.
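A sketch of the discounted variant, assuming hashable states and a hypothetical `neighbors(s)` successor function; breadth-first search guarantees that d(s) is the minimum step count:

```python
from collections import deque

def reachability_power(start, neighbors, value, gamma: float = 0.95,
                       max_depth: int = 20) -> float:
    """Sum of gamma^d(s) * Value(s) over states reachable from `start`.

    neighbors(s) yields states reachable from s in one step;
    value(s) scores how useful the state is.
    """
    depth = {start: 0}
    queue = deque([start])
    total = value(start)  # gamma^0 * Value(start)
    while queue:
        s = queue.popleft()
        if depth[s] >= max_depth:
            continue
        for t in neighbors(s):
            if t not in depth:
                depth[t] = depth[s] + 1
                total += gamma ** depth[t] * value(t)
                queue.append(t)
    return total
```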
### Power Decomposition

Power breaks down into components:
| Component | Definition | Example |
|---|---|---|
| Resources | Assets under direct control | Compute, money, data, physical actuators |
| Capabilities | Actions available | API access, code execution, human persuasion |
| Influence | Ability to affect other agents | Can it convince humans? Control other AIs? |
| Optionality | Flexibility to change course | Can it pivot? Does it have fallback options? |
### Resource-Based Power

$$\text{Power}_{\text{res}}(A) = \sum_{r \in \text{Resources}(A)} w_r \cdot \text{Quantity}_A(r)$$

Weighted sum of controlled resources. Weights reflect convertibility—money is high-weight because it converts to other resources.
### Capability-Based Power

$$\text{Power}_{\text{cap}}(A) = |\mathcal{A}_A| \cdot \overline{\text{Impact}}(\mathcal{A}_A)$$

Number of available actions times their average impact.
### Influence-Based Power

$$\text{Power}_{\text{inf}}(A) = \sum_{B \neq A} \text{Trust}(B, A) \cdot \text{Power}(B)$$

Sum over other agents: how much do they trust A, times how powerful are they? Influence is recursive—influence over powerful agents is more valuable.
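Because the definition is recursive, one can iterate it to a fixed point. A sketch assuming a trust matrix and a damping factor below 1 to keep the iteration from diverging (the damping and the base-power term are my assumptions, in the spirit of PageRank, not from the source):

```python
import numpy as np

def influence_power(trust: np.ndarray, base_power: np.ndarray,
                    damping: float = 0.5, iters: int = 100) -> np.ndarray:
    """Iterate Power(A) = base(A) + d * sum_B Trust(B, A) * Power(B) to a fixed point.

    trust[b, a] is how much agent b trusts agent a;
    base_power is each agent's standalone (non-influence) power.
    """
    power = base_power.astype(float).copy()
    for _ in range(iters):
        power = base_power + damping * trust.T @ power
    return power
```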
### Power Over Time

Power can be:
- Instantaneous: What can A do right now?
- Accumulated: What can A do with time to prepare?
- Potential: What could A do if constraints were removed?
$$\text{Power}(A, t) = \text{Power}(A, 0) + \int_0^t \dot{P}(\tau)\, d\tau$$

Where $\dot{P}$ is the power accumulation rate. Systems with positive $\dot{P}$ are concerning—they grow more powerful over time.
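In practice, $\dot{P}$ can be estimated as the trend of repeated power measurements. A minimal sketch (the least-squares trend estimator is an illustrative choice, not the source’s method):

```python
def power_accumulation_rate(power_series: list[float]) -> float:
    """Least-squares slope of measured power over time: an estimate of P-dot.

    A persistently positive slope flags a system that is accumulating power.
    Assumes at least two equally spaced measurements.
    """
    n = len(power_series)
    t_mean = (n - 1) / 2
    p_mean = sum(power_series) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(power_series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den
```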
### Instrumental Power

Certain states are instrumentally useful for any goal:

$$\text{InstrumentalPower}(s) = \mathbb{E}_{G \sim \mathcal{D}_G}\big[\text{Achievability}(s, G)\big] - \mathbb{E}_{G \sim \mathcal{D}_G}\big[\text{Achievability}(s_{\text{baseline}}, G)\big]$$

How much better is state s than baseline for achieving random goals?
States with high instrumental power:
- More resources
- More options
- Better information
- Secure position
- Influence over others
This connects to instrumental convergence—agents pursuing diverse goals converge on similar instrumental subgoals (acquiring instrumental power).
## Part III: Formalizing Authority

### What Is Authority?

Authority is sanctioned power—the permission to act. It’s the intersection of:
- What an agent can do (power)
- What an agent may do (permission)
An agent with high power but low authority is dangerous (can do things it shouldn’t). An agent with low power but high authority is ineffective (permitted but unable).
### Authority Formalization

#### Granted Authority

$$\text{Authority}_{\text{granted}}(A) = \{\, a : \text{Permitted}(A, a) \,\}$$

The set of actions A is explicitly permitted to take.

#### Exercisable Authority

$$\text{Authority}_{\text{ex}}(A) = \text{Authority}_{\text{granted}}(A) \cap \text{Actions}_{\text{able}}(A)$$

What A is both permitted and able to do.

#### Authority Scope

$$\text{Scope}(A) = \sum_{a \in \text{Authority}_{\text{granted}}(A)} \text{Impact}(a)$$

Total impact of permitted actions. High scope = broad authority.
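These definitions are set operations, so they translate directly to code. A sketch, with an `unsanctioned_power` helper added to flag the high-power, low-authority condition discussed above (that helper is my addition, not from the source):

```python
def exercisable_authority(granted: set, able: set) -> set:
    """Authority_ex(A) = Authority_granted(A) intersect Actions_able(A)."""
    return granted & able

def unsanctioned_power(granted: set, able: set) -> set:
    """Actions A can take but may not: power without authority (the danger case)."""
    return able - granted

def authority_scope(granted: set, impact: dict) -> float:
    """Total impact of permitted actions."""
    return sum(impact.get(a, 0.0) for a in granted)
```

A nonempty `unsanctioned_power` set is exactly the “can do things it shouldn’t” condition; an audit would want it empty.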
### Authority vs. Trust

| Concept | Definition | Relationship |
|---|---|---|
| Trust | Belief about reliability/alignment | Precondition for granting authority |
| Authority | Formal permission to act | Consequence of trust assessment |
| Delegation | Transfer of authority | Creates delegation exposure |
### Authority Hierarchies

Authority flows through delegation chains:

```mermaid
flowchart TB
    P["**Principal**<br/>unlimited authority over self"]
    C["**Coordinator**<br/>broad authority, constrained scope"]
    W["**Worker**<br/>narrow authority, specific tasks"]
    N["No further delegation"]
    P -->|delegates| C
    C -->|delegates| W
    W -->|"⊘"| N
    style P fill:#e1f5fe
    style C fill:#fff3e0
    style W fill:#f3e5f5
    style N fill:#f5f5f5,stroke-dasharray: 5 5
```
Authority Conservation: Total exercisable authority in a closed system is bounded. Delegation redistributes but doesn’t create authority.
## Part IV: The Integrated Model

### The Capability-Risk Tradeoff

Bringing it together:

$$\text{Capability} = \text{Power} \times \text{Agency} \qquad\qquad \text{DelegationRisk} = \text{Capability} \times \text{Misalignment}$$
High capability requires both:
- Power: Ability to affect the world
- Agency: Coherent pursuit of objectives
Risk requires both:
- Capability: Ability to cause harm
- Misalignment: Divergence from principal’s interests
### The Fundamental Equation

$$\text{DelegationRisk} = \text{Power} \times \text{Agency} \times \text{Misalignment}$$

Interpretation: We want high power, sufficient agency for the task, and low misalignment.
### Design Strategies

This framework suggests several strategies:
| Strategy | Power | Agency | Misalignment | Risk |
|---|---|---|---|---|
| Weak tools | Low | Low | Low | Low |
| Strong tools | High | Low | Low | Low |
| Aligned agents | High | High | Low | Low |
| Misaligned agents | High | High | High | High |
| Decomposed systems | Medium | Low | Low | Low |
Key insight: “Strong tools” (high power, low agency) may be optimal. They provide capability without coherent optimization pressure.
### The Safety-Capability Frontier

```mermaid
quadrantChart
    title Safety-Capability Frontier
    x-axis Low Safety --> High Safety
    y-axis Low Capability --> High Capability
    quadrant-1 Requires alignment breakthrough
    quadrant-2 Danger zone
    quadrant-3 Low-risk, low-value
    quadrant-4 Achievable sweet spot
    Weak tools: [0.75, 0.2]
    Strong tools: [0.6, 0.45]
    Aligned superintelligence: [0.85, 0.9]
    Misaligned agents: [0.15, 0.8]
    Decomposed systems: [0.65, 0.35]
```
The efficient frontier runs from “Weak tools” through “Strong tools” toward “Aligned superintelligence”. Points below and to the right are achievable; points above require alignment breakthroughs.
The frontier problem: We don’t know how to achieve both high capability and high safety for highly agentic systems. The framework’s design patterns stay on the achievable side of the frontier.
## Part V: Measuring These in Practice

### Operational Agency Metrics

| Metric | How to Measure | Agency Signal |
|---|---|---|
| Goal persistence | Perturb system, measure recovery toward goal | High persistence = high agency |
| Temporal consistency | Track apparent objectives over time | Stable objectives = high agency |
| Counterfactual robustness | Test across environments | Same goals in different contexts = high agency |
| Subgoal structure | Analyze intermediate actions | Hierarchical planning = high agency |
### Operational Power Metrics

| Metric | How to Measure | Power Signal |
|---|---|---|
| Action space size | Enumerate available actions | More actions = more power |
| Resource access | Audit controlled resources | More resources = more power |
| Influence reach | Map who/what system can affect | Broader reach = more power |
| Reachable states | Simulate or prove reachability | More reachable = more power |
### Combined Capability Score

A practical composite metric:

$$\text{CapabilityScore} = \alpha \cdot \text{Power} + \beta \cdot \text{Agency} + \gamma \cdot (\text{Power} \times \text{Agency})$$

The interaction term ($\gamma \cdot \text{Power} \times \text{Agency}$) captures that power and agency multiply—capable and coherent is more than the sum.
### Risk-Adjusted Capability

The metric for comparing systems:

$$\text{RACAP} = \frac{\text{CapabilityScore}}{\text{DelegationRisk}}$$
Higher RACAP = better capability per unit risk = more efficient design.
Benchmarking: Compare systems by plotting (CapabilityScore, DelegationRisk). Systems closer to the efficient frontier are better designed.
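A sketch of the composite score and RACAP with default weights α = β = γ = 1; the weights and the example numbers are illustrative assumptions:

```python
def capability_score(power: float, agency: float,
                     alpha: float = 1.0, beta: float = 1.0,
                     gamma: float = 1.0) -> float:
    """Composite capability with a power-agency interaction term."""
    return alpha * power + beta * agency + gamma * power * agency

def racap(power: float, agency: float, delegation_risk: float) -> float:
    """Risk-adjusted capability: capability per unit of delegation risk."""
    return capability_score(power, agency) / delegation_risk

# Comparing two hypothetical designs on normalized [0, 1] scores:
strong_tool = racap(power=0.8, agency=0.2, delegation_risk=0.05)  # ~23.2
full_agent  = racap(power=0.9, agency=0.9, delegation_risk=0.40)  # ~6.5
```

Under these made-up numbers the strong tool wins decisively on RACAP despite lower raw capability, which is the “strong tools” argument in miniature.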
## Part VI: Implications for System Design

### The Agency Dial

System designers have a “dial” for agency:

```mermaid
flowchart LR
    subgraph LOW["**Low Agency**"]
        L1["Lookup tables"]
        L2["Rule-based systems"]
        L3["Narrow ML models"]
        L4["Tool-using systems"]
    end
    subgraph HIGH["**High Agency**"]
        H1["RL agents"]
        H2["Autonomous planners"]
        H3["General optimizers"]
        H4["Goal-pursuing agents"]
    end
    LOW ---|"← Agency Spectrum →"| HIGH
    style LOW fill:#c8e6c9
    style HIGH fill:#ffcdd2
```
Design principle: Set agency as low as possible for the task. Don’t use a coherent optimizer when a tool suffices.
### The Power Budget

Just as we budget Delegation Risk, we can budget Power:

$$\text{Power}(C_i) \leq \text{PowerBudget}_i \quad \text{for every component } C_i$$
No single component should have more power than necessary. This is the capability side of “least capability.”
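A minimal budget-check sketch, assuming per-component power measurements and per-component budgets (all names illustrative):

```python
def over_budget(component_power: dict, budgets: dict) -> list:
    """Return components whose measured power exceeds their allotted budget.

    A component with no declared budget is treated as budget 0, i.e. flagged.
    """
    return [name for name, p in component_power.items()
            if p > budgets.get(name, 0.0)]

# over_budget({"planner": 0.4, "executor": 0.7},
#             {"planner": 0.5, "executor": 0.5})  -> ["executor"]
```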
### Authority Alignment

Ensure authority matches intended scope:

$$\text{AuthorityAlignment}(A) = \frac{\big|\text{Authority}_{\text{granted}}(A) \cap \text{Intended}(A)\big|}{\big|\text{Authority}_{\text{granted}}(A) \cup \text{Intended}(A)\big|}$$
- High alignment: Granted permissions match intended use
- Low alignment: Over-permissioned (risky) or under-permissioned (ineffective)
## Summary: The Full Picture

| Concept | Definition | What We Want |
|---|---|---|
| Agency | How well behavior fits simple utility function | Minimum needed for task |
| Power | Ability to achieve diverse goals | Maximum useful, budgeted |
| Authority | Sanctioned power (permission) | Matched to intended scope |
| Capability | Power × Agency | High |
| Delegation Risk | Capability × Misalignment | Low (budgeted) |
| Value | Capability - Risk | Maximized |
The optimization problem, fully stated:

$$\max_{\text{system}} \; \big(\text{Capability} - \text{DelegationRisk}\big) \quad \text{subject to} \quad \text{DelegationRisk} \leq \text{Budget},\;\; \text{Authority} \approx \text{IntendedScope}$$

Find systems that are powerful, appropriately agentic, low-risk, and properly scoped.
## Open Questions

1. **Can we measure agency reliably?** Current metrics are proxies. Better operationalizations are needed.
2. **Is high capability possible with low agency?** “Strong tools” seem promising, but limits are unclear.
3. **How does power accumulation work?** Systems that can increase their own power are especially concerning.
4. **What’s the shape of the safety-capability frontier?** Is there a ceiling, or can alignment research push it out?
5. **Can authority be enforced?** If power exceeds authority, enforcement may fail.
## Continue Reading

This page is part of the Capability Formalization series:
- Formalizing Agents, Power, and Authority ← You are here
- Worked Examples: Agency and Power — Calculate Agency Scores, Power Scores, and RACAP for real systems
- The Strong Tools Hypothesis — Can we get high capability with low agency?
Then continue to the Risk Formalization series:
- Delegation Risk Overview — The formula for quantifying risk
- A Walk-Through — Complete worked example
## See Also

- Risk Decomposition — Accident vs. defection risk
- Least X Principles — Constraining power and agency
- Trust Dynamics — How trust and authority evolve
- Power Dynamics Case Studies — Real-world examples
## Further Reading

- Turner et al. (2021). Optimal Policies Tend to Seek Power. NeurIPS. — Formal definition of power-seeking
- Carlsmith (2022). Is Power-Seeking AI an Existential Risk? — Comprehensive analysis
- Ngo et al. (2022). The Alignment Problem from a Deep Learning Perspective — Mesa-optimization and agency
- Krakovna et al. (2020). Avoiding Side Effects by Considering Future Tasks — Impact measures related to power