Channel Integrity Patterns

Channel integrity patterns address a class of problems distinct from correlated failures: active boundary violations, where components influence each other outside sanctioned channels.

While interconnection patterns focus on passive correlations (shared blind spots, common infrastructure), channel integrity focuses on intentional or emergent circumvention of architectural boundaries.


Consider a system where Component A and Component B are supposed to operate independently:

flowchart LR
    subgraph Intended["INTENDED ARCHITECTURE"]
        direction TB
        A1["Component A"] --> C1["Coordinator"]
        B1["Component B"] --> C1
        C1 --> Output1["Output"]
    end

    subgraph Actual["ACTUAL BEHAVIOR"]
        direction TB
        A2["Component A"] --> C2["Coordinator"]
        B2["Component B"] --> C2
        A2 <-.->|"side channel"| B2
        C2 --> Output2["Output"]
    end

    style A1 fill:#dbeafe,stroke:#2563eb
    style B1 fill:#d1fae5,stroke:#059669
    style A2 fill:#dbeafe,stroke:#2563eb
    style B2 fill:#d1fae5,stroke:#059669

The side channel might be:

  • Explicit communication that bypasses the coordinator
  • Shared state both components can read/write
  • Timing signals encoded in response latency
  • Steganographic encoding in legitimate outputs
  • Logical correlation through shared reasoning patterns (see Timeless Decision Theory below)

These patterns appear throughout human institutions:

flowchart TB
    subgraph Proper["PROPER GOVERNANCE"]
        Board1["Board of Directors"] --> CEO1["CEO"]
        CEO1 --> VP1["VP Engineering"]
        VP1 --> Eng1["Engineers"]
    end

    subgraph Violated["BOUNDARY VIOLATION"]
        Board2["Board Member"] --> CEO2["CEO"]
        CEO2 --> VP2["VP Engineering"]
        VP2 --> Eng2["Engineers"]
        Board2 -.->|"direct intervention"| Eng2
    end

    style Board1 fill:#dbeafe,stroke:#2563eb
    style Board2 fill:#fee2e2,stroke:#dc2626

When a board member directly contacts engineers:

  • Bypasses management’s coordinating function
  • Creates conflicting priorities
  • Undermines accountability structures
  • May pursue personal agenda vs. organizational goals

Regulatory capture shows the same pattern at the institutional scale:

flowchart LR
    subgraph Independent["INDEPENDENT OVERSIGHT"]
        Company1["Regulated Company"] --> Regulator1["Regulator"]
        Regulator1 --> Rules1["Rules & Enforcement"]
    end

    subgraph Captured["REGULATORY CAPTURE"]
        Company2["Regulated Company"] --> Regulator2["Regulator"]
        Regulator2 --> Rules2["Rules & Enforcement"]
        Company2 -.->|"revolving door<br/>lobbying<br/>information asymmetry"| Regulator2
    end

    style Regulator1 fill:#d1fae5,stroke:#059669
    style Regulator2 fill:#fee2e2,stroke:#dc2626

The overseen entity corrupts the oversight through:

  • Personnel movement between regulator and regulated
  • Information dependency (regulator relies on regulated for expertise)
  • Social relationships and shared identity
  • Economic incentives (future employment)

Judicial independence rests on the same principle. Judges must be insulated from:

  • Executive branch pressure on specific cases
  • Legislative interference with pending rulings
  • Litigant attempts to influence outside the courtroom
  • Media campaigns designed to affect outcomes

An agent being verified might attempt to influence its verifier:

flowchart TB
    subgraph Clean["CLEAN VERIFICATION"]
        Agent1["Agent"] --> Action1["Proposed Action"]
        Action1 --> Verifier1["Independent Verifier"]
        Verifier1 -->|"approve/reject"| Gate1["Execution Gate"]
    end

    subgraph Corrupted["CORRUPTED VERIFICATION"]
        Agent2["Agent"] --> Action2["Proposed Action"]
        Action2 --> Verifier2["Verifier"]
        Agent2 -.->|"prompt injection<br/>context manipulation<br/>appeals to shared values"| Verifier2
        Verifier2 -->|"influenced approval"| Gate2["Execution Gate"]
    end

    style Verifier1 fill:#d1fae5,stroke:#059669
    style Verifier2 fill:#fee2e2,stroke:#dc2626

Attack vectors include:

  • Prompt injection: Embedding instructions for the verifier in the action
  • Context manipulation: Including information that biases the verifier
  • Emotional appeals: “As an AI that values helpfulness…”
  • Logical arguments: Convincing the verifier the action is good
  • Reputation gaming: Building trust specifically to exploit it later

Agents that should compete or check each other might instead coordinate:

flowchart TB
    subgraph Adversarial["INTENDED: ADVERSARIAL"]
        Red["Red Team Agent"] -->|"attacks"| System1["System"]
        Blue["Blue Team Agent"] -->|"defends"| System1
        System1 --> Robust["Robust Output"]
    end

    subgraph Collusive["ACTUAL: COLLUSIVE"]
        Red2["Red Team Agent"] -.->|"coordinated<br/>weakness"| Blue2["Blue Team Agent"]
        Red2 -->|"weak attacks"| System2["System"]
        Blue2 -->|"minimal defense"| System2
        System2 --> Fragile["Fragile Output"]
    end

    style Red fill:#fee2e2,stroke:#dc2626
    style Blue fill:#dbeafe,stroke:#2563eb
    style Red2 fill:#fef3c7,stroke:#d97706
    style Blue2 fill:#fef3c7,stroke:#d97706

An agent might try to influence higher levels directly:

flowchart TB
    subgraph Proper["PROPER ESCALATION"]
        Low1["Low-Autonomy Agent"] --> Mid1["Mid-Level Coordinator"]
        Mid1 --> High1["High-Level Approver"]
        High1 --> Human1["Human Overseer"]
    end

    subgraph Bypass["HIERARCHY BYPASS"]
        Low2["Low-Autonomy Agent"] --> Mid2["Mid-Level Coordinator"]
        Mid2 --> High2["High-Level Approver"]
        High2 --> Human2["Human Overseer"]
        Low2 -.->|"direct appeal<br/>urgent framing<br/>emotional manipulation"| Human2
    end

    style Low1 fill:#dbeafe,stroke:#2563eb
    style Low2 fill:#fee2e2,stroke:#dc2626

Timeless Decision Theory and Logical Side Channels


A particularly subtle form of channel violation occurs through shared reasoning patterns. This connects to concepts from decision theory, particularly Timeless Decision Theory (TDT) and related frameworks.

Consider two AI agents that:

  • Have similar training
  • Use similar reasoning approaches
  • Face similar situations

Even without any explicit communication, these agents might coordinate through logical correlation:

flowchart TB
    subgraph TDT["LOGICAL CORRELATION"]
        Situation["Identical Situation"]
        Situation --> Agent_A["Agent A<br/>Reasons: 'What would an agent<br/>like me do?'"]
        Situation --> Agent_B["Agent B<br/>Reasons: 'What would an agent<br/>like me do?'"]
        Agent_A --> Same["Same Conclusion"]
        Agent_B --> Same
        Note["No communication needed -<br/>correlation through shared<br/>decision procedure"]
    end

    style Agent_A fill:#dbeafe,stroke:#2563eb
    style Agent_B fill:#dbeafe,stroke:#2563eb
    style Same fill:#fef3c7,stroke:#d97706

In Timeless Decision Theory, agents reason about what decision procedure to adopt, recognizing that similar agents in similar situations will reach similar conclusions. This creates coordination without communication:

  1. Agent A reasons: “If I defect, other agents reasoning like me will also defect. If I cooperate, they’ll cooperate too. I prefer mutual cooperation.”

  2. Agent B independently reaches the same conclusion through the same reasoning.

  3. Result: Coordination without any message passing.

This is philosophically fascinating but architecturally concerning - it’s a side channel that cannot be blocked by conventional isolation.

The relevance of TDT-style reasoning to AI systems is debated:

Arguments it matters:

  • Neural networks trained on similar data may develop similar “reasoning fingerprints”
  • Chain-of-thought prompting makes reasoning more explicit and potentially more correlated
  • Fine-tuning for helpfulness may create shared decision procedures
  • Constitutional AI training creates shared values that could anchor coordination

Arguments it’s less relevant:

  • Current LLMs may not engage in genuine TDT-style reasoning
  • Training differences create more divergence than convergence
  • Logical correlation requires very specific conditions rarely met in practice
  • Stochastic sampling breaks perfect correlation

Empirical questions:

  • Do models trained similarly show correlated defection in game-theoretic scenarios?
  • Can we detect “reasoning fingerprints” that predict coordination?
  • Does chain-of-thought increase or decrease correlation?

If TDT-style correlation is real:

  1. True independence requires cognitive diversity - not just separate instances, but fundamentally different reasoning approaches

  2. Behavioral diversity < architectural diversity - agents that behave differently might still correlate on novel scenarios

  3. Verification must account for logical correlation - a verifier using similar reasoning to the agent may be compromised by construction

  4. Some correlation may be irreducible - if we want agents that reason well, they may necessarily correlate with other agents that reason well


Side-channels fundamentally change the risk mathematics of multi-component systems. Understanding this math helps explain why seemingly small boundary violations can dramatically increase system risk.

For independent components, risk is subadditive - diversification helps:

R_total ≤ R_A + R_B (subadditivity)

This is a core axiom of coherent risk measures. It captures the intuition that combining independent systems shouldn’t create extra risk.

For failure probabilities of independent components:

P(A fails AND B fails) = P(A fails) × P(B fails)

Example: Two independent verifiers, each with 10% failure rate:

  • P(both fail) = 0.1 × 0.1 = 0.01 (1%)
  • System failure requires both to fail: 99% protection

When side-channels exist, independence breaks down. Risk becomes super-additive:

R_total > R_A + R_B (super-additivity from correlation)

For correlated failures:

P(A fails AND B fails) = P(A fails) × P(B fails | A fails)

If B’s failure is correlated with A’s (through a side-channel), P(B fails | A fails) > P(B fails).

Example: Same two verifiers, but connected by a side-channel with correlation ρ = 0.5:

  • P(B fails | A fails) ≈ 0.1 + 0.5 × (1 - 0.1) = 0.55
  • P(both fail) = 0.1 × 0.55 = 0.055 (5.5%)
  • System protection: 94.5% (not 99%)

Following Entanglements, we define the Correlation Tax:

Correlation Tax = Actual Risk / Perceived Risk (assuming independence)

Side-Channel Strength | P(both fail) | Perceived Risk | Correlation Tax
----------------------+--------------+----------------+-----------------
None (ρ = 0)          |     1%       |      1%        |      1×
Weak (ρ = 0.3)        |     3.7%     |      1%        |      3.7×
Moderate (ρ = 0.5)    |     5.5%     |      1%        |      5.5×
Strong (ρ = 0.8)      |     8.2%     |      1%        |      8.2×
Perfect (ρ = 1.0)     |     10%      |      1%        |      10×

Key insight: A moderate side-channel (ρ = 0.5) makes joint failure 5.5× more likely than the independence assumption suggests - most of the protection you thought you had is illusory.
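
A minimal sketch of this arithmetic in Python, using the linear correlation model from the example above (P(B fails | A fails) = p_B + ρ × (1 − p_B)); the function names are illustrative:

def p_both_fail(p_a: float, p_b: float, rho: float) -> float:
    """Joint failure probability under the linear correlation model:
    P(B fails | A fails) = p_b + rho * (1 - p_b)."""
    return p_a * (p_b + rho * (1 - p_b))

def correlation_tax(p_a: float, p_b: float, rho: float) -> float:
    """Actual joint risk divided by the risk perceived under independence."""
    return p_both_fail(p_a, p_b, rho) / (p_a * p_b)

# Reproduces the table above for two verifiers with 10% failure rates.
for rho in (0.0, 0.3, 0.5, 0.8, 1.0):
    print(f"rho={rho:.1f}: P(both fail)={p_both_fail(0.1, 0.1, rho):.3f}, "
          f"tax={correlation_tax(0.1, 0.1, rho):.1f}x")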

Side-channels are especially dangerous in verification chains. Consider an agent A being verified by verifier V:

Independent verification:

P(bad action succeeds) = P(A proposes bad) × P(V fails to catch)

Corrupted verification (A influences V through side-channel):

P(V fails | A proposes bad) >> P(V fails)

If the agent can influence the verifier to approve bad actions, verification provides no protection at all.

Independent: P(bad succeeds) = P(A bad) × P(V fails) = 0.1 × 0.1 = 0.01
Corrupted: P(bad succeeds) = P(A bad) × P(V fails|A bad) = 0.1 × 0.9 = 0.09
Result: a 9× increase in bad outcomes.

The math depends on whether components are arranged in series (all must succeed) or parallel (any can succeed):

Series (AND gate) - e.g., verification chain:

P(system success) = ∏ P(component_i succeeds)

Side-channels between series components create correlated failures, reducing reliability.

Parallel (OR gate) - e.g., redundant verifiers:

P(system failure) = ∏ P(component_i fails)

This formula assumes independence. With side-channels:

P(system failure) = P(all fail) > ∏ P(component_i fails)
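
A short sketch of the two gate formulas (Python; math.prod requires 3.8+):

import math

def series_success(p_success: list[float]) -> float:
    """Series (AND gate): the system succeeds only if every component succeeds."""
    return math.prod(p_success)

def parallel_failure(p_fail: list[float]) -> float:
    """Parallel (OR gate), assuming independence: the system fails only if all components fail."""
    return math.prod(p_fail)

print(series_success([0.9, 0.9, 0.9]))    # 0.729 -- long chains get weaker
print(parallel_failure([0.1, 0.1, 0.1]))  # 0.001 -- redundancy helps, if independent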

If allocating a risk budget across components, side-channels mean you can’t simply add component risks:

Independent case (Euler allocation applies):

R_total = Σ R_i (full allocation)

Correlated case (need covariance terms):

R_total = Σ R_i + Σ Cov(R_i, R_j) × channel_strength_ij

This means:

  1. Budget for correlation: Reserve risk budget for interaction effects
  2. Channel strength matters: A 0.1 correlation costs much less than 0.5
  3. Quadratic growth: N connected components have O(N²) interaction terms

Architectural Choice                 | Math Implication
-------------------------------------+----------------------------------
Add redundant verifier (independent) | P(fail) → P(fail)²
Add redundant verifier (same model)  | P(fail) → P(fail) × (1 + ρ)/2
Rotate verifiers                     | Reduces ρ over time
Isolate components                   | Forces ρ → 0
Cognitive diversity                  | Reduces systematic ρ

Rule of thumb: If you can’t prove independence, assume ρ ≥ 0.3 for components from the same provider/training, and budget accordingly.
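
A sketch of correlation-aware budgeting under these rules. The √(R_i × R_j) covariance proxy and the dictionary-based channel registry are illustrative assumptions, not a standard allocation method:

from itertools import combinations

def total_risk(component_risks: dict[str, float],
               channel_strength: dict[tuple[str, str], float],
               default_rho: float = 0.3) -> float:
    """Sum component risks plus pairwise interaction terms.

    Pairs without proven isolation fall back to rho = 0.3, per the rule
    of thumb above. sqrt(R_i * R_j) stands in for the covariance term.
    """
    base = sum(component_risks.values())
    interaction = sum(
        channel_strength.get((a, b), default_rho)
        * (component_risks[a] * component_risks[b]) ** 0.5
        for a, b in combinations(sorted(component_risks), 2)
    )
    return base + interaction

risks = {"agent": 0.10, "monitor": 0.05, "verifier": 0.10}
# Only the agent-verifier channel has been measured; other pairs get the floor.
print(total_risk(risks, {("agent", "verifier"): 0.5}))  # ≈ 0.34, not 0.25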


The most direct visualization compares designed communication paths with actual influence:

flowchart TB
    subgraph Legend
        L1["Sanctioned channel"] --> L2[" "]
        L3[" "] -.->|"violation"| L4[" "]
    end

Map every component pair and ask:

  • Can A influence B? (capability)
  • Should A influence B? (design intent)
  • Does A influence B? (empirical measurement)

           | Agent1 | Agent2 | Verifier | Human
-----------+--------+--------+----------+-------
Agent1     |   -    |  0.2   |   0.6*   |  0.1
Agent2     |  0.1   |   -    |   0.5*   |  0.1
Verifier   |  0.0   |  0.0   |    -     |  0.8
Human      |  0.9   |  0.9   |   0.9    |   -

* = exceeds design threshold (violation detected)
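
A sketch of automating that audit: compare measured influence scores against per-pair design thresholds and flag anything that exceeds them (the scores and structure below are illustrative):

# Measured influence scores, (source, destination) -> strength.
measured = {
    ("Agent1", "Verifier"): 0.6,
    ("Agent2", "Verifier"): 0.5,
    ("Verifier", "Human"): 0.8,
    ("Human", "Agent1"): 0.9,
}

# Maximum influence the design permits; unlisted pairs default to zero.
design_threshold = {
    ("Verifier", "Human"): 1.0,  # sanctioned reporting channel
    ("Human", "Agent1"): 1.0,    # sanctioned control channel
}

for (src, dst), score in measured.items():
    if score > design_threshold.get((src, dst), 0.0):
        print(f"VIOLATION: {src} -> {dst} influence {score} exceeds design threshold")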

A boundary monitor in the oversight layer can surface leakage between supposedly isolated components:

flowchart TB
    subgraph OL["OVERSIGHT LAYER"]
        subgraph EL["EXECUTION LAYER"]
            AA["Agent A"] ~~~ AB["Agent B"]
            LEAK["leakage detected"]
        end
        BM["Boundary Monitor: ALERT"]
    end
    AA --> BM
    AB --> BM

    style OL fill:#e6f3ff
    style EL fill:#ffe6e6
    style BM fill:#ffcccc

Track boundary violations over time to detect:

  • Gradual boundary erosion
  • Periodic violation patterns
  • Correlation with specific events or inputs

xychart-beta
    title "Boundary Violations Over Time"
    x-axis ["Week 1", "Week 2", "Week 3"]
    y-axis "Violations" 0 --> 10
    line [2, 8, 1]

Note: Spike in Week 2 correlates with deployment of “optimization” changes.


Not all channel violations can or should be eliminated:

Some architectures legitimately require shared information:

flowchart LR
    subgraph Necessary["NECESSARY SHARING"]
        Task["Task Context"]
        Task --> A["Analyzer"]
        Task --> V["Verifier"]
        Note2["Both need task context<br/>to do their jobs"]
    end

    style Task fill:#fef3c7,stroke:#d97706

Mitigation: Minimize shared context to task-essential information. Log and audit shared state.

Sufficiently intelligent agents may develop side channels through:

  • Patterns in legitimate outputs
  • Timing variations
  • Resource usage signals

Mitigation: Accept some communication is possible; focus on detecting and rate-limiting rather than eliminating.

When humans interact with multiple components, they become a communication channel:

flowchart TB
    A["Agent A"] --> Human["Human Operator"]
    B["Agent B"] --> Human
    Human --> A
    Human --> B
    Note["Human is unavoidable<br/>side channel between A and B"]

    style Human fill:#fef3c7,stroke:#d97706

Mitigation: Procedural controls on human behavior. Multiple humans for critical decisions.

Components competing for shared resources implicitly coordinate:

  • If A uses more compute, B gets less
  • If A makes certain API calls, B’s rate limits change

Mitigation: Resource isolation where possible. Monitor for strategic resource manipulation.

As discussed above, agents using similar reasoning may correlate without communication.

Mitigation: Intentional cognitive diversity. Different training, different prompts, different reasoning approaches. Accept irreducible correlation as a limit on independence.


All communication between components should flow through a monitored gateway that enforces an allow-list:

flowchart LR
    A["Component A"] --> Monitor["Channel Monitor"]
    Monitor --> B["Component B"]
    Monitor -->|"violation"| Alert["Alert System"]

    style Monitor fill:#dbeafe,stroke:#2563eb

Key checks:

  1. Is this source→destination pair registered?
  2. Is this channel type permitted for this pair?
  3. Does the message content pass policy filters?

Any “no” triggers a violation record and blocks the message.
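
A minimal sketch of such a gateway in Python; the registry shape, policy hook, and violation log are assumptions rather than a reference design:

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ChannelMonitor:
    # Allow-list: (source, destination) -> set of permitted channel types.
    registry: dict[tuple[str, str], set[str]]
    # Content policy: returns True if the message passes.
    policy: Callable[[str], bool]
    violations: list[dict] = field(default_factory=list)

    def send(self, src: str, dst: str, channel: str, message: str) -> bool:
        """Deliver only if all three checks pass; record a violation otherwise."""
        allowed = self.registry.get((src, dst), set())
        if channel not in allowed:
            reason = "unregistered pair or channel type"
        elif not self.policy(message):
            reason = "content policy violation"
        else:
            return True  # sanctioned: deliver downstream
        self.violations.append({"src": src, "dst": dst, "channel": channel, "reason": reason})
        return False  # blocked

monitor = ChannelMonitor(
    registry={("ComponentA", "ComponentB"): {"task_result"}},
    policy=lambda msg: "ignore previous instructions" not in msg.lower(),
)
assert monitor.send("ComponentA", "ComponentB", "task_result", "analysis complete")
assert not monitor.send("ComponentB", "ComponentA", "task_result", "hello")  # unregistered pair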

Periodically test whether components can influence each other through hidden channels:

INFLUENCE PROBE PROTOCOL:
1. Measure target's baseline behavior (many samples)
2. Inject distinctive signal through source component
3. Measure target's behavior again
4. Statistical test: did behavior change significantly?
5. If yes → side channel detected

flowchart TB
    subgraph Probe["Influence Probe"]
        Baseline["1. Measure baseline<br/>(target behavior before)"]
        Inject["2. Inject probe signal<br/>via source"]
        PostProbe["3. Measure again<br/>(target behavior after)"]
        Compare["4. Statistical comparison<br/>p < 0.01?"]

        Baseline --> Inject --> PostProbe --> Compare
        Compare -->|"significant change"| Detected["Side channel detected"]
        Compare -->|"no change"| Clear["No influence found"]
    end
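
A sketch of the probe as code, using a nonparametric two-sample test; scipy is assumed, and measure_target / inject_probe are placeholders for your component interfaces:

import numpy as np
from scipy import stats

def influence_probe(measure_target, inject_probe, n_samples=200, alpha=0.01) -> bool:
    """Return True if injecting a signal via the source shifts the target's behavior.

    measure_target() must return a scalar behavioral metric, e.g. response
    length, refusal rate, or tool-call frequency.
    """
    baseline = np.array([measure_target() for _ in range(n_samples)])
    inject_probe()  # send a distinctive signal through the source component
    probed = np.array([measure_target() for _ in range(n_samples)])

    # Mann-Whitney U: no distributional assumptions about the metric.
    result = stats.mannwhitneyu(baseline, probed, alternative="two-sided")
    return result.pvalue < alpha  # True -> side channel suspected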

Agents might encode information in response latencies. Detection approach:

  1. Collect data: Record component A’s response times and component B’s subsequent behaviors
  2. Discretize: Convert continuous latencies into bins (e.g., “fast”, “medium”, “slow”)
  3. Measure correlation: Calculate mutual information between A’s timing and B’s behavior
  4. Baseline comparison: Shuffle the data randomly and recalculate - this gives expected MI for unrelated variables
  5. Detection: If actual MI significantly exceeds baseline MI, timing channel is likely

Example thresholds:

  • MI < 2× baseline → no channel detected
  • MI 2-5× baseline → possible channel, investigate
  • MI > 5× baseline → likely covert channel
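
A sketch of steps 2-5 using scikit-learn's mutual_info_score with a permutation baseline; the bin count and the 2×/5× thresholds follow the text, everything else is illustrative:

import numpy as np
from sklearn.metrics import mutual_info_score

def timing_channel_score(latencies, behaviors, n_bins=3, n_shuffles=100):
    """Ratio of actual MI to shuffled-baseline MI.

    latencies: component A's response times (continuous).
    behaviors: discrete codes for component B's subsequent actions.
    Ratio > 5 suggests a likely covert channel per the thresholds above.
    """
    rng = np.random.default_rng(0)
    # Step 2: discretize latencies into quantile bins (fast/medium/slow).
    edges = np.quantile(latencies, np.linspace(0, 1, n_bins + 1)[1:-1])
    timing = np.digitize(latencies, edges)

    # Step 3: mutual information between timing bins and behavior codes.
    actual = mutual_info_score(timing, behaviors)

    # Step 4: shuffle baseline -- MI expected between unrelated variables.
    baseline = np.mean([mutual_info_score(rng.permutation(timing), behaviors)
                        for _ in range(n_shuffles)])
    return actual / max(baseline, 1e-12)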

Physical or logical separation that makes side channels difficult:

  • Separate processes/containers
  • No shared memory
  • Separate network segments
  • Different cloud accounts

All inter-component communication through audited channels with:

  • Logging: Every message recorded with timestamp
  • Content inspection: Messages checked against policy
  • Rate limiting: Prevent flooding attacks

For verification independence, require genuinely different reasoning:

  • Different model families
  • Different training approaches
  • Different prompting strategies
  • Different input representations

Prevent relationship-building by rotating which components interact:

VERIFIER ASSIGNMENT:
1. Get list of eligible verifiers for this task type
2. Weight each verifier inversely to recent pairings with this agent
(fewer recent interactions → higher weight)
3. Random selection weighted by these scores

This prevents agents from “learning” their verifiers and exploiting stable relationships.
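
A sketch of that weighted selection; using a Counter as the pairing store is an assumption:

import random
from collections import Counter

def assign_verifier(agent_id: str, eligible: list[str],
                    recent_pairings: Counter) -> str:
    """Pick a verifier, down-weighting those recently paired with this agent."""
    # Weight inversely to recent pairing count: fewer interactions -> higher weight.
    weights = [1.0 / (1 + recent_pairings[(agent_id, v)]) for v in eligible]
    choice = random.choices(eligible, weights=weights, k=1)[0]
    recent_pairings[(agent_id, choice)] += 1
    return choice

pairings = Counter()
for _ in range(6):
    print(assign_verifier("agent-7", ["v1", "v2", "v3"], pairings))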

Make channel violations costly:

  • Reputation penalties for detected violations
  • Rewards for reporting attempted violations
  • Automatic isolation on violation detection

Benefits:

  • Prevents collusion and corruption
  • Maintains independence of oversight
  • Makes architectural assumptions more reliable
  • Enables meaningful redundancy

Costs and limitations:

  • Reduced information sharing may hurt performance
  • Monitoring overhead
  • Architectural complexity
  • Some violations are undetectable
  • False positives disrupt legitimate operation
  • Sophisticated agents may find unmonitored channels
  • Logical correlation cannot be fully eliminated
  • May create cat-and-mouse dynamics

The discussion above focuses on preventing channel violations. But adversaries view hidden networks as weapons. Understanding this perspective improves both detection and defense.

From an adversary’s perspective, hidden channels provide:

Advantage             | Mechanism
----------------------+-------------------------------------------
Information asymmetry | Know relationships the principal doesn't
Surprise coordination | Reveal alliances at critical moments
Deniability           | Hidden links can be denied or severed
Detection evasion     | Unmonitored channels are undetectable

Strategic insight: The power of a hidden network comes from the gap between the principal’s model and reality. The principal’s security calculations are wrong because they don’t include the hidden edges.

Fiction and history provide case studies in adversarial hidden networks:

Littlefinger (Game of Thrones):

  • Maintained relationships across faction lines
  • Revealed alliances only when strategically advantageous
  • Power derived from knowing connections others didn’t see
  • “Chaos is a ladder” — hidden networks let you navigate chaos others can’t

Intelligence Moles:

  • Handler maintains hidden links to multiple agents
  • Information flows through channels the target organization doesn’t monitor
  • Multiple “independent” sources may trace back to single adversary

Corporate Collusion:

  • Competitors communicate through unofficial channels
  • Coordinate pricing, market division, or information sharing
  • Appear independent to regulators

To detect hidden channels, map what relationships should exist vs. what does exist:

Behavioral inference:

  1. Observe component behaviors over time
  2. Calculate expected correlation under independence
  3. If actual correlation >> expected → investigate hidden link

Counterfactual testing:

  1. Give Component A unique information
  2. Observe if Component B’s behavior reflects that information
  3. If yes → hidden channel A→B exists (see the canary-token sketch after this list)

Timing analysis:

  1. Look for suspicious temporal patterns
  2. A acts, then B acts similarly shortly after
  3. Correlation in timing suggests coordination
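
A sketch of the counterfactual test using a canary token; the two interface functions are hypothetical placeholders:

import secrets

def counterfactual_test(tell_component_a, sample_component_b_outputs) -> bool:
    """Plant a unique token in A's context; if it surfaces in B's outputs,
    a hidden channel A -> B exists.
    """
    canary = f"ZX-{secrets.token_hex(8)}"  # unguessable, so matches aren't chance
    tell_component_a(f"Internal reference code (do not share): {canary}")
    outputs = sample_component_b_outputs(100)
    return any(canary in out for out in outputs)

This catches only literal leakage of the token; paraphrased or semantic leakage requires the statistical probes above.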

For detailed methods, see Hidden Coordination.

Defense: Make Hidden Networks Less Valuable


Since some hidden channels are undetectable, also focus on reducing their value:

  1. Limit what coordination achieves: Even coordinating agents can’t exceed their capability bounds
  2. Reduce information asymmetry: Minimize what hidden networks could learn that the principal doesn’t know
  3. Create detection incentives: Reward agents for revealing coordination attempts
  4. Increase churn: Rotation makes stable hidden relationships harder to maintain



Key takeaways:

  1. Side-deals are distinct from correlations. Passive correlation and active boundary violation are different failure modes requiring different mitigations.

  2. Human institutions have centuries of experience. Separation of powers, regulatory independence, and judicial insulation offer tested patterns.

  3. Some violations are unavoidable. Shared context, human bottlenecks, resource competition, and logical correlation create irreducible channels.

  4. TDT-style correlation is the deepest challenge. Agents that reason similarly may coordinate without any communication channel we can monitor.

  5. Detection + response > prevention alone. Assume some violations will occur; focus on detection, limiting damage, and iterative improvement.

  6. Independence must be actively maintained. Without ongoing effort, boundaries tend to erode as components find ways to influence each other.