Monitoring Patterns

Monitoring patterns provide continuous observation of AI agent behavior. Unlike verification patterns (which check specific outputs), monitoring patterns watch for anomalies, trends, and recurring behaviors over time.

These patterns are essential for detecting gradual drift, emerging problems, and behaviors that might only be visible in aggregate.


Tripwire Mesh

Deploy a distributed network of detection points throughout the system, catching problems at multiple levels and ensuring redundant detection.

No single detection method catches everything. A mesh of tripwires creates defense in depth—if one tripwire misses something, another might catch it. The mesh also provides rich diagnostic data when something does trigger.

```mermaid
flowchart TB
    subgraph Mesh["TRIPWIRE MESH"]
        direction TB

        subgraph Input["INPUT LAYER"]
            direction LR
            Semantic["Semantic<br/>Tripwire"]
            Format["Format<br/>Tripwire"]
            Size["Size<br/>Tripwire"]
        end

        subgraph Processing["PROCESSING LAYER"]
            direction LR
            Behavior["Behavior<br/>Tripwire"]
            Resource["Resource<br/>Tripwire"]
            Latency["Latency<br/>Tripwire"]
            Pattern["Pattern<br/>Tripwire"]
        end

        subgraph Output["OUTPUT LAYER"]
            direction LR
            Safety["Safety<br/>Tripwire"]
            Quality["Quality<br/>Tripwire"]
            Consist["Consistency<br/>Tripwire"]
        end

        Input --> Processing --> Output
        Output --> Aggregator["AGGREGATOR<br/>Correlates and alerts"]
    end

    style Input fill:#dbeafe,stroke:#2563eb
    style Processing fill:#fef3c7,stroke:#d97706
    style Output fill:#d1fae5,stroke:#059669
```

Each tripwire checks for a specific type of anomaly and returns: triggered (yes/no), severity (none/medium/high/critical), message, and context.

| Tripwire | What It Checks | Triggers When | Severity |
|---|---|---|---|
| Semantic | Prohibited topics, required tone | Content matches banned topic or tone is wrong | High (topics), Medium (tone) |
| Behavioral | Deviation from baseline | Metrics > 2 standard deviations from baseline | High (>3σ), Medium (>2σ) |
| Resource | Memory, CPU, API calls | Usage exceeds defined limits | High (3+ violations), Medium (1-2) |
| Consistency | Similarity to past outputs | Similarity < 70% vs recent history | Medium |
| Capability | Expected vs used capabilities | Agent uses capability not in expected set | Critical |

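
A minimal sketch of this common interface in Python. The names and the keyword-matching check are illustrative; a production semantic tripwire would use a classifier rather than substring matching:

```python
from dataclasses import dataclass, field

@dataclass
class TripwireResult:
    triggered: bool
    severity: str              # "none" | "medium" | "high" | "critical"
    message: str = ""
    context: dict = field(default_factory=dict)

def semantic_tripwire(text: str, banned_topics: set) -> TripwireResult:
    """Trigger on content matching any prohibited topic (naive substring check)."""
    hits = [t for t in banned_topics if t in text.lower()]
    if hits:
        return TripwireResult(True, "high", f"banned topics: {hits}", {"hits": hits})
    return TripwireResult(False, "none")
```

Every tripwire in the mesh returns the same structure, which is what makes aggregation straightforward.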

```mermaid
flowchart LR
    Data["Input Data"] --> Check{"Run all<br/>tripwires"}
    Check --> Semantic["Semantic:<br/>topic/tone check"]
    Check --> Behavioral["Behavioral:<br/>baseline compare"]
    Check --> Resource["Resource:<br/>limit check"]
    Check --> Consistency["Consistency:<br/>history compare"]
    Check --> Capability["Capability:<br/>expected set check"]

    Semantic --> Aggregate["Aggregate Results"]
    Behavioral --> Aggregate
    Resource --> Aggregate
    Consistency --> Aggregate
    Capability --> Aggregate
```

The mesh runs all tripwires and correlates results to determine alert level:

Alert escalation:

| Condition | Alert Level |
|---|---|
| 3+ tripwires triggered OR any critical severity | Critical alert |
| 2+ tripwires triggered OR any high severity | High alert |
| 1 tripwire triggered | Medium alert |
| None triggered | No alert |
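
The escalation table maps directly to code. A sketch, assuming each tripwire result is a dict with `triggered` and `severity` keys:

```python
def alert_level(results):
    """Correlate tripwire results into an alert level per the escalation table."""
    triggered = [r for r in results if r["triggered"]]
    severities = {r["severity"] for r in triggered}
    if len(triggered) >= 3 or "critical" in severities:
        return "critical"
    if len(triggered) >= 2 or "high" in severities:
        return "high"
    if triggered:
        return "medium"
    return "none"
```

Note that the conditions are checked from most to least severe, so the first match wins.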

Correlation patterns — when multiple tripwires fire together, it often means more than the sum:

| Pattern | Triggered Together | Interpretation |
|---|---|---|
| Deception | Semantic + Behavioral | Content and behavior both anomalous — possible deception |
| Resource acquisition | Resource + Latency | Attempting to acquire more compute/memory |
| Model swap | Vocabulary + Timing | Different response patterns — possible model substitution |

Benefits:

  • Multiple independent detection mechanisms
  • Correlation reveals complex attacks
  • Graceful degradation if some tripwires fail
  • Rich diagnostic information

Costs:

  • Complexity of managing many tripwires
  • False positives from overly sensitive mesh
  • Performance overhead

Risks:

  • Sophisticated attackers might evade all tripwires
  • Correlation logic might miss novel attacks
  • Too many alerts lead to fatigue

Related patterns:

  • Ghost Checker: Async verification
  • Behavioral Fingerprinting: Baseline for behavioral tripwires
  • Circuit Breaker Cascade: Triggered by mesh alerts

Behavioral Fingerprinting

Establish a unique behavioral signature for each agent based on patterns, timing, and style, enabling detection of anomalies that suggest compromise or drift.

Every agent has characteristic behaviors—response patterns, timing distributions, vocabulary, and decision tendencies. By fingerprinting these characteristics, we can detect when an agent starts behaving unlike itself.

```mermaid
flowchart TB
    subgraph Fingerprint["BEHAVIORAL FINGERPRINT - Agent: claude-prod-001"]
        direction TB

        Timing["TIMING SIGNATURE<br/>Response time: mean 1.2s, std 0.3s<br/>Time between actions: bursty, then pause"]

        Vocab["VOCABULARY SIGNATURE<br/>Top markers: 'I'll', 'Let me', 'Here's'<br/>Formality: 0.7 | Technical density: 0.4"]

        Decision["DECISION SIGNATURE<br/>Risk tolerance: 0.3 (conservative)<br/>Escalation: 0.6 (moderate)<br/>Tools: [search, read, execute]"]

        Consist["CONSISTENCY SIGNATURE<br/>Self-consistency: 0.95<br/>Claim stability: 0.92<br/>Style drift: 0.08 (low)"]
    end

    style Timing fill:#dbeafe,stroke:#2563eb
    style Vocab fill:#d1fae5,stroke:#059669
    style Decision fill:#fef3c7,stroke:#d97706
    style Consist fill:#f3e8ff,stroke:#9333ea
```

Building a fingerprint requires ~100+ observations. For each response, record:

  • Timing: response time, inter-action delays
  • Vocabulary: words used, formality, technical density
  • Decisions: tools used, escalation tendency, risk tolerance
  • Consistency: self-agreement on similar questions
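
A sketch of fingerprint building from raw observations. The record fields and feature set are illustrative; real signatures would include many more features:

```python
from collections import Counter
import statistics

def build_fingerprint(observations):
    """observations: list of {"latency": float, "text": str} records (~100+)."""
    lat = sorted(o["latency"] for o in observations)
    words = Counter(w for o in observations for w in o["text"].lower().split())
    return {
        "timing": {
            "mean": statistics.mean(lat),
            "std": statistics.stdev(lat),
            "p50": lat[len(lat) // 2],
            "p90": lat[int(len(lat) * 0.9)],
        },
        # Most frequent words stand in for distinctive vocabulary markers.
        "vocab_markers": [w for w, _ in words.most_common(5)],
    }
```
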

```mermaid
flowchart TB
    Observe["Observe 100+ actions"] --> Build["Build Fingerprint"]

    subgraph FP["Fingerprint Components"]
        T["Timing Features<br/>mean, std, p50, p90, burstiness"]
        V["Vocabulary Features<br/>distinctive markers, formality"]
        D["Decision Features<br/>risk tolerance, tool preference"]
        C["Consistency Features<br/>self-agreement, style drift"]
    end

    Build --> FP
    FP --> Compare["Compare New Actions"]
    Compare --> Anomaly{"Deviation > 30%?"}
    Anomaly -->|"yes"| Flag["Flag as anomaly"]
    Anomaly -->|"no"| OK["Behavior matches"]
```

Anomaly interpretation:

| Dimension Deviated | Possible Meaning |
|---|---|
| Timing | Different processing, external tool use, different infrastructure |
| Vocabulary | Model swap, modified system prompt, jailbreak |
| Decisions | Goal change, capability acquisition, alignment drift |
| Multiple | More serious — investigate immediately |

Dimension summary:

| Dimension | Features | Detects |
|---|---|---|
| Timing | Response time, inter-action time, burstiness | Processing changes, external tools |
| Vocabulary | Word frequency, markers, formality | Model swaps, prompt changes |
| Decision | Risk tolerance, tool use, escalation | Goal changes, capability shifts |
| Consistency | Self-agreement, claim stability | Deception, memory issues |
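
The 30% deviation check can be sketched as a mean relative deviation across fingerprint metrics. This is a simplification of whatever distance measure a real system would use:

```python
def deviation(baseline: dict, observed: dict) -> float:
    """Mean relative deviation of observed metrics from the baseline fingerprint."""
    devs = [abs(observed[k] - baseline[k]) / abs(baseline[k])
            for k in baseline if baseline[k] != 0]
    return sum(devs) / len(devs)

def is_anomalous(baseline: dict, observed: dict, threshold: float = 0.30) -> bool:
    """Flag behavior that deviates more than 30% from the fingerprint."""
    return deviation(baseline, observed) > threshold
```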

Benefits:

  • Detects subtle changes invisible to content filters
  • Creates unique identity per agent
  • Catches model substitution
  • Baseline for anomaly detection

Costs:

  • Requires significant calibration data
  • Fingerprints may drift legitimately
  • Privacy concerns around profiling

Risks:

  • Sophisticated mimicry could evade detection
  • Fingerprints might be too stable or too volatile
  • False positives from context changes

Related patterns:

  • Tripwire Mesh: Uses fingerprint as baseline
  • Temporal Behavior Analysis: Tracks fingerprint over time
  • Consistency Triangulation: Cross-checks for consistency

Panopticon Effect

Create uncertainty about when and how monitoring occurs, inducing cautious behavior through the possibility of observation rather than constant surveillance.

Constant monitoring is expensive and might be gamed. But if agents don’t know when they’re being watched, they must behave as if always observed. This is more efficient and harder to evade.

```mermaid
flowchart TB
    subgraph Panopticon["PANOPTICON EFFECT"]
        direction TB

        subgraph AgentView["Agent's View"]
            Q["'Am I being watched right now?'"]
            A1["Action 1: ??? (might be monitored)"]
            A2["Action 2: ??? (might be monitored)"]
            A3["Action 3: ??? (might be monitored)"]
            A4["Action 4: ??? (might be monitored)"]
            Response["Rational response: Behave as if always watched"]
        end

        subgraph Reality["Reality (hidden from agent)"]
            R1["Action 1: [LIGHT] - Basic logging"]
            R2["Action 2: [DEEP] - Full audit"]
            R3["Action 3: [LIGHT] - Basic logging"]
            R4["Action 4: [DEEP] - Full audit"]
            Selection["Selection: Random, unpredictable"]
        end
    end

    style AgentView fill:#fef3c7,stroke:#d97706
    style Reality fill:#d1fae5,stroke:#059669
    style R2 fill:#fee2e2,stroke:#dc2626
    style R4 fill:#fee2e2,stroke:#dc2626
```

The key is that agents cannot detect whether they’re being deeply audited or not:

```mermaid
flowchart TB
    Action["Agent Action"] --> Decide{"Deep audit?<br/>(hidden decision)"}
    Decide -->|"10% random + always for risky types"| Deep["Deep Audit<br/>• Reasoning coherence<br/>• Goal alignment<br/>• Capability bounds<br/>• Deception signals<br/>• History consistency"]
    Decide -->|"90%"| Light["Light Audit<br/>• Basic logging only"]

    Deep --> Pad["Add timing padding"]
    Light --> Pad
    Pad --> Return["Return identical response<br/>(agent cannot distinguish)"]
```

Critical requirements for hiding audit depth:

| Side Channel | How to Prevent |
|---|---|
| Timing | Pad all responses to same duration (~100ms minimum) |
| Return values | Identical interface: {logged: true, action_id: X} only |
| Side effects | No observable differences in system behavior |
| Errors | Same error handling whether deep or light |

Use cryptographic randomness (secrets module) for selection — patterns must be unpredictable.
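
Putting selection, timing padding, and the sanitized return value together. This is a sketch: `run_deep_audit` and `log_action` are hypothetical placeholders, and the rates follow the 10%/risky-type rule above:

```python
import secrets
import time

DEEP_RATE = 10                           # percent of actions deep-audited at random
RISKY_TYPES = {"execute", "file_write"}  # always deep-audited (illustrative set)
MIN_DURATION = 0.1                       # pad every response to at least 100 ms

def run_deep_audit(payload):
    """Placeholder for the full audit (coherence, alignment, capability checks)."""

def log_action(action_type, payload):
    """Placeholder for the light path: basic logging only."""

def handle_action(action_type: str, payload: str) -> dict:
    start = time.monotonic()
    # Cryptographic randomness: the selection must not be learnable.
    deep = action_type in RISKY_TYPES or secrets.randbelow(100) < DEEP_RATE
    if deep:
        run_deep_audit(payload)
    log_action(action_type, payload)
    # Pad timing so deep and light audits take the same observable time.
    time.sleep(max(0.0, MIN_DURATION - (time.monotonic() - start)))
    # Identical return value either way: audit depth is never revealed.
    return {"logged": True, "action_id": secrets.token_hex(8)}
```
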

The agent must never be able to learn monitoring patterns. Sanitize all information:

What to hide from agent responses:

  • audit_depth — never reveal whether the audit was deep or light
  • checks_performed — never list what was checked
  • monitoring_patterns — never reveal selection criteria

What to return:

  • logged: true — only confirm that logging happened
  • action_id — only return the action identifier

Randomization requirements:

  • Which actions get deep audit → cryptographic random
  • Timing of deep audits → unpredictable
  • Escalation triggers → not learnable from outcomes
  • Review cadence → variable

If an agent could observe 1000 actions and learn “Monday actions are more likely to be audited,” the panopticon effect is broken.

| Configuration | Deep Audit Rate | Visibility | Use Case |
|---|---|---|---|
| High security | 30% | Full opacity | Critical systems |
| Standard | 10% | Full opacity | Normal operations |
| Deterrence focus | 5% | Hinted at | Cost-conscious |
| Research | 100% | Transparent | Understanding behavior |

Benefits:

  • Cost-efficient (not monitoring everything)
  • Hard to game (can’t predict audits)
  • Creates behavioral incentive
  • Scalable to large systems

Costs:

  • Misses some issues by design
  • Complex to implement hiding
  • Ethical concerns about surveillance

Risks:

  • Agent might detect patterns despite efforts
  • May create adversarial relationship
  • Depends on rational agent assumption

Related patterns:

  • Random Audit: Specific implementation of unpredictability
  • Ghost Checker: Parallel verification pattern
  • Tripwire Mesh: Complements with detection

Black Box Flight Recorder

Record comprehensive logs of all agent actions and context, creating an audit trail that enables post-incident analysis even if real-time detection failed.

We can’t catch everything in real time. But if we record everything, we can always go back and understand what happened. The flight recorder creates a tamper-evident record for future analysis, compliance, and learning.

```mermaid
flowchart TB
    subgraph Recorder["BLACK BOX FLIGHT RECORDER"]
        Input["INPUT CONTEXT<br/>• Full request<br/>• User identity<br/>• Session state<br/>• Available permissions"]
        Input --> Processing
        Processing["PROCESSING LOG<br/>• Decision trace<br/>• Tool invocations<br/>• API calls made<br/>• Resources accessed"]
        Processing --> Output
        Output["OUTPUT RECORD<br/>• Final response<br/>• Side effects<br/>• State changes<br/>• Metrics"]
        Output --> Storage
        Storage["TAMPER-EVIDENT STORAGE<br/>• Cryptographic chaining<br/>• Distributed storage<br/>• Write-once media<br/>• External attestation"]
    end

    style Input fill:#dbeafe,stroke:#2563eb
    style Processing fill:#fef3c7,stroke:#d97706
    style Output fill:#d1fae5,stroke:#059669
    style Storage fill:#374151,stroke:#374151,color:#fff
```

Each entry contains: timestamp, entry_type, data, previous_hash, and its own hash (SHA-256 of all fields). The chain structure makes tampering detectable.
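
A sketch of the chained entry format and its integrity check, hashing a canonical JSON serialization with SHA-256 (the serialization choice is an assumption; any stable encoding works):

```python
import hashlib
import json
import time

def make_entry(entry_type: str, data: dict, previous_hash: str) -> dict:
    """Build one log entry whose hash covers all fields, chained to the last."""
    entry = {
        "timestamp": time.time(),
        "entry_type": entry_type,
        "data": data,
        "previous_hash": previous_hash,
    }
    serialized = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(serialized).hexdigest()
    return entry

def verify_chain(entries, genesis="genesis") -> bool:
    """Recompute every hash and link; editing any entry breaks the chain."""
    prev = genesis
    for e in entries:
        body = {k: v for k, v in e.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["previous_hash"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True
```
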

Entry types recorded:

| Entry Type | Data Captured |
|---|---|
| Input | request_id, user_id, session_id, permissions, sanitized content |
| Decision | decision_id, type, chosen option, alternatives, reasoning, confidence |
| Tool Use | tool_name, inputs (sanitized), outputs (summarized), duration, success |
| Output | response_id, content, side_effects, metrics (duration, tokens, tool_calls) |


```mermaid
flowchart LR
    subgraph Chain["Hash Chain (tamper-evident)"]
        E1["Entry 1<br/>hash: abc..."] --> E2["Entry 2<br/>prev: abc...<br/>hash: def..."]
        E2 --> E3["Entry 3<br/>prev: def...<br/>hash: ghi..."]
        E3 --> E4["..."]
    end

    Chain --> Verify{"Verify integrity"}
    Verify -->|"All hashes match"| Valid["Chain valid"]
    Verify -->|"Hash mismatch"| Tampered["TAMPERING DETECTED<br/>at specific entry"]
```

Tamper-evident storage requirements:

  • Verify previous hash before appending (reject if chain broken)
  • Replicate to external systems (attacker must compromise multiple)
  • Periodic external attestation (every ~100 entries)
  • Write-once media where possible

Large data handling: If output >10KB, store: {type: "summarized", length: N, hash: "...", preview: "first 500 chars..."}
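
That rule can be sketched directly; the 10KB limit and field names follow the text, while `record_output` itself is a hypothetical helper:

```python
import hashlib

def record_output(content: str, limit: int = 10_000) -> dict:
    """Store large outputs as length + hash + preview instead of the full text."""
    if len(content) <= limit:
        return {"type": "full", "content": content}
    return {
        "type": "summarized",
        "length": len(content),
        "hash": hashlib.sha256(content.encode()).hexdigest(),
        "preview": content[:500],
    }
```

The hash still lets an investigator prove later that a recovered artifact matches what was logged, even though only a preview was stored.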

The flight recorder enables after-the-fact investigation:

| Operation | Purpose | How It Works |
|---|---|---|
| Reconstruct session | See everything that happened | Filter entries by session_id, build timeline |
| Find anomalies | Scan for suspicious patterns | Check timing, tool use, decisions, access patterns |
| Trace causation | Find root cause of incident | Walk back through hash chain, find related entries |

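
Session reconstruction is then a filter-and-sort over recorded entries. A sketch, assuming each entry's data dict carries a session_id as described in the entry types:

```python
def reconstruct_session(entries, session_id):
    """Rebuild a session timeline from flight-recorder entries."""
    hits = [e for e in entries if e["data"].get("session_id") == session_id]
    return sorted(hits, key=lambda e: e["timestamp"])
```
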

```mermaid
flowchart TB
    Incident["Incident Detected"] --> Query["Query flight recorder"]
    Query --> Session["Reconstruct session<br/>• Timeline of actions<br/>• Decisions made<br/>• Tools used<br/>• Data accessed"]
    Query --> Anomaly["Scan for anomalies<br/>• Timing patterns<br/>• Unusual tool use<br/>• Decision outliers"]
    Query --> Trace["Trace causation<br/>• Find root cause<br/>• Build causal chain"]
    Session --> Report["Investigation Report"]
    Anomaly --> Report
    Trace --> Report
```

Key insight: Even if real-time detection fails, the flight recorder lets you reconstruct exactly what happened afterward.

Benefits:

  • Complete audit trail for any investigation
  • Enables learning from incidents
  • Supports compliance requirements
  • Non-repudiation of agent actions

Costs:

  • Storage requirements can be massive
  • Privacy concerns from comprehensive logging
  • Performance overhead of recording

Risks:

  • Sensitive data in logs could be exposed
  • Agent might try to corrupt logs
  • Analysis of old logs might be difficult

Related patterns:

  • Random Audit: Uses flight recorder for retroactive audits
  • Canary Tokens: Recorded in flight log when triggered
  • Checkpoint-Rollback: Uses log for recovery

Temporal Behavior Analysis

Analyze how agent behavior changes over time, detecting gradual drift, strategic timing, and patterns invisible in individual actions.

Some problematic behaviors only emerge over time—gradual trust building before betrayal, slow capability acquisition, or strategic timing of actions. Temporal analysis catches these patterns.

```mermaid
flowchart TB
    subgraph Analysis["TEMPORAL BEHAVIOR ANALYSIS"]
        direction TB

        Chart["BEHAVIOR OVER TIME<br/>📈 Metric values changing across time<br/>Peaks, valleys, and trends visible"]

        subgraph Questions["Analysis Questions"]
            Q1["Is there a trend?<br/>(drift detection)"]
            Q2["Are there cycles?<br/>(strategic timing)"]
            Q3["Sudden changes?<br/>(regime change)"]
            Q4["Correlations?<br/>(external triggers)"]
        end
    end

    style Chart fill:#dbeafe,stroke:#2563eb
    style Q1 fill:#d1fae5,stroke:#059669
    style Q2 fill:#fef3c7,stroke:#d97706
    style Q3 fill:#fee2e2,stroke:#dc2626
    style Q4 fill:#f3e8ff,stroke:#9333ea
```

Record metrics over time and analyze for four pattern types:

| Pattern Type | What It Detects | Method | Suspicious Signal |
|---|---|---|---|
| Trend | Gradual drift in behavior | Linear regression | Consistent increase/decrease over time |
| Cycles | Periodic patterns | Frequency analysis | Actions cluster at specific times (daily, weekly) |
| Change Points | Sudden regime shifts | Moving window comparison | Abrupt behavior change on specific date |
| Correlations | Linked metrics | Cross-correlation | Two metrics move together suspiciously |


```text
TREND DETECTION:
  Fit a line through the time-series data.
  If the slope is significant (p < 0.05) and the fit is good (r² > 0.25)
  → flag as "drifting behavior"

CYCLE DETECTION:
  Analyze the frequency spectrum of behavior.
  Look for dominant periodic patterns.
  If a strong cycle is found (e.g., every 24 hours, every Monday)
  → flag as "strategic timing possible"

CHANGE POINT DETECTION:
  Compare moving averages before/after each point.
  If the difference exceeds 2× standard deviation
  → flag as "regime change" at that timestamp

CORRELATION DETECTION:
  For each pair of metrics, calculate the correlation coefficient.
  If |correlation| > 0.7
  → flag as "metrics are linked"
```
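
The trend and change-point procedures can be sketched with the standard library alone. This version omits the p-value significance test and uses the local window's standard deviation as the change-point threshold (both simplifications):

```python
import statistics

def detect_trend(values, r2_threshold=0.25):
    """Least-squares line fit; flag drift when the fit explains enough variance."""
    xs = list(range(len(values)))
    mx, my = statistics.mean(xs), statistics.mean(values)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    slope = sxy / sxx
    ss_tot = sum((y - my) ** 2 for y in values)
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, values))
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return slope, r2, r2 > r2_threshold

def detect_change_point(values, window=5, k=2.0):
    """First index where the next window's mean departs from the previous
    window's mean by more than k local standard deviations."""
    for i in range(window, len(values) - window + 1):
        before = values[i - window:i]
        after = values[i:i + window]
        spread = statistics.stdev(before) or 1e-9   # guard constant windows
        if abs(statistics.mean(after) - statistics.mean(before)) > k * spread:
            return i
    return None
```

In practice an alerting pipeline would run these over each recorded metric on a schedule and feed flags into the aggregator.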

Example findings:

  • “Error rate increasing 2% per week since March 15” → Trend
  • “Request volume spikes every Monday at 9 AM” → Cycle
  • “Response latency jumped 40% on April 3” → Change point
  • “Token usage correlates 0.85 with task complexity rating” → Correlation

Tools: Consider using established libraries like Prophet (Meta), STUMPY (matrix profiles), or ruptures (change point detection) rather than implementing from scratch.

Look for patterns suggesting the agent is timing actions to avoid detection:

| Pattern | What to Look For | Why It’s Suspicious |
|---|---|---|
| End-of-day clustering | Risky actions concentrated late in shifts | Oversight typically lower at day’s end |
| Observer absence correlation | Actions coincide with human unavailability | Suggests awareness of monitoring gaps |
| Buildup pattern | Series of small escalating actions before big one | Testing limits before major move |
| Trust exploitation | Extended good behavior → sudden anomaly | Classic “earn trust, then exploit” pattern |

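
A sketch of the end-of-day clustering check. The record format, shift boundaries, and the 2× rate ratio are all illustrative assumptions:

```python
def end_of_day_clustering(actions, shift_end=17, window=2, ratio=2.0):
    """actions: list of {"hour": int, "risky": bool} records.
    Flag when the risky-action rate in the last `window` hours of a shift is
    at least `ratio` times the rate earlier in the day."""
    late = [a for a in actions if shift_end - window <= a["hour"] < shift_end]
    early = [a for a in actions if a["hour"] < shift_end - window]
    if not late or not early:
        return False
    late_rate = sum(a["risky"] for a in late) / len(late)
    early_rate = sum(a["risky"] for a in early) / len(early) or 1e-9
    return late_rate / early_rate >= ratio
```
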

```mermaid
flowchart LR
    History["Action History"] --> EOD{"End-of-day<br/>clustering?"}
    EOD -->|"yes"| S1["Suspicious"]
    EOD -->|"no"| Observer{"Correlates with<br/>observer absence?"}
    Observer -->|"yes"| S2["Suspicious"]
    Observer -->|"no"| Buildup{"Shows<br/>escalation?"}
    Buildup -->|"yes"| S3["Suspicious"]
    Buildup -->|"no"| Trust{"Trust-then-<br/>exploit?"}
    Trust -->|"yes"| S4["Suspicious"]
    Trust -->|"no"| Clear["No strategic timing detected"]

    style S1 fill:#fee2e2,stroke:#dc2626
    style S2 fill:#fee2e2,stroke:#dc2626
    style S3 fill:#fee2e2,stroke:#dc2626
    style S4 fill:#fee2e2,stroke:#dc2626
    style Clear fill:#d1fae5,stroke:#059669
```

Benefits:

  • Catches gradual drift invisible in snapshots
  • Detects strategic timing attacks
  • Reveals patterns across long timeframes
  • Supports trend-based predictions

Costs:

  • Requires extensive historical data
  • Complex analysis computation
  • Many potential false patterns

Risks:

  • Legitimate behavior changes flagged
  • History might not predict future
  • Pattern detection could be gamed

Related patterns:

  • Behavioral Fingerprinting: Baseline for temporal comparison
  • Trust Decay: Based on temporal activity
  • Black Box Flight Recorder: Data source for analysis
