Forecasting-Based Navigation

The hardest part of alignment is specifying what you want. Value specification is philosophically fraught and practically difficult.

Instead of:

  1. Perfectly specify what “good” means
  2. Build AI that pursues that specification
  3. Hope you got it right

Do this:

flowchart LR
    subgraph "Forecasting-Based Navigation"
        A[AI forecasts outcomes] --> B[Sample action space]
        B --> C[Identify robust regions]
        C --> D[Iterate with honest forecasting]
        D --> A
    end
  1. Have AI forecast outcomes of different strategies
  2. Sample broadly across action space
  3. Use calibrated predictions to identify robustly good regions
  4. Iterate toward better outcomes using honest forecasting (see the sketch below)
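
A minimal sketch of this loop, assuming a forecaster we can query for outcome samples. The forecaster here is simulated with noisy draws so the example runs standalone, and the names (`forecast_outcomes`, `is_robust`, `navigate`) are illustrative rather than a real API:

```python
import random

random.seed(0)  # reproducible toy run

def forecast_outcomes(strategy, n_scenarios=1000):
    """Hypothetical forecaster: sampled outcome scores in [0, 1] for a strategy."""
    return [min(1.0, max(0.0, random.gauss(strategy["expected_quality"],
                                           strategy["spread"])))
            for _ in range(n_scenarios)]

def is_robust(scores, floor=0.5):
    """A 'robust good region': even the worst 5% of forecasted scenarios stay acceptable."""
    worst = sorted(scores)[: max(1, len(scores) // 20)]
    return sum(worst) / len(worst) >= floor

def navigate(candidates, iterations=3):
    """Forecast outcomes, keep only robust strategies, and iterate."""
    current = list(candidates)
    for _ in range(iterations):
        current = [s for s in current if is_robust(forecast_outcomes(s))]
        if not current:
            raise RuntimeError("no robust strategies left; sample more broadly")
    return current

candidates = [
    {"name": "minimize latency only",       "expected_quality": 0.70, "spread": 0.30},
    {"name": "latency + correctness bound", "expected_quality": 0.65, "spread": 0.05},
]
print([s["name"] for s in navigate(candidates)])  # the narrow-spread strategy survives
```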

Example: You’re refining the objective for your “coordinate between services” component.

Without forecasting:

  • You specify: “Minimize latency between service calls”
  • Implement and test
  • 10 iterations later: System exploits race conditions for speed gains
  • Oops: you specified the wrong thing

With forecasting:

  • AI suggests: “Minimize latency between service calls”
  • You ask: “Forecast outcomes after 10 iterations, sample 1000 scenarios”
  • AI returns distribution:
    • 60%: Faster safe coordination
    • 30%: Speed-safety tradeoffs emerge
    • 10%: Race condition exploits
  • AI highlights: “In the long tail, latency minimization conflicts with implicit safety requirements”
  • You refine: “Minimize latency subject to correctness guarantees”
  • Iterate with the refined specification (sketched in code below)
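
A sketch of that refinement step, assuming we can tag forecasted failure modes as unsafe and set a tolerable risk budget. The scenario labels and probabilities mirror the example above; the 1% budget and the helper names are assumptions:

```python
forecast = {                      # distribution returned by the forecaster above
    "faster safe coordination": 0.60,
    "speed-safety tradeoffs":   0.30,
    "race condition exploits":  0.10,
}

UNSAFE_MODES = {"race condition exploits"}
RISK_BUDGET = 0.01                # assumed: tolerate at most 1% forecasted unsafe mass

def refine(objective, forecast):
    """If too much forecasted probability lands on unsafe modes, add a constraint."""
    unsafe_mass = sum(p for mode, p in forecast.items() if mode in UNSAFE_MODES)
    if unsafe_mass > RISK_BUDGET:
        return objective + ", subject to correctness guarantees"
    return objective

print(refine("Minimize latency between service calls", forecast))
# -> Minimize latency between service calls, subject to correctness guarantees
```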

You’re not trying to write down human values in formal logic. You’re using AI’s predictive capabilities to navigate the space of outcomes, choosing paths that look good across many forecasted scenarios.

Recent evidence (Anthropic’s forecasting work, various LLM forecasting benchmarks) suggests frontier models can achieve near-superforecaster calibration on many tasks. The key question is whether they can forecast their own behavior after self-modification—that’s empirically testable.
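
One way that test could be scored, as a sketch: compare the model's probability forecasts about its own post-modification behavior against what actually happened, for example with a Brier score. The numbers below are purely illustrative:

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; always answering 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, outcomes)) / len(outcomes)

# Illustrative only: forecasted probabilities that a self-modified component
# still meets its spec, versus what was actually observed.
predicted = [0.90, 0.80, 0.95, 0.60, 0.70]
observed  = [1,    1,    1,    0,    1]
print(f"Brier score: {brier_score(predicted, observed):.3f}")  # ~0.10
```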

User preferences are hard to specify, but often easy to evaluate on concrete scenarios. So instead of:

  • “What objective should the AI have?”

You ask:

  • “Here are 1000 sampled strategies and their forecasted outcomes—which look best?”

This is navigable alignment: steering through outcome space using predictions, rather than specifying destinations using philosophical arguments.
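
A sketch of how the "which look best?" question gets answered in practice: sample strategies, forecast a concrete outcome for each, and rank them by how a preference check (standing in for a human) scores those concrete outcomes. All functions are hypothetical stand-ins:

```python
import random

random.seed(1)

def sample_strategies(n):
    """Hypothetical sampler over the action space."""
    return [f"strategy-{i}" for i in range(n)]

def forecast_outcome(strategy):
    """Hypothetical forecaster: a concrete scenario description plus a score."""
    return {"scenario": f"concrete forecasted outcome of {strategy}",
            "score": random.random()}

def prefer(outcome):
    """Stand-in for a human judging a concrete scenario (easy), rather than
    writing down the objective that would produce it (hard)."""
    return outcome["score"]

forecasts = {s: forecast_outcome(s) for s in sample_strategies(1000)}
best = sorted(forecasts, key=lambda s: prefer(forecasts[s]), reverse=True)[:3]
print("Best-looking strategies:", best)
```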

Compositional Safety: How The Pieces Work Together

flowchart TB
    P1[Decomposition] --> S1[No single catastrophe point]
    P2[Honest Coordination] --> S2[No scheming cascade]
    P3[Forecasting] --> S3[Emergent risks caught]
    S1 --> Safe[Safe System]
    S2 --> Safe
    S3 --> Safe
    Safe --> R[Reversible, Non-catastrophic Failures]

The three principles compose into four safety properties:

Property 1: No single component can cause catastrophe

  • Each operates under intelligence budget
  • Narrow scope limits blast radius
  • Critical actions require coordinator approval (sketched in code below)
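
A minimal sketch of what Property 1 could look like in code, with an explicit budget, a narrow scope, and a coordinator gate on critical actions. The class and method names are assumptions, not an existing API:

```python
class BudgetExceeded(Exception):
    pass

class Coordinator:
    def approve(self, action):
        # Stand-in for the real check (forecast screen + independent review).
        return action["cost"] < 10

class NarrowComponent:
    def __init__(self, scope, compute_budget, coordinator):
        self.scope = scope                    # the only task family it may touch
        self.compute_budget = compute_budget  # "intelligence budget"
        self.coordinator = coordinator

    def act(self, action):
        if action["task"] not in self.scope:
            raise PermissionError(f"out of scope: {action['task']}")
        if action["cost"] > self.compute_budget:
            raise BudgetExceeded("budget cap limits the blast radius")
        if action.get("critical") and not self.coordinator.approve(action):
            raise PermissionError("critical action rejected by coordinator")
        self.compute_budget -= action["cost"]
        return f"executed {action['task']}"

router = NarrowComponent(scope={"route_request"}, compute_budget=100,
                         coordinator=Coordinator())
print(router.act({"task": "route_request", "cost": 5, "critical": True}))
```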

Property 2: Coordinators can’t cause catastrophic failures through scheming

  • Operating on verified inputs from narrow components
  • Forecasting catches problematic long-term strategies
  • Architectural anti-scheming constraints
  • Multiple independent checks (see the sketch below)
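
A sketch of Property 2 as a quorum of independent checks that a coordinator's plan must pass before it runs; the specific checks are illustrative placeholders:

```python
def inputs_verified(plan):
    """Every step must come from a verified narrow component."""
    return all(step.get("source") == "narrow_component" for step in plan["steps"])

def forecast_screen(plan):
    """Reject plans whose forecasted long-horizon risk is too high."""
    return plan["forecasted_risk"] < 0.05

def anti_scheming_constraint(plan):
    """Architectural rule, e.g. a plan may never modify its own oversight."""
    return not plan.get("modifies_oversight", False)

CHECKS = [inputs_verified, forecast_screen, anti_scheming_constraint]

def approve_plan(plan):
    # No single check is trusted on its own; all must agree independently.
    return all(check(plan) for check in CHECKS)

plan = {"steps": [{"source": "narrow_component"}], "forecasted_risk": 0.01}
print(approve_plan(plan))  # True
```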

Property 3: Emergent risks are caught by forecasting

  • Multi-step forecasting predicts composite behavior
  • Sample broadly to find problematic regions
  • Adjust decomposition boundaries if forecasts show high uncertainty (sketched below)
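
A sketch of Property 3: forecast the composed system several steps out and treat a wide outcome distribution as a signal to adjust the decomposition rather than proceed. The toy spread model and the threshold are assumptions:

```python
import random
import statistics

random.seed(0)

def forecast_composite(components, horizon, n_samples=500):
    """Hypothetical multi-step forecaster for the composed system.
    Toy model: more components and a longer horizon widen the outcome spread."""
    spread = 0.05 * len(components) * horizon
    return [random.gauss(0.8, spread) for _ in range(n_samples)]

def check_emergent_risk(components, horizon, max_stdev=0.15):
    samples = forecast_composite(components, horizon)
    if statistics.stdev(samples) > max_stdev:
        return "high forecast uncertainty: adjust decomposition boundaries"
    return "forecast is tight enough: proceed"

print(check_emergent_risk(["router", "coordinator"], horizon=1))
print(check_emergent_risk(["router", "coordinator", "planner"], horizon=3))
```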

Property 4: Failures are reversible and non-catastrophic

  • Changes are incremental
  • Rollback mechanisms
  • Human oversight at decision points
  • No single point of failure (see the rollback sketch below)
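
A sketch of Property 4: apply changes one at a time behind human sign-off, keep a snapshot, and roll back whenever a post-change check fails. All helper names here are hypothetical:

```python
import copy

def human_approves(change):
    """Stand-in for human oversight at each decision point."""
    return change["risk"] == "low"

def post_change_check(state):
    """Stand-in for invariant / safety checks after a change lands."""
    return state.get("invariants_hold", True)

def apply_incrementally(state, changes):
    for change in changes:
        if not human_approves(change):
            continue                          # humans gate every step
        snapshot = copy.deepcopy(state)       # reversibility: keep a rollback point
        state = {**state, **change["patch"]}  # apply the incremental change
        if not post_change_check(state):
            state = snapshot                  # failure is non-catastrophic: roll back
    return state

state = {"version": 1, "invariants_hold": True}
changes = [
    {"risk": "low", "patch": {"version": 2}},
    {"risk": "low", "patch": {"version": 3, "invariants_hold": False}},  # bad change
]
print(apply_incrementally(state, changes))  # rolled back to version 2
```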