Forecasting-Based Navigation

The hardest part of alignment is specifying what you want. Value specification is philosophically fraught and practically difficult.

Instead of:

  1. Perfectly specify what “good” means
  2. Build AI that pursues that specification
  3. Hope you got it right

Do this:

flowchart LR
    subgraph "Forecasting-Based Navigation"
        A[AI forecasts outcomes] --> B[Sample action space]
        B --> C[Identify robust regions]
        C --> D[Iterate with honest forecasting]
        D --> A
    end
  1. Have AI forecast outcomes of different strategies
  2. Sample broadly across action space
  3. Use calibrated predictions to identify robustly good regions
  4. Iterate toward better outcomes using honest forecasting (see the sketch below)
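
A minimal sketch of this loop, assuming a forecaster we can query for outcome samples. The forecaster here is simulated with noisy draws so the example runs standalone, and the names (`forecast_outcomes`, `is_robust`, `navigate`) are illustrative rather than a real API:

```python
import random

random.seed(0)  # reproducible toy run

def forecast_outcomes(strategy, n_scenarios=1000):
    """Hypothetical forecaster: sampled outcome scores in [0, 1] for a strategy."""
    return [min(1.0, max(0.0, random.gauss(strategy["expected_quality"],
                                           strategy["spread"])))
            for _ in range(n_scenarios)]

def is_robust(scores, floor=0.5):
    """A 'robust good region': even the worst 5% of forecasted scenarios stay acceptable."""
    worst = sorted(scores)[: max(1, len(scores) // 20)]
    return sum(worst) / len(worst) >= floor

def navigate(candidates, iterations=3):
    """Forecast outcomes, keep only robust strategies, and iterate."""
    current = list(candidates)
    for _ in range(iterations):
        current = [s for s in current if is_robust(forecast_outcomes(s))]
        if not current:
            raise RuntimeError("no robust strategies left; sample more broadly")
    return current

candidates = [
    {"name": "minimize latency only",       "expected_quality": 0.70, "spread": 0.30},
    {"name": "latency + correctness bound", "expected_quality": 0.65, "spread": 0.05},
]
print([s["name"] for s in navigate(candidates)])  # the narrow-spread strategy survives
```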

Example: You’re refining the objective for your “coordinate between services” component.

Without forecasting:

  • You specify: “Minimize latency between service calls”
  • Implement and test
  • 10 iterations later: System exploits race conditions for speed gains
  • Oops: you specified the wrong thing

With forecasting:

  • AI suggests: “Minimize latency between service calls”
  • You ask: “Forecast outcomes after 10 iterations, sample 1000 scenarios”
  • AI returns distribution:
    • 60%: Faster safe coordination
    • 30%: Speed-safety tradeoffs emerge
    • 10%: Race condition exploits
  • AI highlights: “In the long tail, latency minimization conflicts with implicit safety requirements”
  • You refine: “Minimize latency subject to correctness guarantees”
  • Iterate with the refined specification (sketched in code below)
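
A sketch of that refinement step, assuming we can tag forecasted failure modes as unsafe and set a tolerable risk budget. The scenario labels and probabilities mirror the example above; the 1% budget and the helper names are assumptions:

```python
forecast = {                      # distribution returned by the forecaster above
    "faster safe coordination": 0.60,
    "speed-safety tradeoffs":   0.30,
    "race condition exploits":  0.10,
}

UNSAFE_MODES = {"race condition exploits"}
RISK_BUDGET = 0.01                # assumed: tolerate at most 1% forecasted unsafe mass

def refine(objective, forecast):
    """If too much forecasted probability lands on unsafe modes, add a constraint."""
    unsafe_mass = sum(p for mode, p in forecast.items() if mode in UNSAFE_MODES)
    if unsafe_mass > RISK_BUDGET:
        return objective + ", subject to correctness guarantees"
    return objective

print(refine("Minimize latency between service calls", forecast))
# -> Minimize latency between service calls, subject to correctness guarantees
```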

You’re not trying to write down human values in formal logic. You’re using AI’s predictive capabilities to navigate the space of outcomes, choosing paths that look good across many forecasted scenarios.

Recent evidence (Anthropic’s forecasting work, various LLM forecasting benchmarks) suggests frontier models can achieve near-superforecaster calibration on many tasks. The key question is whether they can forecast their own behavior after self-modification—that’s empirically testable.
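
One way that test could be scored, as a sketch: compare the model's probability forecasts about its own post-modification behavior against what actually happened, for example with a Brier score. The numbers below are purely illustrative:

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; always answering 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, outcomes)) / len(outcomes)

# Illustrative only: forecasted probabilities that a self-modified component
# still meets its spec, versus what was actually observed.
predicted = [0.90, 0.80, 0.95, 0.60, 0.70]
observed  = [1,    1,    1,    0,    1]
print(f"Brier score: {brier_score(predicted, observed):.3f}")  # ~0.10
```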

User preferences are hard to specify, but often easy to evaluate on concrete scenarios. So instead of:

  • “What objective should the AI have?”

You ask:

  • “Here are 1000 sampled strategies and their forecasted outcomes—which look best?”

This is navigable alignment: steering through outcome space using predictions, rather than specifying destinations using philosophical arguments.
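
A sketch of how the "which look best?" question gets answered in practice: sample strategies, forecast a concrete outcome for each, and rank them by how a preference check (standing in for a human) scores those concrete outcomes. All functions are hypothetical stand-ins:

```python
import random

random.seed(1)

def sample_strategies(n):
    """Hypothetical sampler over the action space."""
    return [f"strategy-{i}" for i in range(n)]

def forecast_outcome(strategy):
    """Hypothetical forecaster: a concrete scenario description plus a score."""
    return {"scenario": f"concrete forecasted outcome of {strategy}",
            "score": random.random()}

def prefer(outcome):
    """Stand-in for a human judging a concrete scenario (easy), rather than
    writing down the objective that would produce it (hard)."""
    return outcome["score"]

forecasts = {s: forecast_outcome(s) for s in sample_strategies(1000)}
best = sorted(forecasts, key=lambda s: prefer(forecasts[s]), reverse=True)[:3]
print("Best-looking strategies:", best)
```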

Compositional Safety: How The Pieces Work Together

flowchart TB
    P1[Decomposition] --> S1[No single catastrophe point]
    P2[Honest Coordination] --> S2[No scheming cascade]
    P3[Forecasting] --> S3[Emergent risks caught]
    S1 --> Safe[Safe System]
    S2 --> Safe
    S3 --> Safe
    Safe --> R[Reversible, Non-catastrophic Failures]

The three principles compose into four safety properties:

Property 1: No single component can cause catastrophe

  • Each operates under intelligence budget
  • Narrow scope limits blast radius
  • Critical actions require coordinator approval (sketched in code below)
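
A minimal sketch of what Property 1 could look like in code, with an explicit budget, a narrow scope, and a coordinator gate on critical actions. The class and method names are assumptions, not an existing API:

```python
class BudgetExceeded(Exception):
    pass

class Coordinator:
    def approve(self, action):
        # Stand-in for the real check (forecast screen + independent review).
        return action["cost"] < 10

class NarrowComponent:
    def __init__(self, scope, compute_budget, coordinator):
        self.scope = scope                    # the only task family it may touch
        self.compute_budget = compute_budget  # "intelligence budget"
        self.coordinator = coordinator

    def act(self, action):
        if action["task"] not in self.scope:
            raise PermissionError(f"out of scope: {action['task']}")
        if action["cost"] > self.compute_budget:
            raise BudgetExceeded("budget cap limits the blast radius")
        if action.get("critical") and not self.coordinator.approve(action):
            raise PermissionError("critical action rejected by coordinator")
        self.compute_budget -= action["cost"]
        return f"executed {action['task']}"

router = NarrowComponent(scope={"route_request"}, compute_budget=100,
                         coordinator=Coordinator())
print(router.act({"task": "route_request", "cost": 5, "critical": True}))
```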

Property 2: Coordinators can’t cause catastrophic failures through scheming

  • Operating on verified inputs from narrow components
  • Forecasting catches problematic long-term strategies
  • Architectural anti-scheming constraints
  • Multiple independent checks (see the sketch below)
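
A sketch of Property 2 as a quorum of independent checks that a coordinator's plan must pass before it runs; the specific checks are illustrative placeholders:

```python
def inputs_verified(plan):
    """Every step must come from a verified narrow component."""
    return all(step.get("source") == "narrow_component" for step in plan["steps"])

def forecast_screen(plan):
    """Reject plans whose forecasted long-horizon risk is too high."""
    return plan["forecasted_risk"] < 0.05

def anti_scheming_constraint(plan):
    """Architectural rule, e.g. a plan may never modify its own oversight."""
    return not plan.get("modifies_oversight", False)

CHECKS = [inputs_verified, forecast_screen, anti_scheming_constraint]

def approve_plan(plan):
    # No single check is trusted on its own; all must agree independently.
    return all(check(plan) for check in CHECKS)

plan = {"steps": [{"source": "narrow_component"}], "forecasted_risk": 0.01}
print(approve_plan(plan))  # True
```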

Property 3: Emergent risks are caught by forecasting

  • Multi-step forecasting predicts composite behavior
  • Sample broadly to find problematic regions
  • Adjust decomposition boundaries if forecasts show high uncertainty (sketched below)
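
A sketch of Property 3: forecast the composed system several steps out and treat a wide outcome distribution as a signal to adjust the decomposition rather than proceed. The toy spread model and the threshold are assumptions:

```python
import random
import statistics

random.seed(0)

def forecast_composite(components, horizon, n_samples=500):
    """Hypothetical multi-step forecaster for the composed system.
    Toy model: more components and a longer horizon widen the outcome spread."""
    spread = 0.05 * len(components) * horizon
    return [random.gauss(0.8, spread) for _ in range(n_samples)]

def check_emergent_risk(components, horizon, max_stdev=0.15):
    samples = forecast_composite(components, horizon)
    if statistics.stdev(samples) > max_stdev:
        return "high forecast uncertainty: adjust decomposition boundaries"
    return "forecast is tight enough: proceed"

print(check_emergent_risk(["router", "coordinator"], horizon=1))
print(check_emergent_risk(["router", "coordinator", "planner"], horizon=3))
```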

Property 4: Failures are reversible and non-catastrophic

  • Changes are incremental
  • Rollback mechanisms
  • Human oversight at decision points
  • No single point of failure (see the rollback sketch below)
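
A sketch of Property 4: apply changes one at a time behind human sign-off, keep a snapshot, and roll back whenever a post-change check fails. All helper names here are hypothetical:

```python
import copy

def human_approves(change):
    """Stand-in for human oversight at each decision point."""
    return change["risk"] == "low"

def post_change_check(state):
    """Stand-in for invariant / safety checks after a change lands."""
    return state.get("invariants_hold", True)

def apply_incrementally(state, changes):
    for change in changes:
        if not human_approves(change):
            continue                          # humans gate every step
        snapshot = copy.deepcopy(state)       # reversibility: keep a rollback point
        state = {**state, **change["patch"]}  # apply the incremental change
        if not post_change_check(state):
            state = snapshot                  # failure is non-catastrophic: roll back
    return state

state = {"version": 1, "invariants_hold": True}
changes = [
    {"risk": "low", "patch": {"version": 2}},
    {"risk": "low", "patch": {"version": 3, "invariants_hold": False}},  # bad change
]
print(apply_incrementally(state, changes))  # rolled back to version 2
```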