
Principles to Practice

This page maps the framework’s abstract principles to their concrete applications in the worked examples, helping you see how theory becomes implementation.

Quick Reference: Principles → Components

| Principle | Research Assistant | Code Deployment |
| --- | --- | --- |
| Least Intelligence | Strategy uses GPT-3.5, not frontier | Orchestrator uses no ML at all |
| Least Privilege | Search can only query, not modify DBs | Reviewer can read code but not write |
| Least Context | Tactics sees only current sub-task | Canary sees only metrics, not code |
| Least Persistence | All components stateless | All components stateless |
| Least Autonomy | Human approves strategic pivots | Human approves every production deploy |
| Least Surprise | All LLMs at temperature=0 | Fuzzer uses logged seeds |
| Least Connectivity | Hypothesis → Forecaster (one-way) | Reviewer → Orchestrator (one-way) |
| Least Observability | Components can’t detect testing | Components can’t see deployment status |
| Least Compute | Search: 1s, Strategy: 10s, Hypothesis: 30s | Static: fast, Review: bounded |
| Least Generality | Fine-tuned summarizers, not GPT-4 | Narrow canary model, not frontier |
| Max Verifiability | Critical safety logic in verified code | Orchestrator is verified code only |

Low-Risk Components (Delegation Risk < $200/month)

Pattern: Use narrow or verified implementations freely.

| Component | Example | Implementation |
| --- | --- | --- |
| Search | Research Assistant | Code (API calls + ranking) |
| Static Analysis | Code Deployment | Verified code |
| Unit Tests | Code Deployment | Code |
| Progress Monitor | Research Assistant | Mostly code |

Principles applied: Least Generality, Maximum Verifiability

Medium-Risk Components (Delegation Risk $200-$500/month)

Pattern: Fine-tuned narrow models with extensive testing.

| Component | Example | Implementation |
| --- | --- | --- |
| Summarizer | Research Assistant | Fine-tuned 7B model |
| Security Scanner | Code Deployment | Fine-tuned 7B model |
| Canary Analyzer | Code Deployment | Fine-tuned narrow model |
| Red Team | Research Assistant | Fine-tuned + checklists |

Principles applied: Least Intelligence, Least Generality, verification through narrow training

High-Risk Components (Delegation Risk > $500/month)

Pattern: Defense in depth — multiple constraints, redundancy, human oversight.

| Component | Example | Implementation | Mitigations |
| --- | --- | --- | --- |
| Hypothesis Generator | Research Assistant | Frontier LLM | Low privilege, evaluated by others |
| Code Reviewer | Code Deployment | Frontier LLM | Advisory only, human reviews |
| Strategy Module | Research Assistant | GPT-3.5 + rules | Older model, rule-based filtering |

Principles applied: All “Least X” principles simultaneously

Computing Delegation Risk — worked examples

Research Assistant Summarizer:

P(misrepresent findings) × Damage = 0.02 × $5,000 = $100
P(leak proprietary data) × Damage = 0.001 × $50,000 = $50
Total Delegation Risk = $150/month

Code Deployment Canary:

P(approve bad deploy) × Damage = 0.01 × $30,000 = $300
Total Delegation Risk = $300/month
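
The same arithmetic is easy to script. A minimal sketch, assuming each component’s failure modes are given as (probability per month, damage) pairs; the helper name is mine, not part of the framework:

```python
# Delegation Risk = sum over failure modes of P(failure) x Damage.
def delegation_risk(failure_modes: dict[str, tuple[float, float]]) -> float:
    """Monthly expected damage for one component."""
    return sum(p * damage for p, damage in failure_modes.values())

summarizer = delegation_risk({
    "misrepresent findings": (0.02, 5_000),
    "leak proprietary data": (0.001, 50_000),
})
canary = delegation_risk({
    "approve bad deploy": (0.01, 30_000),
})

print(summarizer)  # 150.0 -> $150/month
print(canary)      # 300.0 -> $300/month
```
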
Risk Inheritance — worked examples

Research Assistant (multiplicative rule):

Trust(Human → Hypothesis Generator)
= Trust(Human → Strategy) × Trust(Strategy → Tactics) × Trust(Tactics → Generator)
= 0.95 × 0.90 × 0.85
= 0.73

Code Deployment (blocked by architecture):

Trust(Human → Production via Reviewer) = 0
Why: Reviewer's output doesn't flow to production deploy
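
Both rules fit in a few lines. A minimal sketch, assuming trust is assigned per delegation edge and a missing edge means the architecture provides no influence path; the edge values come from the example above, the data structure is illustrative:

```python
# Multiplicative trust inheritance along a delegation chain.
# A missing edge means no architectural path, so inherited trust is 0.
TRUST_EDGES = {
    ("Human", "Strategy"): 0.95,
    ("Strategy", "Tactics"): 0.90,
    ("Tactics", "Generator"): 0.85,
    # No edge from Reviewer toward Production: its output is advisory only.
}

def inherited_trust(chain: list[str]) -> float:
    trust = 1.0
    for src, dst in zip(chain, chain[1:]):
        if (src, dst) not in TRUST_EDGES:
            return 0.0  # blocked by architecture
        trust *= TRUST_EDGES[(src, dst)]
    return trust

print(round(inherited_trust(["Human", "Strategy", "Tactics", "Generator"]), 2))  # 0.73
print(inherited_trust(["Human", "Reviewer", "Production"]))                      # 0.0
```
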
Budget Allocation — comparing approaches

Both examples target ~$2,000/month total Delegation Risk but allocate differently:

Research Assistant — more distributed risk:

  • Summarizer: 8%
  • Code Deployer: 55%
  • Coordinator: 26%

Code Deployment — risk concentrated in human-gated components:

  • Production Deploy (human-gated): 26%
  • Code Reviewer (advisory only): 26%
  • Automated components: 48%
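
A rough allocation check, assuming the shared ~$2,000/month budget and the shares listed above; the dictionaries and rounding are illustrative:

```python
# Turn percentage shares into per-component dollar budgets and confirm the
# allocated total stays within the system-level Delegation Risk budget.
SYSTEM_BUDGET = 2_000  # $/month

research_assistant = {"Summarizer": 0.08, "Code Deployer": 0.55, "Coordinator": 0.26}
code_deployment = {"Production Deploy": 0.26, "Code Reviewer": 0.26, "Automated components": 0.48}

def allocate(shares: dict[str, float], budget: float) -> dict[str, int]:
    assert sum(shares.values()) <= 1.0 + 1e-9, "shares exceed the system budget"
    return {name: round(share * budget) for name, share in shares.items()}

print(allocate(research_assistant, SYSTEM_BUDGET))  # e.g. Summarizer -> 160, i.e. $160/month of budget
print(allocate(code_deployment, SYSTEM_BUDGET))
```
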
| Coordinator Principle | Research Assistant | Code Deployment |
| --- | --- | --- |
| Least Influence | Strategy proposes, doesn’t execute | Orchestrator routes, doesn’t decide |
| Least Direct Impact | Works through components | Uses verified code, no ML |
| Maximum Transparency | All reasoning logged | Logic is auditable code |
| Simplest Communication | Structured task objects | Enum states only |
| Shortest Horizon | Plans current task only | No planning at all |
| Mandatory Redundancy | 5 strategy modules, 4/5 consensus | N/A (deterministic) |
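
The Mandatory Redundancy row calls for agreement among 4 of 5 independent strategy modules before a strategic decision stands. A minimal sketch of that vote; the proposal strings are placeholders (the real system would pass structured task objects):

```python
# 4-of-5 consensus across redundant strategy modules; anything less escalates to the human.
from collections import Counter

REQUIRED_VOTES = 4  # out of 5 modules

def consensus(proposals: list[str]) -> str | None:
    winner, votes = Counter(proposals).most_common(1)[0]
    return winner if votes >= REQUIRED_VOTES else None

print(consensus(["narrow search"] * 4 + ["broaden scope"]))   # 'narrow search'
print(consensus(["narrow search", "broaden scope", "pivot",
                 "narrow search", "broaden scope"]))          # None -> defer to human
```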

Use LLM coordination (like Research Assistant) when:

  • Tasks require judgment about priority, strategy, decomposition
  • Outcomes are recoverable (wasted effort, not disasters)
  • Human can audit reasoning after the fact

Use verified code coordination (like Code Deployment) when:

  • Logic is specifiable as rules
  • Outcomes are hard to reverse
  • Correctness is more important than flexibility

Research Assistant Pattern: Bidirectional but constrained

flowchart LR
    RS[Strategy] -->|commands| RT[Tactics]
    RT -->|invocations| RC[Components]
    RC -->|results| RT
    RT -.->|status only| RS

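One way to read the dashed edge: Tactics can only report a fixed status upward, never renegotiate goals. A sketch under that assumption; the message shapes and field names are invented for illustration:

```python
# Constrained upward channel: Tactics returns a status enum plus task id -- nothing else.
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    IN_PROGRESS = "in_progress"

@dataclass(frozen=True)
class Command:        # Strategy -> Tactics
    task_id: str
    instruction: str

@dataclass(frozen=True)
class StatusReport:   # Tactics -> Strategy (status only, per the dashed edge)
    task_id: str
    status: Status

def run_task(cmd: Command) -> StatusReport:
    # ...invoke components, keep results local to Tactics...
    return StatusReport(task_id=cmd.task_id, status=Status.DONE)

print(run_task(Command(task_id="lit-review-03", instruction="summarize top 5 papers")))
```
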
Code Deployment Pattern: Strict pass/fail gates, no negotiation

flowchart LR
    DO[Orchestrator] -->|invoke| DC[Checks]
    DC -->|pass/fail| DO
    DO -->|gate| DP[Production]
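
The orchestrator side of this pattern is plain code. A sketch assuming each check returns a pass/fail enum and the production gate opens only when every check passes and a human has approved; the check names are illustrative:

```python
# Deterministic gate logic: enum states only, no negotiation, no ML.
from enum import Enum
from typing import Callable

class CheckResult(Enum):
    PASS = "pass"
    FAIL = "fail"

def orchestrate(checks: dict[str, Callable[[], CheckResult]], human_approved: bool) -> bool:
    """Deploy only if every check passes and a human has signed off."""
    results = {name: run() for name, run in checks.items()}
    return all(r is CheckResult.PASS for r in results.values()) and human_approved

checks = {
    "static_analysis": lambda: CheckResult.PASS,
    "unit_tests": lambda: CheckResult.PASS,
    "security_scan": lambda: CheckResult.FAIL,  # any failure keeps the gate closed
}
print(orchestrate(checks, human_approved=True))  # False
```
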
Checklist: Applying the Framework

When designing a new system:

  1. Identify failure modes — What can go wrong? What’s the damage?
  2. Compute component Delegation Risks — P(failure) × Damage for each
  3. Set system budget — What total risk is acceptable?
  4. Allocate budgets — Which components get how much?
  5. Choose implementations — Verified code > narrow models > frontier LLMs (see the sketch after this checklist)
  6. Apply principles — Go through each “Least X” for each component
  7. Design coordination — LLM or code? What redundancy?
  8. Identify human gates — Where do humans approve?
  9. Plan verification — How will you test trust assumptions?
  10. Monitor in production — Track actual vs. expected failure rates
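
For step 5, the risk tiers earlier on this page suggest a direct mapping from a component’s computed Delegation Risk to an implementation class. A rough sketch using the $200 and $500 per-month thresholds; the tier wording and the second component’s numbers are mine:

```python
# Checklist steps 2 and 5: compute Delegation Risk, then pick an implementation tier.
def delegation_risk(failure_modes: list[tuple[float, float]]) -> float:
    return sum(p * damage for p, damage in failure_modes)

def implementation_tier(risk: float) -> str:
    if risk < 200:
        return "verified code / narrow component"                         # low risk
    if risk <= 500:
        return "fine-tuned narrow model, extensive testing"                # medium risk
    return "frontier model only with defense in depth and human gates"     # high risk

components = {
    "Canary Analyzer": [(0.01, 30_000)],       # $300/month, from the worked example above
    "Hypothetical planner": [(0.02, 40_000)],  # $800/month, made-up numbers
}
for name, modes in components.items():
    risk = delegation_risk(modes)
    print(f"{name}: ${risk:.0f}/month -> {implementation_tier(risk)}")
```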