Principles to Practice
This page maps the framework’s abstract principles to their concrete applications in the worked examples, helping you see how theory becomes implementation.
Quick Reference: Principles → Components
| Principle | Research Assistant | Code Deployment |
|---|---|---|
| Least Intelligence | Strategy uses GPT-3.5, not frontier | Orchestrator uses no ML at all |
| Least Privilege | Search can only query, not modify DBs | Reviewer can read code but not write |
| Least Context | Tactics sees only current sub-task | Canary sees only metrics, not code |
| Least Persistence | All components stateless | All components stateless |
| Least Autonomy | Human approves strategic pivots | Human approves every production deploy |
| Least Surprise | All LLMs at temperature=0 | Fuzzer uses logged seeds |
| Least Connectivity | Hypothesis → Forecaster (one-way) | Reviewer → Orchestrator (one-way) |
| Least Observability | Components can’t detect testing | Components can’t see deployment status |
| Least Compute | Search: 1s, Strategy: 10s, Hypothesis: 30s | Static: fast, Review: bounded |
| Least Generality | Fine-tuned summarizers, not GPT-4 | Narrow canary model, not frontier |
| Max Verifiability | Critical safety logic in verified code | Orchestrator is verified code only |
Implementation Patterns by Risk Level
Low-Risk Components (Delegation Risk < $200/month)
Pattern: Use narrow or verified implementations freely.
| Component | Example | Implementation |
|---|---|---|
| Search | Research Assistant | Code (API calls + ranking) |
| Static Analysis | Code Deployment | Verified code |
| Unit Tests | Code Deployment | Code |
| Progress Monitor | Research Assistant | Mostly code |
Principles applied: Least Generality, Maximum Verifiability
Medium-Risk Components (Delegation Risk $200-$500/month)
Pattern: Fine-tuned narrow models with extensive testing.
| Component | Example | Implementation |
|---|---|---|
| Summarizer | Research Assistant | Fine-tuned 7B model |
| Security Scanner | Code Deployment | Fine-tuned 7B model |
| Canary Analyzer | Code Deployment | Fine-tuned narrow model |
| Red Team | Research Assistant | Fine-tuned + checklists |
Principles applied: Least Intelligence, Least Generality, verification through narrow training
High-Risk Components (Delegation Risk > $500/month)
Pattern: Defense in depth — multiple constraints, redundancy, human oversight.
| Component | Example | Implementation | Mitigations |
|---|---|---|---|
| Hypothesis Generator | Research Assistant | Frontier LLM | Low privilege, evaluated by others |
| Code Reviewer | Code Deployment | Frontier LLM | Advisory only, human reviews |
| Strategy Module | Research Assistant | GPT-3.5 + rules | Older model, rule-based filtering |
Principles applied: All “Least X” principles simultaneously
Delegation Risk in Action
Computing Delegation Risk
Research Assistant Summarizer:
- P(misrepresent findings) × Damage = 0.02 × $5,000 = $100
- P(leak proprietary data) × Damage = 0.001 × $50,000 = $50
- Total Delegation Risk = $150/month

Code Deployment Canary:
- P(approve bad deploy) × Damage = 0.01 × $30,000 = $300
- Total Delegation Risk = $300/month
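The arithmetic above can be sketched as a small helper; this is a minimal sketch using only the probabilities and damages from the worked examples, and the `delegation_risk` function name is illustrative:

```python
# Delegation Risk = sum over failure modes of P(failure) x Damage.
def delegation_risk(failure_modes):
    """failure_modes: list of (probability, damage_in_dollars) pairs."""
    return sum(p * damage for p, damage in failure_modes)

# Research Assistant Summarizer: $100 + $50 = $150/month
summarizer_risk = delegation_risk([(0.02, 5_000), (0.001, 50_000)])

# Code Deployment Canary: $300/month
canary_risk = delegation_risk([(0.01, 30_000)])
```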
Risk Inheritance
Research Assistant (multiplicative rule):
Trust(Human → Hypothesis Generator)
= Trust(Human → Strategy) × Trust(Strategy → Tactics) × Trust(Tactics → Generator)
= 0.95 × 0.90 × 0.85
≈ 0.73

Code Deployment (blocked by architecture):
Trust(Human → Production via Reviewer) = 0
Why: the Reviewer’s output doesn’t flow to the production deploy.
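The multiplicative rule is just a product over the hops in a delegation chain; a minimal sketch, with an illustrative `chain_trust` helper:

```python
from math import prod

def chain_trust(links):
    """Trust along a delegation chain: the product of per-hop trust values."""
    return prod(links)

# Research Assistant: Human -> Strategy -> Tactics -> Hypothesis Generator
inherited = chain_trust([0.95, 0.90, 0.85])  # about 0.73

# Code Deployment: the Reviewer's output never reaches the production
# deploy, so any path through that missing edge carries zero trust.
blocked = chain_trust([0.95, 0.0])  # 0.0
```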
Budget Allocation Decisions
Both examples target ~$2,000/month total Delegation Risk but allocate differently:
Research Assistant — more distributed risk:
- Summarizer: 8%
- Code Deployer: 55%
- Coordinator: 26%
Code Deployment — risk concentrated in human-gated components:
- Production Deploy (human-gated): 26%
- Code Reviewer (advisory only): 26%
- Automated components: 48%
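Turning percentage shares into dollar budgets is straightforward; this sketch uses the ~$2,000/month figure and the Code Deployment shares listed above, with an illustrative `allocate` helper:

```python
BUDGET = 2_000  # approximate total Delegation Risk budget, dollars/month

def allocate(shares):
    """shares: {component: fraction of total budget} -> dollar allocations."""
    return {name: frac * BUDGET for name, frac in shares.items()}

code_deployment = allocate({
    "production_deploy": 0.26,  # human-gated
    "code_reviewer": 0.26,      # advisory only
    "automated": 0.48,
})
# Sanity check: the shares should cover the whole budget.
assert abs(sum(code_deployment.values()) - BUDGET) < 1e-6
```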
Coordinator Principles Applied
| Coordinator Principle | Research Assistant | Code Deployment |
|---|---|---|
| Least Influence | Strategy proposes, doesn’t execute | Orchestrator routes, doesn’t decide |
| Least Direct Impact | Works through components | Uses verified code, no ML |
| Maximum Transparency | All reasoning logged | Logic is auditable code |
| Simplest Communication | Structured task objects | Enum states only |
| Shortest Horizon | Plans current task only | No planning at all |
| Mandatory Redundancy | 5 strategy modules, 4/5 consensus | N/A (deterministic) |
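The Research Assistant's 4/5 consensus rule for its redundant strategy modules can be sketched as follows; the proposal values and function name are illustrative:

```python
def consensus(votes, threshold=4):
    """Accept a proposal only if at least `threshold` modules agree."""
    for proposal in set(votes):
        if votes.count(proposal) >= threshold:
            return proposal
    return None  # no consensus: escalate to a human

# 4 of 5 strategy modules agree, so the pivot proceeds.
assert consensus(["pivot", "pivot", "pivot", "pivot", "stay"]) == "pivot"
# 3 of 5 is not enough; the decision escalates.
assert consensus(["pivot", "pivot", "pivot", "stay", "stay"]) is None
```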
Architecture Patterns
When to Use LLMs in Coordination
Use LLM coordination (like Research Assistant) when:
- Tasks require judgment about priority, strategy, decomposition
- Outcomes are recoverable (wasted effort, not disasters)
- Human can audit reasoning after the fact
Use verified code coordination (like Code Deployment) when:
- Logic is specifiable as rules
- Outcomes are hard to reverse
- Correctness is more important than flexibility
Information Flow Patterns
Research Assistant Pattern: Bidirectional but constrained
flowchart LR
RS[Strategy] -->|commands| RT[Tactics]
RT -->|invocations| RC[Components]
RC -->|results| RT
RT -.->|status only| RS
Code Deployment Pattern: Strict pass/fail gates, no negotiation
flowchart LR
DO[Orchestrator] -->|invoke| DC[Checks]
DC -->|pass/fail| DO
DO -->|gate| DP[Production]
Checklist: Applying the Framework
When designing a new system:
1. Identify failure modes — What can go wrong? What’s the damage?
2. Compute component Delegation Risks — P(failure) × Damage for each
3. Set system budget — What total risk is acceptable?
4. Allocate budgets — Which components get how much?
5. Choose implementations — Verified code > narrow models > frontier LLMs
6. Apply principles — Go through each “Least X” for each component
7. Design coordination — LLM or code? What redundancy?
8. Identify human gates — Where do humans approve?
9. Plan verification — How will you test trust assumptions?
10. Monitor in production — Track actual vs. expected failure rates
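The compute/set/allocate steps can be wired together in a few lines. This sketch reuses failure modes from the worked examples purely for illustration (the two components actually belong to different systems):

```python
# Per-component Delegation Risk = sum of P(failure) x Damage.
components = {
    "summarizer": [(0.02, 5_000), (0.001, 50_000)],  # Research Assistant
    "canary": [(0.01, 30_000)],                      # Code Deployment
}
risks = {name: sum(p * d for p, d in modes) for name, modes in components.items()}

# Compare the total against the system budget.
SYSTEM_BUDGET = 2_000  # dollars/month
total = sum(risks.values())
within_budget = total <= SYSTEM_BUDGET  # True here: $450 <= $2,000
```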
Further Reading
- Least X Principles — Full principle definitions
- Coordinator Constraints — Coordinator-specific principles
- Research Assistant Example — Full worked example
- Code Deployment Example — Higher-stakes example
- Delegation Risk Overview — Delegation Risk computation and propagation