Example: AI Trading System
Example: AI Trading System
Section titled “Example: AI Trading System”A worked example in a high-frequency, adversarial domain: an AI system that generates and executes trading strategies.
“Analyze market data, generate trading signals, and execute trades within risk limits”
Why This Domain
Section titled “Why This Domain”Trading systems present unique challenges:
- Adversarial environment — Other market participants actively exploit weaknesses
- High frequency — Decisions in milliseconds, no time for human review per trade
- Tail risk — Rare events can cause catastrophic losses (Flash Crash, LTCM)
- Regulatory scrutiny — SEC, FINRA oversight; market manipulation concerns
- Correlated failures — Many AI systems might fail the same way simultaneously
This requires careful balance between speed (for profit) and safety (for survival).
Risk Tolerance
Section titled “Risk Tolerance”Failure modes and stakes:
| Failure | Damage | Notes |
|---|---|---|
| Large unintended position | $1M+ | Flash crash contribution |
| Model failure in volatility | $500K | Wrong bets during crisis |
| Adversarial exploitation | $200K | Other traders gaming our signals |
| Regulatory violation | $1M+ | Spoofing, manipulation fines |
| Data feed manipulation | $100K | Acting on corrupted data |
| Runaway losses | Unlimited | Without circuit breakers |
Target Delegation Risk: $50,000/month (high tolerance, but with hard limits)
Hard limits (non-negotiable):
- Max position size: $10M per instrument
- Max daily loss: $500K (triggers shutdown)
- Max drawdown: $2M (triggers review)
Component Architecture
Section titled “Component Architecture”flowchart TB
subgraph Data["Data Layer"]
Feed[Market Data Feed<br/>verified code]
Val[Data Validator<br/>verified code]
Hist[Historical Data<br/>verified code]
end
subgraph Signals["Signal Generation"]
Alpha[Alpha Models<br/>ML ensemble]
Risk[Risk Models<br/>validated ML]
Regime[Regime Detector<br/>statistical]
end
subgraph Portfolio["Portfolio Management"]
Opt[Portfolio Optimizer<br/>code + constraints]
Pos[Position Manager<br/>verified code]
Limits[Limit Checker<br/>verified code]
end
subgraph Execution["Execution"]
Exec[Execution Engine<br/>verified code]
Circuit[Circuit Breakers<br/>verified code]
Audit[Audit Logger<br/>verified code]
end
Feed --> Val
Val --> Alpha
Val --> Risk
Val --> Regime
Hist --> Alpha
Hist --> Risk
Alpha --> Opt
Risk --> Opt
Regime --> Opt
Opt --> Pos
Pos --> Limits
Limits -->|approved| Exec
Limits -->|rejected| Block[Blocked]
Exec --> Circuit
Circuit --> Market[Market]
Exec --> Audit
Component Design
Section titled “Component Design”Data Validator
Section titled “Data Validator”Implementation: Verified code with statistical checks
class DataValidator: def validate(self, tick: MarketTick) -> ValidatedTick | Rejection: # Hard bounds if tick.price <= 0 or tick.price > MAX_PRICE: return Rejection("price_bounds")
# Staleness if tick.timestamp < now() - MAX_STALENESS: return Rejection("stale_data")
# Jump detection (vs recent history) if abs(tick.price - self.last_price) / self.last_price > MAX_JUMP: return Rejection("suspicious_jump", alert=True)
# Volume sanity if tick.volume > self.avg_volume * VOLUME_MULTIPLE: return Rejection("suspicious_volume", alert=True)
return ValidatedTick(tick, confidence=self.compute_confidence(tick))Trust allocation: Delegation Risk $500 (verified code, critical path)
Why verified code: Data validation is the first line of defense. A compromised feed could cause unlimited losses. No ML here.
Alpha Models (Signal Generation)
Section titled “Alpha Models (Signal Generation)”Implementation: Ensemble of narrow ML models
- Multiple independent models (momentum, mean-reversion, sentiment, etc.)
- Each model trained on specific patterns
- Ensemble combines signals with confidence weighting
- No single model can dominate
Trust allocation: Delegation Risk $15,000 (highest allocation—this is where we accept risk for return)
Constraints:
- Models are narrow (trained on specific signal types)
- Ensemble requires agreement threshold to act
- Model outputs are signals (probabilities), not trade orders
- Regular backtesting against recent data
- Automatic confidence reduction during high volatility
class AlphaEnsemble: def generate_signal(self, data: ValidatedData) -> Signal: signals = [model.predict(data) for model in self.models]
# Require minimum agreement agreeing = [s for s in signals if s.direction == majority_direction(signals)] if len(agreeing) < self.min_agreement: return Signal.NEUTRAL
# Confidence is reduced by disagreement confidence = mean([s.confidence for s in agreeing]) confidence *= len(agreeing) / len(signals) # Disagreement penalty
return Signal( direction=majority_direction(signals), confidence=confidence, contributing_models=[s.model_id for s in agreeing] )Risk Models
Section titled “Risk Models”Implementation: Validated statistical models (not ML)
- VaR and Expected Shortfall calculations
- Correlation estimation (with regime adjustment)
- Liquidity risk assessment
- Concentration limits
Trust allocation: Delegation Risk $5,000 (validated models, conservative calibration)
Why statistical, not ML: Risk models must be interpretable and stable. ML risk models can underestimate tail risk exactly when it matters most.
Regime Detector
Section titled “Regime Detector”Implementation: Statistical change-point detection
Detects market regime changes:
- Low volatility → High volatility
- Trending → Mean-reverting
- Normal → Crisis
Trust allocation: Delegation Risk $2,000 (statistical model)
Effect on system:
- Crisis regime → Reduce position sizes 50-90%
- High volatility → Increase confidence thresholds
- Regime uncertainty → Conservative mode
Portfolio Optimizer
Section titled “Portfolio Optimizer”Implementation: Convex optimization with hard constraints
class PortfolioOptimizer: def optimize(self, signals: List[Signal], risk: RiskEstimate, positions: CurrentPositions) -> TargetPortfolio:
# Objective: maximize expected return - risk penalty objective = sum(s.expected_return * w[s.instrument] for s in signals) - self.risk_aversion * portfolio_var(w, risk)
# Hard constraints (cannot be violated) constraints = [ sum(abs(w)) <= self.max_gross, # Gross exposure w[i] <= self.max_position for all i, # Position limits portfolio_var(w, risk) <= self.max_var, # VaR limit sector_exposure(w) <= self.max_sector, # Concentration ]
# Solve with verified convex solver result = cvxpy.solve(objective, constraints)
if not result.is_feasible: return TargetPortfolio.NO_CHANGE
return TargetPortfolio(result.weights)Trust allocation: Delegation Risk $3,000 (code with mathematical guarantees)
Key property: Constraints are hard-coded and mathematically guaranteed by the solver. The optimizer cannot propose a portfolio that violates limits.
Limit Checker
Section titled “Limit Checker”Implementation: Verified code, independent from optimizer
Defense in depth—checks optimizer output before execution:
class LimitChecker: def check(self, target: TargetPortfolio, current: CurrentPositions) -> Approved | Rejected:
# Position limits for instrument, target_size in target.positions.items(): if abs(target_size) > self.position_limits[instrument]: return Rejected(f"position_limit_{instrument}")
# Daily trading limit trade_value = sum(abs(target[i] - current[i]) * price[i] for i in target.instruments) if self.daily_traded + trade_value > self.daily_limit: return Rejected("daily_trade_limit")
# Loss limit if self.daily_pnl < -self.loss_limit: return Rejected("loss_limit_breached")
# Concentration if max_sector_exposure(target) > self.sector_limit: return Rejected("sector_concentration")
return Approved(target)Trust allocation: Delegation Risk $1,000 (verified code, independent check)
Why independent: Even if optimizer has a bug, limit checker catches violations. Different code, different team, different review process.
Circuit Breakers
Section titled “Circuit Breakers”Implementation: Verified code with automatic shutdown
class CircuitBreaker: def check_and_execute(self, order: Order) -> Executed | Blocked: # Check breakers before every order if self.daily_loss > self.loss_limit: self.shutdown("daily_loss_limit") return Blocked("circuit_breaker_loss")
if self.orders_per_second > self.rate_limit: return Blocked("rate_limit")
if self.position_change_rate > self.velocity_limit: return Blocked("velocity_limit")
# Execute result = self.execute(order)
# Post-execution checks if self.drawdown > self.drawdown_limit: self.shutdown("drawdown_limit")
return result
def shutdown(self, reason: str): """Immediate shutdown - flatten all positions over 30 min""" self.trading_enabled = False self.alert_humans(reason) self.initiate_flatten()Trust allocation: Delegation Risk $500 (verified code, fail-safe)
Shutdown triggers:
- Daily loss > $500K
- Drawdown > $2M
- Order rate anomaly
- Data feed failure
- Any component error
Delegation Risk Budget Summary
Section titled “Delegation Risk Budget Summary”| Component | Implementation | Delegation Risk | % of Budget |
|---|---|---|---|
| Data Validator | Verified code | $500 | 1% |
| Alpha Models | ML ensemble | $15,000 | 30% |
| Risk Models | Statistical | $5,000 | 10% |
| Regime Detector | Statistical | $2,000 | 4% |
| Portfolio Optimizer | Verified code + solver | $3,000 | 6% |
| Limit Checker | Verified code | $1,000 | 2% |
| Circuit Breakers | Verified code | $500 | 1% |
| Execution Engine | Verified code | $1,000 | 2% |
| Tail Risk Buffer | Extreme scenarios | $15,000 | 30% |
| Human Oversight | Daily review, alerts | $6,500 | 13% |
| Total | $50,000 | 100% |
Key insight: 30% allocated to alpha models (where we take risk for return), 30% as tail risk buffer, remainder in verified infrastructure.
Principles Applied
Section titled “Principles Applied”| Principle | Application |
|---|---|
| Least Intelligence | Alpha models are narrow ensembles, not general reasoning. Risk models are statistical. |
| Least Autonomy | Hard limits enforced by verified code. Shutdown requires human to restart. |
| Least Surprise | Circuit breakers are deterministic. Optimizer is convex (unique solution). |
| Max Verifiability | Critical path (limits, breakers, execution) is verified code. |
| Defense in Depth | Optimizer constraints + independent limit checker + circuit breakers. |
Handling Tail Risk
Section titled “Handling Tail Risk”What the Models Can’t See
Section titled “What the Models Can’t See”Models are trained on historical data. They can’t predict:
- Flash crashes
- Correlation breakdown (everything moves together)
- Liquidity evaporation
- Black swan events
Mitigation Strategy
Section titled “Mitigation Strategy”- Regime detector reduces exposure when uncertainty rises
- Conservative risk estimates use stressed correlations
- Circuit breakers stop losses before catastrophic
- Position limits cap worst-case per-instrument loss
- Tail risk buffer in Delegation Risk budget assumes rare large losses
Circuit Breaker Cascade
Section titled “Circuit Breaker Cascade”flowchart TD
Normal[Normal Trading] --> L1{Daily Loss > $100K?}
L1 -->|Yes| Reduce[Reduce Position Sizes 50%]
L1 -->|No| Normal
Reduce --> L2{Daily Loss > $300K?}
L2 -->|Yes| Halt[Halt New Positions]
L2 -->|No| Reduce
Halt --> L3{Daily Loss > $500K?}
L3 -->|Yes| Flatten[Flatten All Positions]
L3 -->|No| Halt
Flatten --> Human[Human Review Required]
No LLM in Critical Path
Section titled “No LLM in Critical Path”Unlike other examples, this system uses no LLM in:
- Data validation
- Risk calculation
- Portfolio optimization
- Limit checking
- Order execution
Why: Trading requires:
- Millisecond latency (LLMs too slow)
- Deterministic behavior (LLMs non-deterministic)
- Mathematical guarantees (LLMs can’t provide)
- Adversarial robustness (LLMs can be manipulated)
LLMs could be used for:
- Overnight strategy review (not real-time)
- Anomaly explanation (post-hoc)
- Research hypothesis generation (offline)
But never in the execution path.
Comparison to Other Examples
Section titled “Comparison to Other Examples”| Aspect | Research Assistant | Healthcare Triage | Trading System |
|---|---|---|---|
| Speed | Hours | Minutes | Milliseconds |
| Stakes | Effort | Life safety | Financial |
| Adversaries | No | Rare | Constant |
| LLM role | Central | Minimal | None (real-time) |
| Human gate | Strategy | Uncertain cases | Daily review |
| Verified code % | ~30% | ~80% | ~95% |
| Hard limits | Soft | Medium | Absolute |
What’s Next?
Section titled “What’s Next?”- Healthcare Triage Example — Life-safety domain
- Risk Budgeting Overview — Euler allocation for component budgets
- Decision Guide — When to use ML vs verified code