Example: AI Trading System

A worked example in a high-frequency, adversarial domain: an AI system that generates and executes trading strategies.

“Analyze market data, generate trading signals, and execute trades within risk limits”

Trading systems present unique challenges:

  • Adversarial environment — Other market participants actively exploit weaknesses
  • High frequency — Decisions in milliseconds, no time for human review per trade
  • Tail risk — Rare events can cause catastrophic losses (Flash Crash, LTCM)
  • Regulatory scrutiny — SEC, FINRA oversight; market manipulation concerns
  • Correlated failures — Many AI systems might fail the same way simultaneously

This requires a careful balance between speed (for profit) and safety (for survival).

Failure modes and stakes:

| Failure | Damage | Notes |
| --- | --- | --- |
| Large unintended position | $1M+ | Flash crash contribution |
| Model failure in volatility | $500K | Wrong bets during crisis |
| Adversarial exploitation | $200K | Other traders gaming our signals |
| Regulatory violation | $1M+ | Spoofing, manipulation fines |
| Data feed manipulation | $100K | Acting on corrupted data |
| Runaway losses | Unlimited | Without circuit breakers |

Target Delegation Risk: $50,000/month (high tolerance, but with hard limits)

Hard limits (non-negotiable; sketched as configuration after this list):

  • Max position size: $10M per instrument
  • Max daily loss: $500K (triggers shutdown)
  • Max drawdown: $2M (triggers review)
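
These limits belong in configuration that application code cannot override at runtime. A minimal sketch, assuming a Python codebase (the class and field names are illustrative, not the system's actual config):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: limit values cannot be mutated after startup
class HardLimits:
    max_position_usd: int = 10_000_000  # per instrument
    max_daily_loss_usd: int = 500_000   # breach triggers shutdown
    max_drawdown_usd: int = 2_000_000   # breach triggers review

HARD_LIMITS = HardLimits()
```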
System architecture:

```mermaid
flowchart TB
    subgraph Data["Data Layer"]
        Feed[Market Data Feed<br/>verified code]
        Val[Data Validator<br/>verified code]
        Hist[Historical Data<br/>verified code]
    end
    subgraph Signals["Signal Generation"]
        Alpha[Alpha Models<br/>ML ensemble]
        Risk[Risk Models<br/>validated ML]
        Regime[Regime Detector<br/>statistical]
    end
    subgraph Portfolio["Portfolio Management"]
        Opt[Portfolio Optimizer<br/>code + constraints]
        Pos[Position Manager<br/>verified code]
        Limits[Limit Checker<br/>verified code]
    end
    subgraph Execution["Execution"]
        Exec[Execution Engine<br/>verified code]
        Circuit[Circuit Breakers<br/>verified code]
        Audit[Audit Logger<br/>verified code]
    end
    Feed --> Val
    Val --> Alpha
    Val --> Risk
    Val --> Regime
    Hist --> Alpha
    Hist --> Risk
    Alpha --> Opt
    Risk --> Opt
    Regime --> Opt
    Opt --> Pos
    Pos --> Limits
    Limits -->|approved| Exec
    Limits -->|rejected| Block[Blocked]
    Exec --> Circuit
    Circuit --> Market[Market]
    Exec --> Audit
```

Data Validator implementation: verified code with statistical checks

```python
class DataValidator:
    def validate(self, tick: MarketTick) -> ValidatedTick | Rejection:
        # Hard bounds
        if tick.price <= 0 or tick.price > MAX_PRICE:
            return Rejection("price_bounds")
        # Staleness
        if tick.timestamp < now() - MAX_STALENESS:
            return Rejection("stale_data")
        # Jump detection (vs recent history)
        if abs(tick.price - self.last_price) / self.last_price > MAX_JUMP:
            return Rejection("suspicious_jump", alert=True)
        # Volume sanity
        if tick.volume > self.avg_volume * VOLUME_MULTIPLE:
            return Rejection("suspicious_volume", alert=True)
        return ValidatedTick(tick, confidence=self.compute_confidence(tick))
```

Trust allocation: Delegation Risk $500 (verified code, critical path)

Why verified code: Data validation is the first line of defense. A compromised feed could cause unlimited losses. No ML here.

Alpha Models implementation: ensemble of narrow ML models

  • Multiple independent models (momentum, mean-reversion, sentiment, etc.)
  • Each model trained on specific patterns
  • Ensemble combines signals with confidence weighting
  • No single model can dominate

Trust allocation: Delegation Risk $15,000 (highest allocation—this is where we accept risk for return)

Constraints:

  • Models are narrow (trained on specific signal types)
  • Ensemble requires agreement threshold to act
  • Model outputs are signals (probabilities), not trade orders
  • Regular backtesting against recent data
  • Automatic confidence reduction during high volatility
```python
class AlphaEnsemble:
    def generate_signal(self, data: ValidatedData) -> Signal:
        signals = [model.predict(data) for model in self.models]
        # Require minimum agreement
        agreeing = [s for s in signals if s.direction == majority_direction(signals)]
        if len(agreeing) < self.min_agreement:
            return Signal.NEUTRAL
        # Confidence is reduced by disagreement
        confidence = mean([s.confidence for s in agreeing])
        confidence *= len(agreeing) / len(signals)  # Disagreement penalty
        return Signal(
            direction=majority_direction(signals),
            confidence=confidence,
            contributing_models=[s.model_id for s in agreeing],
        )
```

Risk Models implementation: validated statistical models (not ML)

  • VaR and Expected Shortfall calculations
  • Correlation estimation (with regime adjustment)
  • Liquidity risk assessment
  • Concentration limits

Trust allocation: Delegation Risk $5,000 (validated models, conservative calibration)

Why statistical, not ML: Risk models must be interpretable and stable. ML risk models can underestimate tail risk exactly when it matters most.
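
As a rough illustration of the statistical approach, historical-simulation VaR and Expected Shortfall can be computed in a few lines. This is a minimal sketch, not the system's actual risk engine; the function name and the NumPy-only implementation are assumptions:

```python
import numpy as np

def historical_var_es(daily_pnl: np.ndarray, confidence: float = 0.99) -> tuple[float, float]:
    """Historical-simulation VaR and Expected Shortfall, returned as positive loss amounts."""
    losses = -daily_pnl                           # convert P&L to losses
    var = float(np.quantile(losses, confidence))  # loss exceeded only (1 - confidence) of the time
    tail = losses[losses >= var]                  # the days worse than VaR
    es = float(tail.mean()) if tail.size else var # expected loss given we are in the tail
    return var, es

# Example with simulated daily P&L (two trading years)
rng = np.random.default_rng(0)
var_99, es_99 = historical_var_es(rng.normal(5_000, 50_000, size=504))
```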

Regime Detector implementation: statistical change-point detection

Detects market regime changes:

  • Low volatility → High volatility
  • Trending → Mean-reverting
  • Normal → Crisis

Trust allocation: Delegation Risk $2,000 (statistical model)

Effect on system:

  • Crisis regime → Reduce position sizes 50-90%
  • High volatility → Increase confidence thresholds
  • Regime uncertainty → Conservative mode
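
A minimal sketch of how the detector and these effects could fit together, using a simple rolling-volatility classifier instead of a full change-point model (the thresholds and scaling factors below are illustrative assumptions):

```python
import numpy as np

# Position-size scaling applied downstream, per the effects listed above (illustrative values)
REGIME_SCALING = {"normal": 1.0, "high_volatility": 0.5, "crisis": 0.1}

def detect_regime(returns: np.ndarray, window: int = 20,
                  high_vol: float = 0.02, crisis_vol: float = 0.05) -> str:
    """Classify the current regime from recent return volatility."""
    recent_vol = float(np.std(returns[-window:]))
    if recent_vol >= crisis_vol:
        return "crisis"            # downstream: cut position sizes 50-90%
    if recent_vol >= high_vol:
        return "high_volatility"   # downstream: raise confidence thresholds
    return "normal"
```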

Portfolio Optimizer implementation: convex optimization with hard constraints

```python
import cvxpy as cp
import numpy as np

class PortfolioOptimizer:
    def optimize(self, signals: list[Signal], risk: RiskEstimate,
                 positions: CurrentPositions) -> TargetPortfolio:
        w = cp.Variable(len(signals))  # target weight per instrument
        mu = np.array([s.expected_return for s in signals])
        # Objective: maximize expected return - risk penalty
        objective = cp.Maximize(mu @ w - self.risk_aversion * portfolio_var(w, risk))
        # Hard constraints (cannot be violated)
        constraints = [
            cp.norm1(w) <= self.max_gross,            # Gross exposure
            cp.abs(w) <= self.max_position,           # Per-instrument position limits
            portfolio_var(w, risk) <= self.max_var,   # VaR limit
            sector_exposure(w) <= self.max_sector,    # Concentration
        ]
        # Solve with verified convex solver
        problem = cp.Problem(objective, constraints)
        problem.solve()
        if problem.status not in ("optimal", "optimal_inaccurate"):
            return TargetPortfolio.NO_CHANGE  # infeasible: leave the portfolio unchanged
        return TargetPortfolio(w.value)
```

Trust allocation: Delegation Risk $3,000 (code with mathematical guarantees)

Key property: Constraints are hard-coded and mathematically guaranteed by the solver. The optimizer cannot propose a portfolio that violates limits.

Limit Checker implementation: verified code, independent of the optimizer

Defense in depth—checks optimizer output before execution:

```python
class LimitChecker:
    def check(self, target: TargetPortfolio,
              current: CurrentPositions) -> Approved | Rejected:
        # Position limits
        for instrument, target_size in target.positions.items():
            if abs(target_size) > self.position_limits[instrument]:
                return Rejected(f"position_limit_{instrument}")
        # Daily trading limit
        trade_value = sum(abs(target[i] - current[i]) * price[i]
                          for i in target.instruments)
        if self.daily_traded + trade_value > self.daily_limit:
            return Rejected("daily_trade_limit")
        # Loss limit
        if self.daily_pnl < -self.loss_limit:
            return Rejected("loss_limit_breached")
        # Concentration
        if max_sector_exposure(target) > self.sector_limit:
            return Rejected("sector_concentration")
        return Approved(target)
```

Trust allocation: Delegation Risk $1,000 (verified code, independent check)

Why independent: Even if optimizer has a bug, limit checker catches violations. Different code, different team, different review process.

Circuit Breakers implementation: verified code with automatic shutdown

```python
class CircuitBreaker:
    def check_and_execute(self, order: Order) -> Executed | Blocked:
        # Check breakers before every order
        if self.daily_loss > self.loss_limit:
            self.shutdown("daily_loss_limit")
            return Blocked("circuit_breaker_loss")
        if self.orders_per_second > self.rate_limit:
            return Blocked("rate_limit")
        if self.position_change_rate > self.velocity_limit:
            return Blocked("velocity_limit")
        # Execute
        result = self.execute(order)
        # Post-execution checks
        if self.drawdown > self.drawdown_limit:
            self.shutdown("drawdown_limit")
        return result

    def shutdown(self, reason: str):
        """Immediate shutdown - flatten all positions over 30 min"""
        self.trading_enabled = False
        self.alert_humans(reason)
        self.initiate_flatten()
```

Trust allocation: Delegation Risk $500 (verified code, fail-safe)

Shutdown triggers (consolidated in the sketch after this list):

  • Daily loss > $500K
  • Drawdown > $2M
  • Order rate anomaly
  • Data feed failure
  • Any component error
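
A minimal sketch of a single check that consolidates these triggers. The loss and drawdown thresholds come from the hard limits above; the anomaly and staleness thresholds, and the `state` field names, are illustrative assumptions:

```python
def shutdown_reason(state) -> str | None:
    """Return the first tripped shutdown trigger, or None if trading may continue."""
    if state.daily_loss_usd > 500_000:
        return "daily_loss_limit"
    if state.drawdown_usd > 2_000_000:
        return "drawdown_limit"
    if state.order_rate_zscore > 5:          # assumed anomaly threshold
        return "order_rate_anomaly"
    if state.feed_staleness_seconds > 5:     # assumed staleness threshold
        return "data_feed_failure"
    if state.component_errors:
        return "component_error"
    return None
```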
Trust allocation summary:

| Component | Implementation | Delegation Risk | % of Budget |
| --- | --- | --- | --- |
| Data Validator | Verified code | $500 | 1% |
| Alpha Models | ML ensemble | $15,000 | 30% |
| Risk Models | Statistical | $5,000 | 10% |
| Regime Detector | Statistical | $2,000 | 4% |
| Portfolio Optimizer | Verified code + solver | $3,000 | 6% |
| Limit Checker | Verified code | $1,000 | 2% |
| Circuit Breakers | Verified code | $500 | 1% |
| Execution Engine | Verified code | $1,000 | 2% |
| Tail Risk Buffer | Extreme scenarios | $15,000 | 30% |
| Human Oversight | Daily review, alerts | $6,500 | 13% |
| **Total** | | **$50,000** | **100%** |

Key insight: 30% allocated to alpha models (where we take risk for return), 30% as tail risk buffer, remainder in verified infrastructure.

How the principles apply:

| Principle | Application |
| --- | --- |
| Least Intelligence | Alpha models are narrow ensembles, not general reasoning. Risk models are statistical. |
| Least Autonomy | Hard limits enforced by verified code. Shutdown requires a human to restart. |
| Least Surprise | Circuit breakers are deterministic. Optimizer is convex (unique solution). |
| Max Verifiability | Critical path (limits, breakers, execution) is verified code. |
| Defense in Depth | Optimizer constraints + independent limit checker + circuit breakers. |

A fundamental limitation: models are trained on historical data. They cannot predict:

  • Flash crashes
  • Correlation breakdown (everything moves together)
  • Liquidity evaporation
  • Black swan events
Mitigations:

  1. Regime detector reduces exposure when uncertainty rises
  2. Conservative risk estimates use stressed correlations
  3. Circuit breakers stop losses before they become catastrophic
  4. Position limits cap the worst-case per-instrument loss
  5. Tail risk buffer in the Delegation Risk budget assumes rare large losses
Loss escalation ladder:

```mermaid
flowchart TD
    Normal[Normal Trading] --> L1{Daily Loss > $100K?}
    L1 -->|Yes| Reduce[Reduce Position Sizes 50%]
    L1 -->|No| Normal
    Reduce --> L2{Daily Loss > $300K?}
    L2 -->|Yes| Halt[Halt New Positions]
    L2 -->|No| Reduce
    Halt --> L3{Daily Loss > $500K?}
    L3 -->|Yes| Flatten[Flatten All Positions]
    L3 -->|No| Halt
    Flatten --> Human[Human Review Required]
```
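
The ladder is deterministic and easy to verify. A minimal sketch (thresholds from the chart above; the function name is assumed):

```python
def escalation_action(daily_loss_usd: float) -> str:
    """Map current daily loss to an escalation step (thresholds from the ladder above)."""
    if daily_loss_usd > 500_000:
        return "flatten_all_positions"       # then require human review to resume
    if daily_loss_usd > 300_000:
        return "halt_new_positions"
    if daily_loss_usd > 100_000:
        return "reduce_position_sizes_50pct"
    return "normal_trading"
```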

Unlike the other examples, this system uses no LLM in:

  • Data validation
  • Risk calculation
  • Portfolio optimization
  • Limit checking
  • Order execution

Why: Trading requires:

  • Millisecond latency (LLMs too slow)
  • Deterministic behavior (LLMs non-deterministic)
  • Mathematical guarantees (LLMs can’t provide)
  • Adversarial robustness (LLMs can be manipulated)

LLMs could be used for:

  • Overnight strategy review (not real-time)
  • Anomaly explanation (post-hoc)
  • Research hypothesis generation (offline)

But never in the execution path.

Comparison with the other worked examples:

| Aspect | Research Assistant | Healthcare Triage | Trading System |
| --- | --- | --- | --- |
| Speed | Hours | Minutes | Milliseconds |
| Stakes | Effort | Life safety | Financial |
| Adversaries | No | Rare | Constant |
| LLM role | Central | Minimal | None (real-time) |
| Human gate | Strategy | Uncertain cases | Daily review |
| Verified code % | ~30% | ~80% | ~95% |
| Hard limits | Soft | Medium | Absolute |