How to evaluate RL trading strategies
A practical checklist for reinforcement learning trading evaluation: leakage prevention, walk-forward tests, risk metrics, and monitoring assumptions.
Start with leakage and data hygiene
Most backtests that look impressive fail at this step.
- Feature leakage — any feature derived from future candles or revised values invalidates the test.
- Corporate actions & adjustments — know what “price” means in your data source.
- Execution assumptions — slippage, fees, and fill logic should match what you can achieve live.
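One way to catch feature leakage mechanically is a truncation test: a feature value at time t should not change when all data after t is removed. The sketch below is a minimal illustration under assumptions that are not in the original checklist: prices are a plain list of floats, and the helper names (`trailing_mean`, `centered_mean`, `leaks_future`) are hypothetical. It does not cover revised values or corporate-action adjustments, which need point-in-time data to verify.

```python
from typing import Callable, List, Sequence


def trailing_mean(prices: Sequence[float], window: int) -> List[float]:
    """Causal feature: mean of up to `window` prices ending at index t."""
    out = []
    for t in range(len(prices)):
        chunk = prices[max(0, t - window + 1): t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(prices: Sequence[float], window: int) -> List[float]:
    """Leaky feature: the window is centered on t, so it reads future prices."""
    half = window // 2
    out = []
    for t in range(len(prices)):
        chunk = prices[max(0, t - half): t + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def leaks_future(
    feature_fn: Callable[[Sequence[float]], List[float]],
    prices: Sequence[float],
    tol: float = 1e-12,
) -> bool:
    """Return True if any feature value at t changes once data after t is dropped."""
    full = feature_fn(prices)
    for t in range(len(prices)):
        truncated = feature_fn(prices[: t + 1])
        if abs(truncated[t] - full[t]) > tol:
            return True
    return False


# Illustrative prices only; not real market data.
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]
causal_leaks = leaks_future(lambda p: trailing_mean(p, 3), prices)
centered_leaks = leaks_future(lambda p: centered_mean(p, 3), prices)
```

The check recomputes the feature once per timestep, which is slow on long histories but is usually acceptable as a one-off test on a sample of the data.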
Use walk-forward, not one backtest
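RL policies can overfit to the market regimes they were trained on, so evaluate on time-separated data: each test period must come strictly after its training period.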
Run rolling windows: train on a past segment, validate on the next segment, then move forward. Track whether the strategy is stable across regimes, not just high-performing in a single period.
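The rolling-window procedure can be sketched as a generator of index ranges. This is a minimal illustration, assuming samples are already sorted by time and that a fixed-length training window is wanted; the function name `walk_forward_splits` and its parameters are hypothetical, and an expanding training window would be an equally reasonable choice.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where each test range follows its train range.

    `step` is how far the window advances between folds. It defaults to
    `test_size`, so test ranges do not overlap.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")

    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
```

Retrain the agent on each `train` range, evaluate on the matching `test` range, and compare the per-fold results rather than only their average. A strategy that is stable across regimes should not depend on one or two folds for most of its return.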
Measure what you actually care about
- Return and drawdown — not just mean return.
- Risk-adjusted metrics — volatility- and tail-aware measures.
- Turnover and costs — how sensitive is performance to transaction costs?
- Constraint violations — leverage, max positions, or exposure limits.
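The metrics above can be computed from per-period returns and positions. The sketch below is one possible set of definitions, not a standard: it assumes simple (not log) returns, a zero risk-free rate, 252 periods per year, costs proportional to the absolute change in position, and a portfolio that starts flat. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized mean over sample standard deviation, assuming a zero risk-free rate."""
    n = len(returns)
    if n < 2:
        return float("nan")
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    if var == 0.0:
        return float("nan")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def turnover(positions: Sequence[float]) -> float:
    """Sum of absolute position changes, starting from a flat (zero) position."""
    total, prev = 0.0, 0.0
    for p in positions:
        total += abs(p - prev)
        prev = p
    return total


def net_returns(
    gross: Sequence[float], positions: Sequence[float], cost_per_unit: float
) -> List[float]:
    """Subtract a proportional cost for every unit of position change."""
    out, prev = [], 0.0
    for r, p in zip(gross, positions):
        out.append(r - cost_per_unit * abs(p - prev))
        prev = p
    return out


def count_limit_breaches(exposures: Sequence[float], limit: float) -> int:
    """Number of periods in which absolute exposure exceeds `limit`."""
    return sum(1 for e in exposures if abs(e) > limit)
```

To test cost sensitivity, recompute the drawdown and Sharpe ratio on `net_returns` for several values of `cost_per_unit` and look at how quickly the results degrade.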
Design for monitoring
Before you go live, decide what “degrading” looks like: performance drift, action distribution drift, exposure drift, or increased constraint hits. Monitoring is part of the system, not an afterthought.
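Action distribution drift is one of the easier signals to make concrete. The sketch below compares how often each action was taken in a reference period (for example, the final validation window) against a recent live window, using total variation distance. The discrete action labels, the function names, and the 0.2 alert threshold are all assumptions for illustration; a real threshold should be calibrated on the variation seen between validation folds.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def action_frequencies(actions: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability of each action in the sequence."""
    if not actions:
        raise ValueError("actions must not be empty")
    counts = Counter(actions)
    total = len(actions)
    return {action: count / total for action, count in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total variation distance between two discrete distributions (0 = same, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def is_degrading(
    reference_actions: Sequence[Hashable],
    live_actions: Sequence[Hashable],
    threshold: float = 0.2,  # illustrative value, not a recommendation
) -> bool:
    """Flag drift when the live action mix moves more than `threshold` from the reference."""
    distance = total_variation(
        action_frequencies(reference_actions), action_frequencies(live_actions)
    )
    return distance > threshold


reference = ["buy", "hold", "hold", "sell"]
live = ["hold", "hold", "hold", "hold"]
drift = total_variation(action_frequencies(reference), action_frequencies(live))
```

The same pattern applies to exposure drift and constraint hits: pick a reference window, define a distance or rate, and set the alert threshold before going live rather than after the first surprise.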