How to evaluate RL trading strategies

A practical checklist for reinforcement learning trading evaluation: leakage prevention, walk-forward tests, risk metrics, and monitoring assumptions.

Start with leakage and data hygiene

Most backtests that look impressive fail here.

Feature leakage — any feature derived from future candles or revised values invalidates the test.
Corporate actions & adjustments — know what “price” means in your data source.
Execution assumptions — slippage, fees, and fill logic should match what you can achieve live.
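One way to test for feature leakage is a truncation check: compute a feature on the full price series, recompute it on the series cut off at each bar, and confirm that the value at that bar is unchanged. The sketch below is a minimal illustration in plain Python. The function names (trailing_mean, centered_mean, uses_future_data) are hypothetical and not part of any library, and the check assumes the feature is a deterministic function of the price list.

```python
from typing import Callable, List, Sequence


def trailing_mean(prices: Sequence[float], window: int) -> List[float]:
    """Feature for bar t: mean of up to `window` bars strictly before t."""
    out = []
    for t in range(len(prices)):
        past = prices[max(0, t - window):t]  # excludes bar t itself
        out.append(sum(past) / len(past) if past else float("nan"))
    return out


def centered_mean(prices: Sequence[float], window: int) -> List[float]:
    """Deliberately leaky feature: averages bars on both sides of t."""
    out = []
    for t in range(len(prices)):
        span = prices[max(0, t - window):t + window + 1]
        out.append(sum(span) / len(span))
    return out


def uses_future_data(
    feature_fn: Callable[[Sequence[float], int], List[float]],
    prices: Sequence[float],
    window: int,
) -> bool:
    """Return True if any feature value at bar t changes once the bars
    after t are removed, which means the feature looked ahead."""
    full = feature_fn(prices, window)
    for t in range(len(prices)):
        truncated = feature_fn(prices[:t + 1], window)
        a, b = full[t], truncated[t]
        both_nan = a != a and b != b  # NaN is the only value unequal to itself
        if not both_nan and a != b:
            return True
    return False


prices = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0]
print(uses_future_data(trailing_mean, prices, 3))
print(uses_future_data(centered_mean, prices, 3))
```

The check is quadratic in the number of bars, so on long histories it is usually enough to run it on a sample of bars. It catches look-ahead in feature code, but not revised values that were already baked into the data source.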

Use walk-forward, not one backtest

RL policies can overfit to a single market regime, so evaluate on data that is separated in time from the training data.

Run rolling windows: train on a past segment, validate on the next segment, then move forward. Track whether the strategy is stable across regimes, not just high-performing in a single period.
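The rolling scheme can be sketched as a generator of index ranges. This is a minimal illustration, not a reference implementation: walk_forward_splits and its parameters are hypothetical names, and the default of stepping forward by one test window is an assumption that gives non-overlapping test segments.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n_bars: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where test always follows train in time."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")

    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step  # slide the whole window forward


for train, test in walk_forward_splits(n_bars=10, train_size=4, test_size=2):
    print(list(train), list(test))
```

Retrain the policy on each train range, evaluate it once on the matching test range, and compare the per-window results rather than only their average. A wide spread across windows is the regime instability this section warns about.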

Measure what you actually care about

Return and drawdown — not just mean return.
Risk-adjusted metrics — volatility- and tail-aware measures.
Turnover and costs — how sensitive is performance to transaction costs?
Constraint violations — leverage, max positions, or exposure limits.
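The metrics above can be computed from two per-period series: strategy returns and the positions held. The sketch below uses plain Python and a few simplifying assumptions that are not from the original text: returns are simple per-period returns, positions are signed exposure in units of capital, the book starts flat, costs are linear in traded size, and the Sharpe ratio uses a zero risk-free rate with 252 periods per year. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized mean over sample standard deviation (risk-free rate assumed zero)."""
    n = len(returns)
    if n < 2:
        return float("nan")
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    if var == 0:
        return float("nan")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def turnover(positions: Sequence[float]) -> float:
    """Sum of absolute position changes, starting from a flat book."""
    prev, total = 0.0, 0.0
    for p in positions:
        total += abs(p - prev)
        prev = p
    return total


def net_returns(
    gross: Sequence[float], positions: Sequence[float], cost_per_unit: float
) -> List[float]:
    """Subtract a linear cost for each unit of position change."""
    prev, out = 0.0, []
    for r, p in zip(gross, positions):
        out.append(r - cost_per_unit * abs(p - prev))
        prev = p
    return out


def count_violations(positions: Sequence[float], max_abs_position: float) -> int:
    """Number of periods in which absolute exposure exceeds the limit."""
    return sum(1 for p in positions if abs(p) > max_abs_position)


gross = [0.010, -0.004, 0.006, -0.012, 0.008]
positions = [1.0, 1.0, -1.0, 0.5, 1.5]
for cost in (0.0, 0.0005, 0.002):  # illustrative cost levels only
    net = net_returns(gross, positions, cost)
    print(cost, sharpe_ratio(net), max_drawdown(net))
print(turnover(positions), count_violations(positions, max_abs_position=1.0))
```

Sweeping the cost level, as in the loop at the end, shows how quickly the risk-adjusted result degrades as costs rise. A strategy whose edge disappears at a realistic cost level is not viable, whatever its gross numbers say.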

Design for monitoring

Before you go live, decide what “degrading” looks like: performance drift, action distribution drift, exposure drift, or increased constraint hits. Monitoring is part of the system, not an afterthought.
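As one example of turning "degrading" into a concrete rule, the sketch below compares the live action distribution with a reference distribution from validation, using total variation distance. The choice of distance, the 0.2 threshold, and the function names are illustrative assumptions. In practice the threshold would be calibrated on the variation observed between validation windows.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def action_frequencies(
    actions: Sequence[Hashable], action_space: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Empirical probability of each action in the given action space."""
    if not actions:
        raise ValueError("need at least one action")
    counts = Counter(actions)
    return {a: counts.get(a, 0) / len(actions) for a in action_space}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = same, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def action_drift_alert(
    reference_actions: Sequence[Hashable],
    live_actions: Sequence[Hashable],
    action_space: Sequence[Hashable],
    threshold: float = 0.2,  # hypothetical value, tune on validation data
) -> bool:
    """True when the live action mix has moved further than `threshold` from the reference."""
    ref = action_frequencies(reference_actions, action_space)
    live = action_frequencies(live_actions, action_space)
    return total_variation(ref, live) > threshold


space = ["buy", "hold", "sell"]
reference = ["hold"] * 6 + ["buy"] * 2 + ["sell"] * 2
live = ["buy"] * 7 + ["hold"] * 3
print(action_drift_alert(reference, live, space))
```

The same pattern, a reference window, a live window, a distance, and a threshold, extends to exposure drift and to the rate of constraint hits.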
