Reward design for RL trading
Reward design is the core lever in reinforcement learning trading. Learn how to avoid reward hacking and align rewards with risk and constraints.
Reward is the strategy
In RL trading, the reward function is how you encode what “good” behavior means.
Your agent will optimize what you specify—sometimes aggressively. A reward that only reflects raw PnL can create unintended behaviors: excessive turnover, risk concentration, or fragile strategies that fail outside the training regime.
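To make the turnover problem concrete, here is a minimal NumPy sketch (the function and position trajectories are illustrative, not from any particular system): a raw-PnL reward assigns identical cumulative reward to a buy-and-hold policy and a churning policy, because trading activity never enters the reward.

```python
import numpy as np

def raw_pnl_reward(position, price_change):
    # Reward = raw mark-to-market PnL only; trading costs are invisible,
    # so the agent is never charged for churning its position.
    return position * price_change

price_changes = np.array([0.5, 0.0, 0.5])
steady = np.array([1.0, 1.0, 1.0])    # holds one unit throughout
churny = np.array([1.0, -1.0, 1.0])   # flips sign every step

# Identical cumulative reward despite very different trading activity:
reward_steady = float(np.sum(raw_pnl_reward(steady, price_changes)))  # 1.0
reward_churny = float(np.sum(raw_pnl_reward(churny, price_changes)))  # 1.0

def turnover(pos):
    # Sum of absolute position changes, starting from flat.
    return float(np.sum(np.abs(np.diff(np.concatenate([[0.0], pos])))))

# turnover(steady) = 1.0, turnover(churny) = 5.0
```

Under this reward, the churning policy is indistinguishable from the steady one, even though it would pay roughly five times the transaction costs in live trading.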
Common failure mode: reward hacking
If there’s a loophole—an edge case in accounting, unrealistic execution assumptions, or a metric that can be gamed—an agent can find it. That’s why reward design and evaluation belong together.
Practical guidelines
- Include costs — turnover without costs is usually unrealistic.
- Penalize constraint violations — make risk limits explicit.
- Prefer simple shaping — add complexity only when it measurably improves out-of-sample stability.
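The three guidelines can be combined into a single one-step reward. The sketch below is one possible shaping, with all parameter names and default values chosen for illustration:

```python
def shaped_reward(new_pos, prev_pos, price_change,
                  cost_rate=0.001, pos_limit=10.0, penalty=0.1):
    """One-step reward: PnL minus turnover cost minus a constraint penalty.

    Parameter names and defaults are illustrative assumptions.
    """
    pnl = new_pos * price_change                  # mark-to-market PnL
    cost = cost_rate * abs(new_pos - prev_pos)    # charge for turnover
    breach = max(0.0, abs(new_pos) - pos_limit)   # distance past the risk limit
    return pnl - cost - penalty * breach
```

The linear breach penalty keeps the shaping simple, in line with the last guideline; a harder constraint (e.g. clipping positions or terminating the episode) is a reasonable alternative when limits must never be exceeded.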