Reinforcement learning trading
Why we use reinforcement learning trading for sequential decisions, and how that fits what you do on Kabu.
Why reinforcement learning fits trading
Trading is a string of decisions with outcomes that play out over time—reinforcement learning trading is built for that.
You decide what to hold, when to enter or exit, and how large a position to take. Each choice affects what happens next. Reinforcement learning trading is about exactly that: an agent that chooses actions over time to maximise a reward you define. You don’t have to label “correct” trades; you set the reward (for example absolute return, risk-adjusted performance, or a downside constraint) and the agent learns from experience during training in a backtest.
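To make the three reward examples above concrete, here is a minimal sketch in Python. The function names, the Sharpe-like ratio, and the drawdown threshold are illustrative assumptions, not Kabu's actual reward definitions.

```python
import math
from typing import List


def absolute_return(equity_before: float, equity_after: float) -> float:
    """Per-step reward: simple return on account equity."""
    return equity_after / equity_before - 1.0


def risk_adjusted(returns: List[float]) -> float:
    """Episode-level reward: mean return divided by its sample standard
    deviation (a Sharpe-like ratio, not annualised)."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(variance)
    return mean / std if std > 0 else 0.0


def drawdown_penalised(step_return: float, equity: float, peak_equity: float,
                       max_drawdown: float = 0.10, penalty: float = 1.0) -> float:
    """Downside constraint: subtract a penalty once the drawdown from the
    equity peak exceeds `max_drawdown` (hypothetical threshold)."""
    drawdown = 1.0 - equity / peak_equity
    excess = max(0.0, drawdown - max_drawdown)
    return step_return - penalty * excess
```

Whichever form you pick, the agent only ever sees the number the function returns, so the reward is where your preferences about return and risk have to be written down.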
In that training loop, the agent interacts with a simulated market environment, takes actions like buy, sell, hold, or adjust position size, and receives feedback after each step. Over many episodes it discovers which sequences of trades, risk controls, and exits tend to lead to higher long-run reward under the constraints you care about. That is the core of reinforcement learning trading.
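The loop described above can be sketched with a toy environment. This example uses tabular Q-learning as a deliberately simple stand-in, not PPO, SAC, or DQN, and `ToyMarketEnv`, its one-feature state, and the hyperparameters are all assumptions made for illustration.

```python
import random

ACTIONS = (-1, 0, 1)  # target position: short, flat, long


class ToyMarketEnv:
    """Hypothetical simulated market over a fixed price series (3+ prices)."""

    def __init__(self, prices):
        self.prices = prices
        self.t = 1

    def _state(self):
        # Observation: sign of the most recent price change (-1, 0, or 1).
        change = self.prices[self.t] - self.prices[self.t - 1]
        return (change > 0) - (change < 0)

    def reset(self):
        self.t = 1
        return self._state()

    def step(self, action):
        # Feedback: profit or loss from holding `action` units over the next bar.
        reward = action * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done


def train(prices, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyMarketEnv(prices)
    q = {(s, a): 0.0 for s in (-1, 0, 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

Calling `train([100.0 + i for i in range(50)])` on a steadily rising series should leave the long action valued above the short action in the "price went up" state. A realistic setup would use a much richer state and a function approximator rather than a lookup table.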
From backtest to live reinforcement learning trading
Same agent, same configuration—only the data source changes when you promote it to production.
In training, the agent runs against a backtest: your symbols, timeframe, reward function, and risk setup. When you’re happy with the result, you promote a checkpoint to a live deployment. The same policy runs on a schedule, but now on live market data and real orders through your broker. Kabu keeps the step from backtest to live consistent so you’re not rebuilding anything—same broker for data and execution, same reinforcement learning trading setup, just a different data feed.
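One way to picture "same policy, different data feed" is to write the decision loop against two plug-in points: a source of prices and an order sink. The sketch below is an assumption about how such a separation could look, not Kabu's internals; `momentum_policy`, `run`, and `submit_order` are hypothetical names.

```python
from typing import Callable, Iterable, List

Policy = Callable[[List[float]], int]  # price history -> target position


def momentum_policy(history: List[float]) -> int:
    """Placeholder for a trained policy: go long after an up-move, else flat."""
    if len(history) < 2:
        return 0
    return 1 if history[-1] > history[-2] else 0


def run(policy: Policy, feed: Iterable[float],
        submit_order: Callable[[int], None]) -> None:
    """Drive the policy over any price feed, sending position changes as orders."""
    history: List[float] = []
    position = 0
    for price in feed:
        history.append(price)
        target = policy(history)
        if target != position:
            submit_order(target - position)  # signed quantity to trade
            position = target


# Backtest: historical prices in, simulated fills recorded in a list.
simulated_orders: List[int] = []
run(momentum_policy, [1.0, 2.0, 3.0, 2.0, 2.0], simulated_orders.append)
```

In a live deployment, `feed` would instead yield prices from the broker and `submit_order` would place real orders, while `policy` and `run` stay unchanged.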
What you control in Kabu
You define the reinforcement learning trading setup; Kabu runs training and live execution.
You choose symbols, timeframe, data source (your broker), what the agent can do (for example trade mode and position sizes), how you score it (reward), and which algorithm to use. Today Kabu supports popular reinforcement learning algorithms such as PPO (Proximal Policy Optimization), SAC (Soft Actor-Critic), and DQN (Deep Q-Network), all of which have been used widely in trading research. The platform runs the training, stores runs and checkpoints, and lets you compare and promote a model to live. You focus on the strategy and the reward; we handle the infrastructure and execution.
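The choices listed above can be thought of as a single configuration object. The sketch below is purely illustrative: the `TrainingSetup` class, its field names, and the example values are assumptions and do not reflect Kabu's actual schema or API. Only the three algorithm names come from the text.

```python
from dataclasses import dataclass
from typing import Tuple

SUPPORTED_ALGORITHMS = ("PPO", "SAC", "DQN")  # as named in the text above


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the choices a user makes."""
    symbols: Tuple[str, ...]
    timeframe: str            # e.g. "1h" (assumed format)
    data_source: str          # the broker supplying data
    trade_mode: str           # e.g. "long_only" (assumed value)
    max_position_size: float
    reward: str               # e.g. "risk_adjusted" (assumed value)
    algorithm: str

    def __post_init__(self):
        # Fail early on obviously invalid setups.
        if not self.symbols:
            raise ValueError("at least one symbol is required")
        if self.max_position_size <= 0:
            raise ValueError("max_position_size must be positive")
        if self.algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported algorithm: {self.algorithm}")


setup = TrainingSetup(
    symbols=("AAPL", "MSFT"),
    timeframe="1h",
    data_source="my_broker",
    trade_mode="long_only",
    max_position_size=100.0,
    reward="risk_adjusted",
    algorithm="PPO",
)
```

Keeping the setup in one immutable object makes it straightforward to store alongside a checkpoint, so the configuration used in training is the one that travels with the model.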