What is reinforcement learning trading?

Reinforcement learning trading uses an agent that learns to choose trades over time by interacting with a market environment and optimising a reward you define.

How does Kabu use reinforcement learning for trading?

On Kabu, you configure an RL environment, reward, and algorithm, train agents in backtests, and then promote checkpoints to live trading deployments that act on real market data.

Does the agent keep learning in live trading?

No. Kabu trains reinforcement learning agents in backtests. When you go live, you deploy a fixed policy that does not train or update in production.

Reinforcement learning trading

Why we use reinforcement learning trading for sequential decisions, and how that fits what you do on Kabu.

Why reinforcement learning fits trading

Trading is a sequence of decisions whose outcomes play out over time, and reinforcement learning is built for exactly that kind of problem.

You decide what to hold, when to enter or exit, and how large a position to take. Each choice affects what happens next. Reinforcement learning trading addresses this directly: an agent chooses actions over time to maximise a reward you define. You don’t have to label “correct” trades; you set the reward (for example absolute return, risk-adjusted performance, or a penalty for breaching a downside limit) and the agent learns from experience during training in a backtest.
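To make "a reward you define" concrete, here is a minimal Python sketch of the three reward styles mentioned above, each computed from an equity curve (a list of account values, one per step). The function names, the drawdown limit, and the penalty weight are illustrative assumptions, not Kabu's built-in reward definitions.

```python
from statistics import mean, pstdev


def step_returns(equity):
    """Simple return between each pair of consecutive equity values."""
    return [b / a - 1.0 for a, b in zip(equity, equity[1:])]


def absolute_return(equity):
    """Total return over the episode: final equity over initial equity, minus 1."""
    return equity[-1] / equity[0] - 1.0


def risk_adjusted(equity):
    """Mean step return divided by its standard deviation (an unannualised,
    Sharpe-style ratio). Returns 0.0 when the returns have no variation."""
    returns = step_returns(equity)
    spread = pstdev(returns)
    return mean(returns) / spread if spread > 0 else 0.0


def max_drawdown(equity):
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def drawdown_penalised(equity, limit=0.10, penalty=1.0):
    """Absolute return minus a penalty for any drawdown beyond `limit`.
    `limit` and `penalty` are hypothetical knobs chosen for illustration."""
    excess = max(0.0, max_drawdown(equity) - limit)
    return absolute_return(equity) - penalty * excess
```

The same equity curve can score very differently under each function, which is why the choice of reward shapes what the agent learns to do.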

In that training loop, the agent interacts with a simulated market environment, takes actions such as buy, sell, hold, or adjust position size, and receives feedback after each step. Over many episodes it discovers which sequences of trades, risk controls, and exits tend to lead to higher long-run reward under the constraints you care about. That is the core of reinforcement learning trading.
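The loop described above can be sketched in a few lines of Python. Everything here is a simplified assumption for illustration: `ToyMarketEnv` is a hypothetical long-or-flat environment over a list of prices, the state is just the last price direction plus the current position, and the learner is tabular Q-learning, a deliberately simple stand-in rather than the PPO, SAC, or DQN agents you would train on Kabu.

```python
import random

ACTIONS = ("hold", "buy", "sell")


class ToyMarketEnv:
    """Hypothetical single-asset backtest: the agent is either flat (0) or long (1)."""

    def __init__(self, prices):
        self.prices = prices  # needs at least three prices

    def reset(self):
        self.t = 1
        self.position = 0
        return self._state()

    def _state(self):
        move = self.prices[self.t] - self.prices[self.t - 1]
        trend = (move > 0) - (move < 0)  # -1, 0, or 1
        return (trend, self.position)

    def step(self, action):
        if action == "buy":
            self.position = 1
        elif action == "sell":
            self.position = 0
        self.t += 1
        # Feedback for this step: price change earned while holding the position.
        reward = self.position * (self.prices[self.t] - self.prices[self.t - 1])
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done


def train(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            values = q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = values.index(max(values))    # exploit
            next_state, reward, done = env.step(ACTIONS[a])
            next_values = q.setdefault(next_state, [0.0] * len(ACTIONS))
            target = reward + (0.0 if done else gamma * max(next_values))
            values[a] += alpha * (target - values[a])
            state = next_state
    return q


def run_greedy(env, q):
    """Replay one episode with the learned table and no exploration or updates."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        values = q.get(state, [0.0] * len(ACTIONS))
        state, reward, done = env.step(ACTIONS[values.index(max(values))])
        total += reward
    return total
```

Calling `train(ToyMarketEnv(prices))` returns the learned table, and `run_greedy` then scores it on the same data without changing it. A real setup would use richer observations, transaction costs, and held-out data for evaluation.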

What is RL · Terms you’ll see

From backtest to live reinforcement learning trading

Same agent, same configuration—only the data source changes when you promote it to production.

In training, the agent runs against a backtest: your symbols, timeframe, reward function, and risk setup. When you’re happy with the result, you promote a checkpoint to a live deployment. The same policy, now fixed and no longer training, runs on a schedule against live market data and places real orders through your broker. Kabu keeps the step from backtest to live consistent so you’re not rebuilding anything: same broker for data and execution, same reinforcement learning trading setup, just a different data feed.
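One way to picture "same policy, different data feed" is a runner that takes the policy, a price feed, and an order handler as separate arguments. The Python sketch below is an assumption about how such a separation could look, not Kabu's deployment code; `run_policy`, `frozen_policy`, and the order callback are hypothetical names.

```python
# A frozen policy: a fixed lookup from (trend, position) to an action.
# Nothing in this table changes while the policy runs.
FROZEN_TABLE = {(1, 0): "buy", (-1, 1): "sell"}


def frozen_policy(state):
    return FROZEN_TABLE.get(state, "hold")


def run_policy(policy, feed, submit_order):
    """Run `policy` over any iterable of prices.

    `feed` can be historical prices (backtest) or a live stream; the
    runner does not know or care which. `submit_order` is a callback,
    simulated in a backtest and wired to a broker in live trading.
    """
    position, previous = 0, None
    for price in feed:
        if previous is not None:
            move = price - previous
            trend = (move > 0) - (move < 0)
            action = policy((trend, position))
            if action == "buy" and position == 0:
                submit_order("buy", price)
                position = 1
            elif action == "sell" and position == 1:
                submit_order("sell", price)
                position = 0
        previous = price
    return position


# Backtest usage: a list of historical prices and an in-memory order log.
orders = []
final_position = run_policy(
    frozen_policy,
    [100, 101, 102, 101, 100, 101],
    lambda side, price: orders.append((side, price)),
)
```

Swapping the list for a generator that yields live prices, and the lambda for a function that sends orders to a broker, leaves `frozen_policy` and `run_policy` untouched.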

Training on Kabu · Live trading

What you control in Kabu

You define the reinforcement learning trading setup; Kabu runs training and live execution.

You choose symbols, timeframe, data source (your broker), what the agent can do (for example trade mode and position sizes), how you score it (reward), and which algorithm to use. Today Kabu supports popular reinforcement learning algorithms that have been used widely in trading research, including PPO (Proximal Policy Optimization), SAC (Soft Actor-Critic), and DQN (Deep Q-Network). The platform runs the training, stores runs and checkpoints, and lets you compare runs and promote a model to live. You focus on the strategy and the reward; we handle the infrastructure and execution.
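As a rough illustration of the choices listed above, the setup can be thought of as one validated configuration object. The Python sketch below is hypothetical: the class name, field names, and example values (such as `"long_only"` and `"risk_adjusted"`) are assumptions made for illustration and do not reflect Kabu's actual configuration schema.

```python
from dataclasses import dataclass

# Algorithms named in the text above.
SUPPORTED_ALGORITHMS = ("PPO", "SAC", "DQN")


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical bundle of the choices you make before training."""

    symbols: tuple          # instruments to trade
    timeframe: str          # bar size, for example "1h"
    data_source: str        # your broker
    trade_mode: str         # what the agent may do, for example "long_only"
    position_sizes: tuple   # allowed sizes as fractions of capital
    reward: str             # how the agent is scored
    algorithm: str          # one of SUPPORTED_ALGORITHMS

    def __post_init__(self):
        if not self.symbols:
            raise ValueError("at least one symbol is required")
        if self.algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported algorithm: {self.algorithm}")
        if any(not 0 < size <= 1 for size in self.position_sizes):
            raise ValueError("position sizes must be in (0, 1]")


config = TrainingConfig(
    symbols=("AAPL", "MSFT"),
    timeframe="1h",
    data_source="my-broker",
    trade_mode="long_only",
    position_sizes=(0.25, 0.5, 1.0),
    reward="risk_adjusted",
    algorithm="PPO",
)
```

Keeping the configuration immutable mirrors the idea that a promoted checkpoint runs live with the same setup it was trained under.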

How teams use Kabu · Features · Pricing