What is reinforcement learning?

What we mean by it on Kabu, in plain terms.

In a nutshell

Reinforcement learning is a way for a program—an “agent”—to get better at a task by trying things, seeing what works, and gradually favouring actions that lead to better outcomes.

There’s no list of “right” answers. The agent interacts with an environment (in our case, the market or a backtest of it), takes actions (e.g. buy, sell, hold, or choose a position size), and gets feedback—a reward—that you define (e.g. profit, risk-adjusted return). Over many training steps, it figures out which behaviour pays off. That’s the same idea behind the agents you train and deploy on Kabu.
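That loop—agent acts, environment responds, reward steers learning—can be sketched in a few lines. Everything below (the toy market, the PnL-style reward, the averaging rule) is an illustrative stand-in, not Kabu's actual training code or any of the algorithms it supports.

```python
# Illustrative only: a toy agent / environment / reward loop.
# The "market" replays a fixed list of price changes (a tiny backtest),
# the action is long (1) or flat (0), and the reward is the PnL of that choice.

class ToyMarket:
    """Environment: replays a fixed sequence of price changes."""
    def __init__(self, price_changes):
        self.price_changes = price_changes
        self.t = 0

    def step(self, action):
        reward = action * self.price_changes[self.t]  # reward = PnL of the action
        self.t += 1
        return reward, self.t >= len(self.price_changes)

class ToyAgent:
    """Agent: tracks the average reward of each action, then favours the best."""
    def __init__(self, warmup=10):
        self.value = {0: 0.0, 1: 0.0}   # running average reward per action
        self.count = {0: 0, 1: 0}
        self.warmup = warmup

    def act(self, t):
        if t < self.warmup:              # first, try both actions in turn
            return t % 2
        return max(self.value, key=self.value.get)  # then exploit the best

    def learn(self, action, reward):     # incremental mean update
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

env = ToyMarket([0.5, -0.1, 0.3, 0.4, -0.2] * 20)  # a net-positive market
agent = ToyAgent()
total, t, done = 0.0, 0, False
while not done:
    action = agent.act(t)
    reward, done = env.step(action)
    agent.learn(action, reward)
    total += reward
    t += 1

# After training, the agent prefers being long in this rising market:
# agent.value[1] > agent.value[0]
```

No one labelled the "right" action at any step; the preference for being long emerged purely from which actions earned more reward.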


Agent, environment, reward

The three pieces that matter when you use Kabu.

  • Agent — The thing that makes decisions. On Kabu it’s the strategy you train: you choose an algorithm (we support a few, such as PPO, SAC, and DQN), configure it, and the platform trains it. The trained agent runs in backtests while you train and, once you deploy it, in live trading.
  • Environment — What the agent interacts with. Here that’s the market: prices, your positions, time. In training we use a simulated environment (backtest); when you go live, it’s real market data.
  • Reward — The signal that tells the agent whether an outcome was good or bad. You define it (e.g. PnL, Sharpe-style metrics). The agent’s job is to maximise that over time.
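To make the reward piece concrete, here are two hypothetical reward definitions of the kind described above: plain PnL, and a Sharpe-style risk-adjusted variant. The function names and formulas are illustrative assumptions, not Kabu's built-in metrics.

```python
from statistics import mean, pstdev

def pnl_reward(step_returns):
    """Plain profit: total of the per-step returns."""
    return sum(step_returns)

def sharpe_style_reward(step_returns, eps=1e-9):
    """Risk-adjusted: mean return divided by its volatility.

    eps avoids division by zero when returns are perfectly steady.
    """
    return mean(step_returns) / (pstdev(step_returns) + eps)

steady   = [0.01, 0.01, 0.01, 0.01]    # same total profit, no swings
volatile = [0.08, -0.04, 0.06, -0.06]  # same total profit, big swings

# Both sequences earn the same PnL, but the Sharpe-style reward
# ranks the steady one far higher.
```

An agent maximising `pnl_reward` is indifferent between the two behaviours; one maximising `sharpe_style_reward` is pushed toward the steadier one. That is why the reward you define, not the algorithm you pick, decides what the agent treats as "good".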

Why it fits what we do

Trading is a sequence of decisions with delayed outcomes—RL is built for that.

You don’t need to label “correct” moves. You define the reward and the setting (symbols, timeframe, data); the agent learns from experience during training. (Live is for running the trained agent—no learning there.) That’s why we use RL for Kabu and why you’ll see the words “agent”, “environment”, and “reward” around the product.


© Kabu. All rights reserved.
