Reinforcement Learning

Course Name: Introduction to Machine Learning for Trading, Section No: 6, Unit No: 2, Unit type: Video



I am not quite familiar with RL, but the first question that comes to mind is whether the action is going to impact the environment in the remaining time (as a big sell order can scare off the buyers). For instance, if the ratio of the inventory volume (V) to the total volume traded over the time horizon (H) is relatively high, then (V, H) does not seem like enough parameters to define the state. Please tell me if I'm getting this wrong.
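To make my concern concrete, here is a rough sketch (the extra features and names below are just my assumptions, not anything from the course) of how the state might need to be extended to capture the order's own impact:

# Hypothetical sketch: the (V, H) state vs. a state augmented with
# features that track the agent's own market impact.
from collections import namedtuple

# The state as I understand it from the video: remaining inventory (V)
# and remaining time (H).
BasicState = namedtuple("BasicState", ["inventory", "horizon"])

# An augmented state that also tracks how the agent's own orders
# may be moving the market.
AugmentedState = namedtuple(
    "AugmentedState",
    ["inventory", "horizon", "book_imbalance", "participation_rate"],
)

state = AugmentedState(
    inventory=10_000,        # shares left to sell (V)
    horizon=30,              # minutes left to trade (H)
    book_imbalance=-0.4,     # bids thinning out after our sell orders
    participation_rate=0.35, # our volume as a share of recent volume
)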

Could extrapolating the BOP indicator be an approach to solving this problem?

Hi Bahram,



To answer the first part of your query: yes, the action is going to impact the environment in the remaining time. But for backtesting purposes with a set data resolution, the frequency of the data you are working with determines the maximum frequency of the environment updates. If you want to implement an RL agent that trains on 1-minute data, but you also want to capture the impact of the trading action on the remaining time (until the next minute), then data has to be available at a resolution finer than one minute. The video explains the simple concept of an action affecting the environment, and you can choose various elements to define that environment.
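For instance, a toy environment along these lines (a minimal sketch under my own assumptions, with an illustrative linear impact model, not the course's implementation) would let the selling action feed back into the price path:

import numpy as np

class ExecutionEnv:
    # Toy sell-execution environment where the action impacts prices.
    def __init__(self, total_inventory=10_000, horizon=30, impact_coeff=1e-5):
        self.total_inventory = total_inventory
        self.horizon = horizon
        self.impact_coeff = impact_coeff  # assumed price drop per share sold
        self.reset()

    def reset(self):
        self.inventory = self.total_inventory
        self.t = 0
        self.price = 100.0
        return (self.inventory, self.horizon - self.t)

    def step(self, shares_to_sell):
        shares = min(shares_to_sell, self.inventory)
        # The action impacts the environment: selling pushes the price
        # down, and that lower price persists into future steps.
        self.price -= self.impact_coeff * shares
        self.price += np.random.normal(0, 0.05)  # exogenous noise
        reward = shares * self.price              # cash received
        self.inventory -= shares
        self.t += 1
        done = self.t >= self.horizon or self.inventory == 0
        return (self.inventory, self.horizon - self.t), reward, done

At a 1-minute resolution, each step of this environment corresponds to one bar; capturing the impact within a bar would require the finer data mentioned above.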



For the second part, the extrapolation of the BOP indicator may or may not be a good feature for the RL agent; it is up to the agent to learn from its various inputs. You can tweak the reward function (change the game rules), and the underlying neural network will learn from its mistakes based on that reward function.
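As a sketch (the feature and reward definitions below are illustrative assumptions, not the course's code), you could feed the BOP indicator to the agent as one input among others and shape its behaviour through the reward:

import pandas as pd

def bop(df):
    # Balance of Power: (close - open) / (high - low)
    return (df["close"] - df["open"]) / (df["high"] - df["low"])

def build_features(df):
    # The agent decides what is useful; we only supply candidate inputs.
    features = pd.DataFrame(index=df.index)
    features["bop"] = bop(df)
    features["returns"] = df["close"].pct_change()
    return features.dropna()

def reward(pnl, position_change, cost_per_trade=0.001):
    # Changing this function changes the game rules: here raw PnL is
    # penalised by trading costs, which discourages over-trading.
    return pnl - cost_per_trade * abs(position_change)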



Hope this clarifies your doubt.

Do let me know if you need any help!



Thanks,

Gaurav