Reinforcement Learning iterations

Hi,



I have a two-part question:


  1. Generally in machine learning algorithms, I have noticed the model is trained for multiple iterations, usually known as epochs. But here, in the reinforcement learning algorithm, we are training for only one pass over the training data. We are considering each trade as an episode, but why haven't we considered an iteration or epoch as an episode?


  2. I tried running the model on Nifty 5m data from 2008-2020 but it only gave ~80 trades with a total profit of 68%, whereas your examples gave a very high number of trades. What could be the reason for that? 

Hi Mihir



For the second question: your model has possibly learnt the long-term trend and is hence giving fewer trades. If you retrain the model, it might very well give 800 trades. What the model learns depends on how you tune it and which parameters you use.



Now for the first question. Let us first clarify a few terms.



An iteration is nothing but a new data point, or state, that the environment gives the agent, for which a decision has to be taken.



An episode is the lifecycle of a single trade. For example, consider a long position: suppose the system stays vacant for the first 5 days, a buy is taken on the 6th day, the model holds the position until the 15th day, and it sells on the 16th day. This entire cycle, from day 1 to day 16, is one episode.
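The example above can be sketched as a toy day-by-day timeline (the day counts follow the example; the action labels are illustrative, not the notebook's actual identifiers):

```python
# One episode = every step from day 1 until the position is closed.
actions = (
    ["hold (no position)"] * 5    # days 1-5: system is vacant
    + ["buy"]                     # day 6: long position opened
    + ["hold (in position)"] * 9  # days 7-15: position held
    + ["sell"]                    # day 16: position closed
)
print(len(actions))  # 16 steps -> this whole cycle is one episode
```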



An epoch is not an episode. In the model we are using, an epoch works on random states (think of a state as a data point, or a row in a dataframe) drawn from the past 'n' episodes. On every iteration, we select some number of states from each episode; this number is called the batch size.

These states form the training dataset for the neural network. Therefore, on every iteration a new training dataset is created and the neural network is trained on it. Based on the trained model, the network then outputs a buy, sell or hold.



In the notebook, the 'n' number of episodes is 600 and the batch size is 1. That is, from each of the previous 600 episodes we take one random observation, giving 600 observations on which the neural net is trained at that iteration to produce the action (buy, sell or hold).
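The sampling described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code; the names (`replay_memory`, `build_training_batch`, `MEMORY_SIZE`, `BATCH_SIZE`) are assumptions for the example:

```python
import random
from collections import deque

MEMORY_SIZE = 600  # keep only the last 600 episodes
BATCH_SIZE = 1     # states sampled from each episode per iteration

# Each entry in memory is one episode, stored as a list of states.
replay_memory = deque(maxlen=MEMORY_SIZE)

def build_training_batch(memory, batch_size=BATCH_SIZE):
    """Pick `batch_size` random states from every stored episode."""
    batch = []
    for episode_states in memory:
        k = min(batch_size, len(episode_states))
        batch.extend(random.sample(episode_states, k))
    return batch

# Fill the memory with 600 dummy episodes of 5 numeric "states" each.
for _ in range(MEMORY_SIZE):
    replay_memory.append([float(j) for j in range(5)])

# One random state per episode -> 600 observations form this
# iteration's training dataset for the neural network.
batch = build_training_batch(replay_memory)
print(len(batch))  # 600
```

Because the batch is re-sampled on every iteration, the network sees a fresh training dataset each time, which is what distinguishes an iteration here from a conventional epoch over a fixed dataset.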



Your question assumes that an iteration is the same as an epoch, which is not the case. Hopefully the explanation above clarifies what is actually happening.



Hope this helps. Feel free to reach out for any clarification.