Deep Reinforcement Learning for Trading

Hello,



I have a couple of questions about some details of the Python code:


  1. In the "update_position" function

    What is self.entry = self.curr_price? Does it mean the next open price or the close of the set-up day?


  2. In the "run" function
  • How is the trade calculated, or more precisely, what are the entry data and the exit data of the trade?
  • In the backtest, are the data used only for training, or are the data after the 'START_IDX': 2500 parameter treated as new (unseen) data?
  • If I want to train on historical data and then test the model on new data, what should the process be?



    Regards

    Laurent


Hi Laurent,



1.) This assignment is active when there is no open position and the action is a buy or a sell. To keep logs of the trades and to calculate the return, we need the entry price. You can see self.curr_price set in the "act" function as:

self.curr_price = self.bars5m['close'][self.curr_idx]

So our entry price, in case of a trade, will be the close price of the current bar (not the next open).
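To make this concrete, here is a minimal sketch of that mechanism. The class and attribute names mirror the ones quoted above, but the class itself is a stand-in, not the repo's actual Game class:

```python
# Minimal sketch (hypothetical class) of how the entry price is recorded:
# when a buy/sell action arrives while flat, the close of the current
# 5-minute bar becomes the trade's entry price.
class PositionTracker:
    def __init__(self, closes):
        self.closes = closes        # 5-minute close prices
        self.curr_idx = 0
        self.position = 0           # 0 = flat, 1 = long, -1 = short
        self.entry = None

    def act(self, action):
        # curr_price is always the close of the current bar,
        # mirroring self.bars5m['close'][self.curr_idx]
        self.curr_price = self.closes[self.curr_idx]
        if self.position == 0 and action != 0:
            self.position = action
            self.entry = self.curr_price   # entry = close of the signal bar
        self.curr_idx += 1

tracker = PositionTracker([100.0, 101.5, 99.8])
tracker.act(0)   # no signal, stay flat
tracker.act(1)   # buy signal -> entry recorded at the current close
print(tracker.entry)  # 101.5
```

Note that the entry is set from the same bar that produced the signal, which is why it is the close of the current time step.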



2.) 

-To understand the entry data for a trade, you should check the "get_state" and "_assemble_state" functions within the Game class. These functions (by calling "_assemble_state" inside "get_state") return inputs such as candlestick bars, indicators, and time signatures, which are used as input for prediction.

In the second (nested) while loop of the "run" function, there are the following lines:

q = q_network.predict(state_t)
action = np.argmax(q[0])

  So, to summarise: based on the Q table we created, a prediction is made from the inputs (state_t), and this prediction is stored in action as 1, -1 or 0. Calling the "act" function with this action initialises a trade.
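The action-selection step can be sketched as follows. The network outputs one Q-value per action and argmax picks the index of the highest one; the index-to-direction mapping shown here (0 = hold, 1 = buy, 2 = sell) is an assumption for illustration, not necessarily the repo's ordering:

```python
import numpy as np

# Hypothetical index -> trade-direction mapping (hold / buy / sell)
ACTION_MAP = {0: 0, 1: 1, 2: -1}

def select_action(q_values):
    # q_values has shape (1, n_actions), like the output of
    # q_network.predict(state_t); argmax picks the best action index
    idx = int(np.argmax(q_values[0]))
    return ACTION_MAP[idx]

# Stand-in for q = q_network.predict(state_t)
q = np.array([[0.1, 0.7, 0.2]])
action = select_action(q)
print(action)  # 1 -> buy
```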

-We can assume both. Since the model tries to optimise itself based on reward and errors, every time new data is delivered it continues training on it, so the data after 'START_IDX' is both "new" and part of the ongoing optimisation.

-First of all, you can change the technical indicators or other inputs the model uses for prediction. To do that, you need to calculate the new indicators in the "_assemble_state" function of the Game class and modify the state vector accordingly.
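As a hedged sketch of that modification: the real code computes its features inside Game._assemble_state; here a simple RSI stands in for "a new indicator" and is appended to a toy feature vector (the function names are mine, not the repo's):

```python
import numpy as np

def rsi(closes, period=14):
    # Basic relative strength index over the last `period` bars
    deltas = np.diff(closes)
    gains = np.clip(deltas, 0, None)
    losses = np.clip(-deltas, 0, None)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def assemble_state(closes):
    # Existing features (bars, time signatures, ...) would go here;
    # the new indicator is simply appended to the feature vector
    features = [closes[-1] / closes[-2] - 1.0]   # last-bar return
    features.append(rsi(closes) / 100.0)         # new feature, scaled to [0, 1]
    return np.array(features)

state = assemble_state(np.linspace(100, 110, 30))
print(state.shape)  # (2,)
```

If you add features this way, remember that the network's input layer size must match the new length of the state vector.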

-After you go through the learning process, you can stop generating a new Q table and stop updating the NN model. In the code, to optimise the results, a new Q table is created based on the prediction accuracy (reward and error). Once you have satisfying results, you can remove that part and use the most recent NN model for predictions.
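The train-then-freeze workflow described above can be sketched like this. Everything here is a stand-in (a toy "model" and a toy update rule), meant only to show the two phases: update the model during the training segment, persist it, then reload it and run pure predictions on held-out data without further updates:

```python
import os
import pickle
import tempfile

START_IDX = 2500  # same split parameter as in the code

def run_backtest(model, data, train_end=START_IDX, training=True):
    # Toy loop: "predict" from the model, and only update it while
    # training is enabled and we are inside the training segment
    actions = []
    for i in range(1, len(data)):
        actions.append(model["bias"])                           # toy prediction
        if training and i < train_end:
            model["bias"] = 1 if data[i] > data[i - 1] else -1  # toy update
    return actions

# Phase 1: train on history, then persist the learned model
model = {"bias": 0}
run_backtest(model, list(range(3000)))
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Phase 2: reload the frozen model and evaluate on NEW data
# with training disabled, so no Q table or weight updates happen
with open(path, "rb") as f:
    frozen = pickle.load(f)
preds = run_backtest(frozen, list(range(3000, 3500)), training=False)
print(frozen["bias"])  # 1 (learned the upward drift, never updated again)
```

With the actual code, phase 1 corresponds to running the learning loop as-is, and phase 2 corresponds to removing the Q-table regeneration and calling the saved NN model's predict on the out-of-sample bars.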



Hope this helps.

Best regards,

Suleyman