Reinforcement Learning

Hi,

The RL course seems to have been improved. Did you change some notebook files? I redid it from scratch and it worked. I believe you added some print statements, so now we can follow what's happening inside the training.



Interestingly, the first time I trained, the algo converged better and better and reached a glorious 48% profit. It was a relatively straightforward training progression: 3 steps forward, 2 back; 4 forward, 3 back. So I got excited. I could follow it, and it was profitable and learning. I expected it to keep going up from there.



Subsequently, however, each later trial was worse and finished with bad results (-80% profit).

Why did the system not learn? How can it retain good memory and discard bad? Why did it start so well and finish so abysmally? Was the first batch pure luck, or a replication of your memory model that I then "polluted"?



Also, you suggested limiting the training and saving the results, but it's not clear where and how.

A few lines of code addressing this would help. Cheers, A.

Hi Antoni,



As you know, reinforcement learning can give different results across runs. When you say "trials", do you mean that you are training the model on the entire data each time, or are the trials part of a single training run?



If you feel that the model is not giving good returns, you can train multiple agents on the same data, combine them, and then allocate capital to the agents based on their returns. This is the logic for the capstone project mentioned in Section 22, Unit 6.
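To make the combination step concrete, here is a minimal sketch of one possible allocation rule, proportional to each agent's backtested return. The agent_returns numbers are made up for illustration, and the capstone project may use a different scheme.

import numpy as np

# Hypothetical backtested returns of four trained agents
agent_returns = np.array([0.12, -0.05, 0.30, 0.08])

# Keep only agents with positive returns and weight capital
# in proportion to those returns
weights = np.clip(agent_returns, 0, None)
weights = weights / weights.sum()  # assumes at least one positive return

# Split the capital across the agents
total_capital = 100000
allocation = weights * total_capital
print(allocation)  # [24000. 0. 60000. 16000.]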



With respect to saving the results, you can see in the "quantra_reinforcement_learning.py" file that we print the following details after a trade has been taken:



print("Trade {:03d} | pos {} | len {} | approx cum ret {:,.2f}% | trade ret {:,.2f}% | eps {:,.4f} | {} | {}".format(

            episode, env.position, env.trade_len, sum(pnls)100, env.pnl100, epsilon, env.curr_time, env.curr_idx))



You can create an empty list, append this information to it after every trade, and then, at the end of the day, save the list to a pickle file.



import pickle

# Create an empty list to hold trade experience
trade_experience = []

# As trades are taken, append the details to the trade_experience list.
# First create a temporary record called trade to store the trade
# information, for example mirroring the fields of the print statement
# above (a one-row dataframe works just as well):
trade = {'episode': episode, 'position': env.position,
         'trade_len': env.trade_len, 'cum_ret': sum(pnls) * 100,
         'trade_ret': env.pnl * 100, 'epsilon': epsilon,
         'time': env.curr_time, 'idx': env.curr_idx}

# Append the trade details to the trade_experience list
trade_experience.append(trade)

# At the end of the day, or whenever you want to save the trade
# experience, write it to a pickle file
with open('trade_experience.pickle', 'wb') as f:
    pickle.dump(trade_experience, f)
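Later, to inspect what was saved, you can load the pickle back. For example, assuming pandas is available, a list of dicts converts straight into a dataframe:

import pickle
import pandas as pd

# Load the saved trade experience
with open('trade_experience.pickle', 'rb') as f:
    trade_experience = pickle.load(f)

# Turn the list of trade records into a dataframe for analysis
trades_df = pd.DataFrame(trade_experience)
print(trades_df.head())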


Of course, you will have to modify this; it is not the complete solution, but you can work along these lines.