I want my deep RL agent to learn scalping, so I designed a reward function that punishes it for staying in a trade longer than a specified time. However, the agent does not seem to learn this and keeps holding trades for long periods. This is the reward function I created:
```python
import numpy as np

def combined_reward(entry, curr, pos, trade_len):
    # Unrealised PnL of the open position (get_pnl is my own helper)
    pnl = get_pnl(entry, curr, pos)
    # Exponential PnL reward
    exp_pnl_reward = np.exp(pnl)
    # Scalping penalty: punish holding beyond max_steps
    max_steps = 24
    if trade_len > max_steps:
        penalty = -(trade_len - max_steps)
    else:
        penalty = 0
    # Combine
    reward = exp_pnl_reward + penalty
    return reward
```
Please advise me as to what to do.
Hello Ravitheja,
One reason for the RL agent not learning could be that the penalty is insignificant relative to the exponential PnL term. You can scale the penalty by multiplying it by a factor greater than 1, say 2.0, to make it more influential in the total reward (e.g., penalty = -2.0 * (trade_len - max_steps)). Achieving the desired behaviour may require multiple iterations and experimentation with the scaling factor.
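As a minimal sketch of that idea (penalty_scale is a hypothetical hyperparameter name I'm introducing, and the snippet reuses your get_pnl helper), the reward function with a tunable penalty weight could look like this:

```python
import numpy as np

def combined_reward(entry, curr, pos, trade_len, penalty_scale=2.0):
    # Reuses your own get_pnl helper from the original snippet
    pnl = get_pnl(entry, curr, pos)
    exp_pnl_reward = np.exp(pnl)

    max_steps = 24
    if trade_len > max_steps:
        # penalty_scale is a tuning knob, not a fixed constant: sweep
        # values such as 0.5, 1, 2, 5 and compare average trade lengths
        penalty = -penalty_scale * (trade_len - max_steps)
    else:
        penalty = 0.0

    return exp_pnl_reward + penalty
```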
Also, print and monitor the values of key variables (PnL, trade length, penalty, total reward) during training to gain insight into the agent's behaviour. This can help you identify whether the issue lies with the reward function or with other aspects of the RL setup.
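For instance, a simple diagnostic sketch along these lines (log_trade_step and print_episode_summary are hypothetical helpers you would call from your own training loop) would show whether the time penalty is ever large enough to matter:

```python
from collections import defaultdict

# Hypothetical diagnostics -- adapt the names to your own training loop
episode_stats = defaultdict(list)

def log_trade_step(pnl, trade_len, penalty, reward):
    # Record each reward component at every environment step
    episode_stats["pnl"].append(pnl)
    episode_stats["trade_len"].append(trade_len)
    episode_stats["penalty"].append(penalty)
    episode_stats["reward"].append(reward)

def print_episode_summary():
    # Summarise the episode, then reset for the next one
    n = len(episode_stats["reward"])
    if n == 0:
        return
    print(f"steps={n} "
          f"mean_reward={sum(episode_stats['reward']) / n:.3f} "
          f"mean_penalty={sum(episode_stats['penalty']) / n:.3f} "
          f"max_trade_len={max(episode_stats['trade_len'])}")
    episode_stats.clear()
```

If mean_penalty stays near zero while trade lengths keep exceeding 24 steps, that is strong evidence the penalty term is too weak relative to the PnL reward.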
Hope this helps!