Scalping with my Deep RL agent

I want my deep RL agent to learn scalping, so I designed a reward function that punishes the agent for staying in a trade longer than a specific time. However, it does not seem to learn this and stays in trades for longer periods. This is the reward function I created:

import numpy as np  # needed for np.exp

def combined_reward(entry, curr, pos, trade_len):
    # Unrealised PnL of the open position (get_pnl defined elsewhere)
    pnl = get_pnl(entry, curr, pos)

    # Exponential PnL reward
    exp_pnl_reward = np.exp(pnl)

    # Scalping penalty: punish each step held beyond max_steps
    max_steps = 24
    if trade_len > max_steps:
        penalty = -(trade_len - max_steps)
    else:
        penalty = 0

    # Combine the two components
    reward = exp_pnl_reward + penalty

    return reward



Please advise me on what I should do.

Hello Ravitheja,



One reason the RL agent is not learning could be that the penalty is insignificant compared with the exponential PnL reward, which is always positive and, for small PnL values, stays close to 1. You can scale the penalty by multiplying it by a factor greater than 1, say 2, to make it more significant and influential in the total reward (e.g., penalty = -2 * (trade_len - max_steps)). It may require multiple iterations and experimentation with the scaling factor to achieve the desired behaviour.
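As a minimal sketch of this idea, here is a version of your reward function with the scaling factor exposed as a hyperparameter. The names penalty_weight and the get_pnl stand-in below are hypothetical (your actual get_pnl lives elsewhere); the point is only to show the scaled penalty term:

```python
import numpy as np

def get_pnl(entry, curr, pos):
    # Hypothetical stand-in for your PnL helper:
    # pos is +1 for long, -1 for short; PnL as a fraction of entry.
    return pos * (curr - entry) / entry

def combined_reward(entry, curr, pos, trade_len,
                    penalty_weight=2.0, max_steps=24):
    pnl = get_pnl(entry, curr, pos)
    exp_pnl_reward = np.exp(pnl)          # always positive, ~1 for small PnL
    overstay = max(trade_len - max_steps, 0)
    penalty = -penalty_weight * overstay  # grows linearly past max_steps
    return exp_pnl_reward + penalty
```

With penalty_weight=2.0, a trade held for 30 steps incurs a penalty of -12, which dominates the PnL term for typical small scalping moves; tuning penalty_weight lets you control how sharply the agent is pushed to exit.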



Also, print and monitor the values of key variables (PnL, trade length, etc.) during training to gain insights into the agent's behaviour. This can help you identify whether the issue lies with the reward function or other aspects of the RL setup.
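For example, a small helper like the one below (names are hypothetical) could be called once per training epoch on the collected trade lengths, so you can watch whether the fraction of overstayed trades actually falls as the penalty takes effect:

```python
import statistics

def summarize_trades(trade_lengths, max_steps=24):
    # Basic trade-length statistics plus the fraction of trades
    # that overstayed the scalping limit, for monitoring during training.
    overstayed = sum(1 for t in trade_lengths if t > max_steps)
    return {
        "n_trades": len(trade_lengths),
        "mean_len": statistics.mean(trade_lengths),
        "max_len": max(trade_lengths),
        "frac_overstay": overstayed / len(trade_lengths),
    }
```

If frac_overstay stays high after many epochs, the penalty is likely still too weak relative to the PnL term; if mean_len collapses toward zero, it may be too strong.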



Hope this helps!