I want my deep RL agent to learn scalping, so I designed a reward function that punishes it for staying in a trade longer than a specified time. However, the agent does not seem to learn this and keeps holding trades for long periods. This is the reward function I created:
```python
import numpy as np

def combined_reward(entry, curr, pos, trade_len):
    # Unrealised PnL of the open position (get_pnl is my own helper)
    pnl = get_pnl(entry, curr, pos)
    # Exponential PnL reward
    exp_pnl_reward = np.exp(pnl)
    # Scalping penalty: punish holding beyond max_steps
    max_steps = 24
    if trade_len > max_steps:
        penalty = -(trade_len - max_steps)
    else:
        penalty = 0
    # Combine
    reward = exp_pnl_reward + penalty
    return reward
```
Please advise me as to what to do.
Hello Ravitheja,
One reason for the RL agent not learning could be that the penalty is insignificant relative to the exponential PnL term. You can scale the penalty by multiplying it by a factor greater than 1, say 2.0, to make it more influential in the total reward (e.g., penalty = -2.0 * (trade_len - max_steps)). Achieving the desired behaviour may require multiple iterations and experimentation with the scaling factor.
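As a minimal sketch of that idea (penalty_scale is a hypothetical hyperparameter name I'm introducing, and the snippet reuses your get_pnl helper), the reward function with a tunable penalty weight could look like this:

```python
import numpy as np

def combined_reward(entry, curr, pos, trade_len, penalty_scale=2.0):
    # Reuses your own get_pnl helper from the original snippet
    pnl = get_pnl(entry, curr, pos)
    exp_pnl_reward = np.exp(pnl)

    max_steps = 24
    if trade_len > max_steps:
        # penalty_scale is a tuning knob, not a fixed constant: sweep
        # values such as 0.5, 1, 2, 5 and compare average trade lengths
        penalty = -penalty_scale * (trade_len - max_steps)
    else:
        penalty = 0.0

    return exp_pnl_reward + penalty
```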
Also, print and monitor the values of key variables (PnL, trade length, penalty, total reward) during training to gain insight into the agent's behaviour. This can help you identify whether the issue lies with the reward function or with other aspects of the RL setup.
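For instance, a simple diagnostic sketch along these lines (log_trade_step and print_episode_summary are hypothetical helpers you would call from your own training loop) would show whether the time penalty is ever large enough to matter:

```python
from collections import defaultdict

# Hypothetical diagnostics -- adapt the names to your own training loop
episode_stats = defaultdict(list)

def log_trade_step(pnl, trade_len, penalty, reward):
    # Record each reward component at every environment step
    episode_stats["pnl"].append(pnl)
    episode_stats["trade_len"].append(trade_len)
    episode_stats["penalty"].append(penalty)
    episode_stats["reward"].append(reward)

def print_episode_summary():
    # Summarise the episode, then reset for the next one
    n = len(episode_stats["reward"])
    if n == 0:
        return
    print(f"steps={n} "
          f"mean_reward={sum(episode_stats['reward']) / n:.3f} "
          f"mean_penalty={sum(episode_stats['penalty']) / n:.3f} "
          f"max_trade_len={max(episode_stats['trade_len'])}")
    episode_stats.clear()
```

If mean_penalty stays near zero while trade lengths keep exceeding 24 steps, that is strong evidence the penalty term is too weak relative to the PnL reward.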
Hope this helps!