Reinforcement Learning run(bars5m, rl_config) winds up with a long list of '1/1[====]'

William_Porter_5ovCN · May 12, 2023, 10:38pm

I am using Jupyter Lab for RL, and the output contains a long list of [=======] progress bars that I would like to remove so that only the trade list remains, such as:

'Trade 004 | pos 1 | len 81 | approx cum ret 0.87% | trade ret 0.36% | eps 0.0635 | 2010-03-03 10:55:00-05:00 | 3136 '

The long list of '[=]' makes it very difficult to debug the code.

How can I wind up with only the trades and not the list of ' [=]'?

William P

Rekhit_Pachanekar · May 15, 2023, 1:43pm

Hello William,

I hope I do not sound redundant here, but did setting the verbose parameter to False when calling the predict method not remove them?

For example, changing the code from:

targets[i] = modelR.predict(state_t)[0]

Q_sa = np.max(modelQ.predict(state_tp1)[0])

to:

targets[i] = modelR.predict(state_t, verbose=False)[0]

Q_sa = np.max(modelQ.predict(state_tp1, verbose=False)[0])

should have helped. If it doesn't, could you share a screenshot of the output so we can take it forward from there?
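If the progress bars persist after that change, one possible cause (an assumption, not something confirmed in this thread) is that other predict calls elsewhere in the notebook still use the default verbosity. A minimal sketch for locating them with Python's standard ast module follows; the function name find_noisy_predict_calls and the sample source string are made up for illustration.

```python
import ast


def find_noisy_predict_calls(source):
    """Return line numbers of .predict(...) calls that pass no verbose keyword."""
    noisy = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "predict"
            and not any(kw.arg == "verbose" for kw in node.keywords)
        ):
            noisy.append(node.lineno)
    return sorted(noisy)


# Hypothetical sample: line 1 has no verbose argument, line 2 does.
sample = (
    "q = modelQ.predict(state_t)\n"
    "targets = modelR.predict(state_t, verbose=0)\n"
)
print(find_noisy_predict_calls(sample))
```

To use it on real code, paste the cell contents (or read the .py file) into a string and pass it in; any line it reports is a candidate for adding verbose=0.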

Thanks.

William_Porter_5ovCN · May 25, 2023, 4:38am

The attached routine is still generating many lines like the following: 1/1 [==============================] - 0s 14ms/step. It does not work!

The code snippet is in the Experience Replay class:

class ExperienceReplay(object):
    def process(self, modelQ, modelR, batch_size=10):
        for i, idx in enumerate(np.random.randint(0, len_memory, size=inputs.shape[0])):
            # Obtain the parameters for Bellman from memory,
            # S.A.R.S: state, action, reward, new state.
            state_t, action_t, reward_t, state_tp1 = self.memory[idx][0]
            game_over = self.memory[idx][1]
            inputs[i] = state_t
            # Calculate the targets for the state at time t
            targets[i] = modelR.predict(state_t, verbose=False)[0]
            # Calculate the reward at time t+1 for action at time t
            Q_sa = np.max(modelQ.predict(state_tp1, verbose=False)[0])

Do you have another method for resolving this issue?

William

Akshay_Choudhary · May 25, 2023, 7:43am

Hi William,

You can set verbose to 0 (verbose=0) in the predict calls and run the code. This should run the code silently, without the progress bar.
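If editing every call by hand is error-prone, the default could instead be applied once per model. The sketch below is only an illustration: silence_predict and StandInModel are hypothetical names, the stand-in class exists so the example runs without TensorFlow, and it assumes the real model object allows its predict attribute to be reassigned.

```python
import functools


def silence_predict(model):
    """Wrap model.predict so calls default to verbose=0 unless told otherwise."""
    original_predict = model.predict

    @functools.wraps(original_predict)
    def quiet_predict(*args, **kwargs):
        # Only fills in verbose when the caller did not pass it as a keyword.
        kwargs.setdefault("verbose", 0)
        return original_predict(*args, **kwargs)

    model.predict = quiet_predict
    return model


class StandInModel:
    """Hypothetical stand-in that records the verbose value it receives."""

    def __init__(self):
        self.seen_verbose = []

    def predict(self, x, verbose="auto"):
        self.seen_verbose.append(verbose)
        return [x]


model = silence_predict(StandInModel())
model.predict([1.0, 2.0])        # no verbose given, so 0 is used
model.predict([3.0], verbose=1)  # an explicit keyword is left untouched
print(model.seen_verbose)        # [0, 1]
```

With the actual models this would be called once, for example silence_predict(modelQ) and silence_predict(modelR), before the training loop starts.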

Hope this helps!

Thanks,

Akshay