Course Name: Deep Reinforcement Learning in Trading, Section No: 4, Unit No: 7, Unit type: Quiz
Why does the 1st column (i.e. the "Eat" action) contain 1s? I think that if you use the Bellman equation to compute the Q-values for the "Eat" action, the values in the 1st column will be different.
Hi Gleb,
In the question, the first column represents the child's reward for eating the sweet without waiting. Since eating the sweet without waiting gives a reward of 1 at every time step, the value in the first column is 1.
In brief, the column simply represents the fact that the child gets a reward of 1 whenever it eats without waiting.
Hope this helps.
Hi Suleyman,
Your explanation is about the R-table; however, in the explanation in Section 4, Unit 7, the table in question is a Q-table. This means the 1st column should be computed in the same manner as the 2nd column, but it is left untouched for some reason, which causes confusion.
I hope this explains my point of view.
Kind regards,
Gleb
Hi Gleb,
Thanks for your question.
The Q-table tells us the expected future reward for taking a particular action at a particular time step.
If, for a particular time step and action combination, the game continues, then the reward for taking that action is unknown, and it is marked as 0 in the R-table. Therefore, we use the Bellman equation to estimate the perceived, or expected, reward for that time step and action combination and populate the Q-table. The Bellman equation is not required if we already know the actual reward we will get by taking a particular action.
A Q-table is shown above. In this example, at time step 0, if we decide to take the action "Wait", the expected future reward is 1.06. We are not receiving any actual reward at that step; it is a perceived reward, so it needs to be estimated using the Bellman equation.
However, at time step 0, if we take the action "Eat", we receive a sweet as an actual reward and the game is over. Since we already know the actual reward for taking the "Eat" action, the Bellman equation is not required.
Similarly, at time step 1, if we decide to eat, we get an actual reward of 1 sweet. That is why the values under the "Eat" action do not change.
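To make the distinction concrete, here is a minimal sketch of filling in such a Q-table by backward induction. The discount factor, number of time steps, and waiting reward below are illustrative assumptions, not the values used in the course, so the numbers will differ from the table above (e.g. the 1.06 for "Wait" at time step 0); the point is only that "Eat" is filled with its known actual reward, while "Wait" is estimated with the Bellman equation.

```python
# Assumed, illustrative parameters -- NOT taken from the course material.
GAMMA = 0.9          # assumed discount factor
N_STEPS = 4          # assumed number of time steps
EAT_REWARD = 1.0     # actual reward: eating the sweet ends the game
WAIT_REWARD = 2.0    # assumed larger reward for waiting until the end

# Q-table: one row per time step, columns = [Eat, Wait].
q_table = [[0.0, 0.0] for _ in range(N_STEPS)]

# Fill the table backwards in time.
for t in reversed(range(N_STEPS)):
    # "Eat" is terminal: its Q-value is the known, actual reward,
    # so no Bellman estimate is needed and the column stays constant.
    q_table[t][0] = EAT_REWARD

    if t == N_STEPS - 1:
        # At the last step, waiting yields the delayed reward directly.
        q_table[t][1] = WAIT_REWARD
    else:
        # Bellman equation: Q(t, Wait) = r + gamma * max_a Q(t+1, a),
        # where waiting itself gives no immediate reward (r = 0).
        q_table[t][1] = GAMMA * max(q_table[t + 1])

for t, (q_eat, q_wait) in enumerate(q_table):
    print(f"t={t}: Eat={q_eat:.2f}  Wait={q_wait:.2f}")
```

With these assumed values, the "Eat" column stays at 1.00 at every time step, while the "Wait" column grows toward the delayed reward as the end of the game approaches.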
I hope this helps.
Thanks
Ishan