
QLearningAgent learns incorrect results #1247

Open
CrosleyZack opened this issue Jan 20, 2022 · 0 comments
I believe there is a flaw in the QLearningAgent implementation in reinforcement.py, possibly resulting from how run_single_trial is written.

I was testing this with the 4x3 environment problem given in Section 17.1. Upon reaching a terminal state (TERMINAL?(s1) == True), the __call__ function returns None, which causes run_single_trial to exit. If run_single_trial is then called again in a loop over multiple trials (e.g., for _ in range(N): run_single_trial(agent_program, mdp)), the next call to QLearningAgent.__call__ sees s1 as the initial state [(1,1) for the 4x3 environment], r1 as the reward for that state (-0.04 for the 4x3 environment), TERMINAL?(s) == True [since s is still either (4,2) or (4,3) from the previous trial], and a == None. The agent then sets Q[s, None] = r1 = -0.04 instead of the actual terminal reward of +1 or -1, which produces an incorrect policy. Simply changing line 93 to Q[s, None] = r fixes the issue and a correct policy is learned.
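To make the sequence concrete, here is a small self-contained sketch of the update described above. This is not the actual reinforcement.py code; the variable names (Q, s, a, r, s1, r1) follow the issue text and the book's pseudocode, and the trial-boundary values are assumed from the 4x3 environment:

```python
# Illustrative sketch only, not the reinforcement.py implementation.
from collections import defaultdict

Q = defaultdict(float)
terminals = {(4, 2), (4, 3)}

# End of trial 1: the agent perceived the terminal state (4, 3) with reward +1,
# __call__ returned None, and run_single_trial exited. The agent's stored
# previous state/action/reward are still:
s, a, r = (4, 3), None, 1.0

# Start of trial 2: __call__ is invoked with the initial percept of the new trial.
s1, r1 = (1, 1), -0.04

if s in terminals:
    Q[s, None] = r1   # current behaviour: terminal value becomes -0.04 (wrong)
    # Q[s, None] = r  # proposed fix: terminal value stays +1 (correct)

print(Q[(4, 3), None])  # -0.04 with the current line, 1.0 with the fix
```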

I recognize this change does not match the pseudocode in the book (Figure 21.8), and I am not certain whether the problem is simply an artifact of how run_single_trial is implemented. A better fix may exist that more closely matches the pseudocode in Figure 21.8.
