question about paper result #35
Hi, the main reason is that exploration is taken into account during training, i.e. some suboptimal random actions are taken. During testing, however, only the optimal (greedy) actions are taken, so the reward is naturally higher. Hope this helps!
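A minimal sketch of the effect described above, assuming an epsilon-greedy style exploration scheme (the repo's actual exploration mechanism may differ); the toy Q-values, `epsilon`, and function names here are purely illustrative:

```python
import numpy as np

# Hypothetical illustration, not the repo's actual code: a toy agent whose
# action values are already learned.
q_values = np.array([1.0, 5.0, 2.0])   # value (expected reward) of each action
epsilon = 0.3                          # exploration rate assumed during training

rng = np.random.default_rng(0)

def train_action():
    """Epsilon-greedy: with probability epsilon pick a random (possibly
    suboptimal) action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def eval_action():
    """Greedy: always pick the action with the highest estimated value."""
    return int(np.argmax(q_values))

n = 10_000
train_reward = np.mean([q_values[train_action()] for _ in range(n)])
eval_reward = np.mean([q_values[eval_action()] for _ in range(n)])
print(f"avg reward during training:   {train_reward:.2f}")  # roughly 4.3
print(f"avg reward during evaluation: {eval_reward:.2f}")    # 5.0
```

Because a fraction of training-time actions are random, the reward logged by train() sits systematically below the reward logged by eval(), even with identical network weights.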
Dear author,
Hello! I am a graduate student at a Chinese university, working on a project on multi-agent reinforcement learning. I would like to plug my algorithm into the environment you developed to test its performance. However, while reproducing the results of your paper, I found something confusing:
These two pictures show the L1-33 scenario. The first shows the values recorded by eval(), and the second shows the values recorded by train(). There is a huge gap between the two. Why is this?