Adjust locations of setting the policy in train/eval mode #1122
Labels: bug (Something isn't working), refactoring (No change to functionality). Added by MischaPanch on Apr 25, 2024.
Max and I have implemented the following solution in #1123:
MischaPanch added a commit referencing this issue on May 6, 2024:
Addresses #1122:

* We introduced a new flag `is_within_training_step`, which is enabled by the training algorithm when within a training step, where a training step encompasses training data collection and policy updates. This flag is now used by algorithms to decide whether their `deterministic_eval` setting should apply, instead of the torch training flag (which was abused!).
* The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this didn't make sense!). The respective calls have been removed.
* The policy should, in fact, always be in evaluation mode during data collection, as there is no reason to ever have gradient accumulation enabled for any type of rollout. We thus explicitly set the policy to evaluation mode in `Collector.collect`. Furthermore, it never makes sense to compute gradients during collection, so the possibility to pass `no_grad=False` was removed.

Further changes:

- Base class for collectors: `BaseCollector`
- New util context managers `in_eval_mode` and `in_train_mode` for torch modules.
- `reset` of `Collectors` now returns `obs` and `info`.
- `no_grad` no longer accepted as kwarg of `collect`.
- Removed deprecations of `0.5.1` (will likely not affect anyone) and the unused `warnings` module.
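As an illustration of the design, here is a minimal sketch of what an eval-mode context manager and a gradient-free collection step could look like. The `collect` helper and its signature below are assumptions for illustration only, not tianshou's actual implementation.

```python
from contextlib import contextmanager

import torch


@contextmanager
def in_eval_mode(module: torch.nn.Module):
    """Temporarily put a torch module into eval mode, restoring its previous mode on exit."""
    was_training = module.training
    module.eval()
    try:
        yield module
    finally:
        module.train(was_training)


# Hypothetical collection step: rollouts never need gradients or train-mode layers
# (dropout, batchnorm updates), so both are disabled for the duration of collection.
def collect(policy: torch.nn.Module, env_step_fn, n_steps: int) -> None:
    with in_eval_mode(policy), torch.no_grad():
        for _ in range(n_steps):
            env_step_fn(policy)
```

With this structure, user code never has to toggle the policy's train/eval mode to influence collection; the collector handles it internally, which is the behaviour described in the commit above.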
@maxhuettenrauch @opcode81 This can be closed, right?
Original issue description:

Currently, tianshou sets the policy's mode in the trainer and the `test_episode` function. The corresponding `training` attribute is then used to determine whether a stochastic policy should be evaluated deterministically, given that `policy.deterministic_eval` is `True`. This, however, is a misuse, as the `training` attribute primarily influences modules like dropout and batch norm. It should always be `False` during data collection and only be `True` inside `policy.learn`.
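To make the problem concrete, here is a hedged sketch (assumed class and method names, not tianshou's actual code) of the pattern being criticized: deterministic evaluation gated on the torch `training` attribute, which also controls dropout and batch norm.

```python
import torch
from torch.distributions import Normal


class StochasticPolicy(torch.nn.Module):
    """Toy policy illustrating the misuse described above (names are hypothetical)."""

    def __init__(self, deterministic_eval: bool = True):
        super().__init__()
        self.deterministic_eval = deterministic_eval
        self.mu = torch.nn.Linear(4, 2)  # maps a 4-dim observation to a 2-dim action mean

    def act(self, obs: torch.Tensor) -> torch.Tensor:
        dist = Normal(self.mu(obs), 1.0)
        # Misuse: `self.training` is torch's switch for dropout/batchnorm behaviour,
        # yet here it also decides between stochastic and deterministic actions.
        if self.deterministic_eval and not self.training:
            return dist.mean  # deterministic action outside of "training" mode
        return dist.sample()  # stochastic action otherwise
```

Under the change referenced in #1123, this decision would instead consult an explicit flag set by the training algorithm (described above as `is_within_training_step`), leaving the `training` attribute to control only torch-level behaviour.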