Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Adjust locations of setting the policy in train/eval mode #1123

Merged
merged 35 commits into from
May 6, 2024

Conversation

maxhuettenrauch
Copy link
Collaborator

@maxhuettenrauch maxhuettenrauch commented Apr 24, 2024

Addresses #1122:

  • We Introduced a new flag is_within_training_step which is enabled by the training algorithm when within a training step, where a training step encompasses training data collection and policy updates. This flag is now used by algorithms to decide whether their deterministic_eval setting should indeed apply instead of the torch training flag (which was abused!).
  • The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this didn't make sense!). The respective calls have been removed.
  • The policy should, in fact, always be in evaluation mode when applying data collection, as there is no reason to ever have gradient accumulation enabled for any type of rollout. We thus specifically set the policy to evaluation mode in Collector.collect. Further, it never makes sense to compute gradients during collection, so the possibility to pass no_grad=False was removed.

Further changes:

  • Base class for collectors: BaseCollector
  • New util context managers in_eval_mode and in_train_mode for torch modules.
  • reset of Collectors now returns obs and info.
  • no-grad no longer accepted as kwarg of collect
  • Removed deprecations of 0.5.1 (will likely not affect anyone) and the unused warnings module.

@MischaPanch
Copy link
Collaborator

I'll add the context manager for policies (maybe torch already has one), as discussed, and then this is good to go. Thanks for the extension @maxhuettenrauch !

@MischaPanch MischaPanch self-requested a review April 25, 2024 16:10
Michael Panchenko added 5 commits April 26, 2024 14:42
There's better ways to deal with deprecations that we shall use in the future
A test is not a script and should not be used as such

Also marked pistonball test as skipped since it doesn't actually test anything
@MischaPanch MischaPanch marked this pull request as ready for review April 26, 2024 15:40
@MischaPanch MischaPanch requested a review from opcode81 April 26, 2024 16:25
MischaPanch
MischaPanch previously approved these changes Apr 26, 2024
@MischaPanch
Copy link
Collaborator

MischaPanch commented Apr 26, 2024

@ChenDRAG Once finalized (there's still a bit of review and discussion to be done), I'd merge this PR without squashing if you don't mind. It contains somehow unrelated changes and the history would be easier to bisect on commit level if we don't squash

tianshou/policy/base.py Outdated Show resolved Hide resolved
tianshou/policy/base.py Outdated Show resolved Hide resolved
@opcode81
Copy link
Collaborator

opcode81 commented May 2, 2024

Notebooks and documentation have not been updated and still use the old way of setting train/eval mode.

tianshou/data/collector.py Outdated Show resolved Hide resolved
…mplementation,

as this is less prone to errors
opcode81 added 2 commits May 3, 2024 10:12
  New method training_step, which
    * collects training data (method _collect_training_data)
    * performs "test in train" (method _test_in_train)
    * performs policy update
  The old method named train_step performed only the first two points
  and was now split into two separate methods
@opcode81 opcode81 force-pushed the policy-train-eval branch from 5ba10d3 to 3184f73 Compare May 3, 2024 08:12
tianshou/policy/base.py Outdated Show resolved Hide resolved
opcode81 added 2 commits May 3, 2024 15:18
  * Remove flag `eval_mode` from Collector.collect
  * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
    and set it appropriately in BaseTrainer
Michael Panchenko and others added 11 commits May 5, 2024 15:14
  * Add class ExperimentCollection to improve usability
  * Remove parameters from ExperimentBuilder.build
  * Renamed ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
    changing the return type to ExperimentCollection
  * Replace temp_config_mutation (which was not appropriate for the public API) with
    method copy (which performs a safe deep copy)
@MischaPanch MischaPanch force-pushed the policy-train-eval branch from e50c67c to d8e5631 Compare May 5, 2024 20:29
Michael Panchenko added 4 commits May 5, 2024 22:33
For some reason now env.spec.reward_treshold is None - some change in upstream code

Also added better pytest skip message
Adjusted notebooks, log messages and docs accordingly. Removed now
obsolete in_eval_mode and the private context manager in Trainer
@MischaPanch MischaPanch force-pushed the policy-train-eval branch from 2685dca to e94a5c0 Compare May 6, 2024 17:23
@MischaPanch
Copy link
Collaborator

This is done, will merge without squashing when tests have finished

@MischaPanch MischaPanch merged commit 26b867e into thu-ml:master May 6, 2024
4 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants