
[WIP] AveragingEpisodesController #89

Open

maotto wants to merge 3 commits into rock-learning:master from maotto:average_controller

Contributor

maotto commented Mar 21, 2019

Added an AveragingEpisodesController:

- Accumulates and averages reward histories with a function that is passed via feedback_averaging_function.
  - In many cases the default should be reasonable: sum up the reward history of each individual rollout, collect the sums in a list, and use the median of these values as the final return.
- Allows preparing the environment for each repetition (e.g. seeding) to make results repeatable.
- Does not support recording of trajectories and raw reward histories.
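The behavior described above could be sketched roughly as follows. This is a standalone illustration, not the code from this PR: `evaluate_averaged`, `run_rollout`, `seed_environment`, and `noisy_rollout` are hypothetical names, and the signatures of the two callbacks (a list of reward histories in, one number out; the repetition index in) are assumptions.

```python
import random
import statistics


def median_of_sums(reward_histories):
    """Sum each rollout's rewards, then take the median of those sums."""
    return statistics.median(sum(history) for history in reward_histories)


def evaluate_averaged(run_rollout, num_repetitions_to_average=10,
                      feedback_averaging_function=median_of_sums,
                      environment_preparation_function=None):
    """Run several rollouts and merge their reward histories into one return.

    run_rollout is a hypothetical stand-in for one episode; it returns the
    list of rewards collected in that episode (lengths may differ).
    """
    reward_histories = []
    for repetition in range(num_repetitions_to_average):
        if environment_preparation_function is not None:
            # Assumed signature: the callback receives the repetition index.
            environment_preparation_function(repetition)
        reward_histories.append(run_rollout())
    return feedback_averaging_function(reward_histories)


# Toy stochastic "environment" used only for this illustration.
rng = random.Random()


def seed_environment(repetition):
    rng.seed(repetition)  # same seed per repetition -> repeatable results


def noisy_rollout():
    n_steps = rng.randint(3, 6)  # number of feedbacks varies per rollout
    return [rng.gauss(1.0, 0.1) for _ in range(n_steps)]


first = evaluate_averaged(noisy_rollout,
                          environment_preparation_function=seed_environment)
second = evaluate_averaged(noisy_rollout,
                           environment_preparation_function=seed_environment)
```

With the seeding callback in place, `first` and `second` are equal even though each individual rollout is noisy; without it, the two evaluations would generally differ.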

maotto added 3 commits

March 21, 2019 14:21


          a simple hack to evaluate a behavior in 10 differently seeded environments and return the median of the returns; TODO: implement this nicely in a controller subclass

8418b3b


          AveragingEpisodesController with hardcoded merging func. and repetitions

78bd124


          add AveragingEpisodesController with documentation and defaults;

c14e40d

* does not support recording of trajectories and raw reward histories
* allows to accumulate and average reward histories by function that is
passed via feedback_averaging_function
* allows to prepare an environment for each repetition (e.g. seeding) to
make results repeatable

AlexanderFabisch reviewed

bolero/controller/controller.py

+                  See base class "Controller" for details on usage.
+                  Additional Parameters
+                  ----------

Contributor

AlexanderFabisch Mar 22, 2019

More `-`: the underline should be at least as long as the section title.

AlexanderFabisch reviewed

bolero/controller/controller.py

+                  Additional Parameters
+                  ----------
+                  num_repetitions_to_average : int, optional (default: 10)

Contributor

AlexanderFabisch Mar 22, 2019

We usually try to use `n_` as an abbreviation for "number".

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      if the environment is stochastic or specifically prepared via the
+                      argument environment_preparation_function
+                  feedback_averaging_function : function, optional (default: median_of_sums)

Contributor

AlexanderFabisch Mar 22, 2019

It is a callback, not just a function. It also does not have to be a function; it can be any callable.
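To illustrate the point about callables, here is a small sketch of objects that could all serve as such a callback. The names `TrimmedMeanOfSums` and `best_of_sums` are made up for this example, and the assumed callback signature (a list of reward histories in, one number out) is not taken from the PR.

```python
import statistics


def median_of_sums(reward_histories):
    """A plain function."""
    return statistics.median(sum(history) for history in reward_histories)


class TrimmedMeanOfSums:
    """An object with __call__; it can carry configuration as state."""

    def __init__(self, n_trim):
        self.n_trim = n_trim

    def __call__(self, reward_histories):
        sums = sorted(sum(history) for history in reward_histories)
        kept = sums[self.n_trim:len(sums) - self.n_trim]
        return statistics.mean(kept)


# A lambda works just as well.
best_of_sums = lambda reward_histories: max(sum(h) for h in reward_histories)

callbacks = [median_of_sums, TrimmedMeanOfSums(n_trim=1), best_of_sums]
histories = [[1, 2], [3], [0, 0, 10], [1]]
results = [callback(histories) for callback in callbacks]
```

All three pass `callable(...)`, so a docstring type of "callable" covers them, whereas "function" strictly describes only the first.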

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      Note that the number of feedbacks per rollout may vary.
+                      See AveragingEpisodesController.median_of_sums (default) for an example
+                  environment_preparation_function : function, optional (default: None)

Contributor

AlexanderFabisch Mar 22, 2019

The same applies here.

AlexanderFabisch reviewed

bolero/controller/controller.py

+                      self.record_inputs = False
+                      self.record_outputs = False
+                      self.record_feedbacks = False
+                      self.accumulate_feedbacks = False  # see feedback_averaging_function

Contributor

AlexanderFabisch Mar 22, 2019

this comment does not really help

AlexanderFabisch changed the title ~~AveragingEpisodesController~~ [WIP] AveragingEpisodesController

Contributor

AlexanderFabisch commented May 27, 2019

@maotto any progress?
