[WIP] AveragingEpisodesController #89

Open
wants to merge 3 commits into
base: master
from

Conversation

maotto
Contributor

@maotto maotto commented Mar 21, 2019

Added an AveragingEpisodesController

  • allows accumulating and averaging reward histories with a function that is passed via feedback_averaging_function
    • in many cases, the default should be reasonable: sum up the reward history of each individual rollout, collect the sums in a list, and use the median of these values as the final return
  • allows preparing the environment before each repetition (e.g. seeding) to make results repeatable
  • does not support recording of trajectories and raw reward histories
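
For illustration, the behaviour described in the list above could look roughly like the following sketch. It is not the code from this PR: `run_averaged_episode` and the zero-argument `rollout` callable are hypothetical stand-ins for the controller and its environment, and passing the repetition index to the preparation hook is an assumption.

```python
from statistics import median


def median_of_sums(reward_histories):
    """Sum the rewards of each rollout, then take the median of the sums."""
    return median(sum(history) for history in reward_histories)


def run_averaged_episode(rollout, n_repetitions=10,
                         feedback_averaging_function=median_of_sums,
                         environment_preparation_function=None):
    """Repeat a rollout and reduce all reward histories to one return.

    `rollout` is a hypothetical zero-argument callable that returns the
    reward history (a sequence of numbers) of a single rollout.
    """
    reward_histories = []
    for repetition in range(n_repetitions):
        if environment_preparation_function is not None:
            # Assumption: the hook receives the repetition index,
            # which makes per-repetition seeding straightforward.
            environment_preparation_function(repetition)
        # Histories may differ in length from rollout to rollout.
        reward_histories.append(list(rollout()))
    return feedback_averaging_function(reward_histories)
```

With a preparation hook that seeds the random number generator from the repetition index, two calls with the same settings should produce the same final return, even if the rollout itself is stochastic.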

maotto added 3 commits March 21, 2019 14:21
…ments and return the median of the returns; TODO: implement this nicely in a controller subclass
* does not support recording of trajectories and raw reward histories
* allows to accumulate and average reward histories by function that is
passed via feedback_averaging_function
* allows to prepare an environment for each repetition (e.g. seeding) to
make results repeatable
See base class "Controller" for details on usage.

Additional Parameters
----------
Contributor

More "-": the underline should be at least as long as the heading.


Additional Parameters
----------
num_repetitions_to_average : int, optional (default: 10)
Contributor

We usually try to use n_ as an abbreviation for "number of".

if the environment is stochastic or specifically prepared via the
argument environment_preparation_function

feedback_averaging_function : function, optional (default: median_of_sums)
Contributor

It is a callback, not just a function. It also does not have to be a function; it can be any callable.
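
To make the "any callable" point concrete, here is a small sketch. Only the name `median_of_sums` appears in the PR; its body here and all other names (`reduce_sums`, `ScaledMedianOfSums`) are made up for illustration. Each entry takes a list of reward histories and returns a single number, so each could serve as the callback.

```python
from functools import partial
from statistics import mean, median


def median_of_sums(reward_histories):
    """Plain function: median of the per-rollout reward sums."""
    return median(sum(history) for history in reward_histories)


def reduce_sums(reward_histories, reducer):
    """Apply an arbitrary reducer (min, max, mean, ...) to the sums."""
    return reducer(sum(history) for history in reward_histories)


class ScaledMedianOfSums:
    """Callable object that carries its own configuration."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, reward_histories):
        return self.scale * median_of_sums(reward_histories)


# Four different kinds of callable with the same calling convention.
callbacks = {
    "function": median_of_sums,
    "partial": partial(reduce_sums, reducer=min),
    "lambda": lambda histories: mean(sum(h) for h in histories),
    "instance": ScaledMedianOfSums(scale=0.5),
}

histories = [[1, 2], [3], [10, -1, 0]]  # rollouts of different lengths
results = {name: callback(histories) for name, callback in callbacks.items()}
```

Documenting the parameter type as "callable" rather than "function" would cover all four cases.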

Note that the number of feedbacks per rollout may vary.
See AveragingEpisodesController.median_of_sums (default) for an example

environment_preparation_function : function, optional (default: None)
Contributor

The same applies here.

self.record_inputs = False
self.record_outputs = False
self.record_feedbacks = False
self.accumulate_feedbacks = False # see feedback_averaging_function
Contributor

This comment does not really help.

@AlexanderFabisch AlexanderFabisch changed the title from "AveragingEpisodesController" to "[WIP] AveragingEpisodesController" Mar 26, 2019
@AlexanderFabisch
Contributor

@maotto any progress?

2 participants