Controllers
All controllers should inherit from the base Controller class (which does nothing when receiving the various signals emitted by an agent). The following methods are defined in this base controller class:
- `__init__(self)`: Activate the controller. All controllers inheriting this class should call this method in their own `__init__()` using `super(self.__class__, self).__init__()`.
- `setActive(self, active)`: Activate or deactivate this controller. A controller should not react to any signal it receives while it is deactivated. For instance, if a controller maintains a counter of how many episodes it has seen, this counter should not be updated while the controller is disabled.
- `OnStart(self, agent)`: Called when the agent is about to start working (before anything else). This corresponds to the moment the agent's `run()` method is called.
- `OnEpisodeEnd(self, agent, terminalReached, reward)`: Called whenever the agent ends an episode, just after the episode ended and before any `OnEpochEnd()` signal could be sent.
- `OnEpochEnd(self, agent)`: Called whenever the agent ends an epoch, just after the last episode of this epoch has ended and after any `OnEpisodeEnd()` signal was processed.
- `OnActionChosen(self, agent, action)`: Called whenever the agent has chosen an action. This occurs after the agent's state was updated with the new observation it made, but before the action is applied to the environment and before the total reward is updated.
- `OnActionTaken(self, agent)`: Called whenever the agent has taken an action on its environment. This occurs after the agent applied the action to the environment and before terminality is evaluated. It is called only once, even when the agent skips frames by taking the same action multiple times; in other words, it occurs just before the next observation of the environment.
- `OnEnd(self, agent)`: Called when the agent has finished processing all its epochs, just before returning from its `run()` method.
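To make these hooks concrete, here is a standalone sketch (it does not import the real library, and redefines a minimal base `Controller` so it is self-contained): a custom controller that counts finished episodes and honours `setActive()` as described above. The class and attribute names besides the signal methods are illustrative, not part of the library's API.

```python
class Controller:
    """Minimal base class for the sketch: every signal handler is a no-op."""

    def __init__(self):
        # Controllers start out active.
        self._active = True

    def setActive(self, active):
        self._active = active

    def OnStart(self, agent): pass
    def OnEpisodeEnd(self, agent, terminalReached, reward): pass
    def OnEpochEnd(self, agent): pass
    def OnActionChosen(self, agent, action): pass
    def OnActionTaken(self, agent): pass
    def OnEnd(self, agent): pass


class EpisodeCounterController(Controller):
    """Counts episodes seen, but only while the controller is active."""

    def __init__(self):
        # Call the base __init__ as the documentation above recommends.
        super(self.__class__, self).__init__()
        self.episodes_seen = 0

    def OnEpisodeEnd(self, agent, terminalReached, reward):
        # Ignore the signal entirely while deactivated.
        if self._active:
            self.episodes_seen += 1
```

A deactivated controller still receives the signal call; it is the controller's own responsibility to ignore it, which is why the `if self._active` guard is needed.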
The order in which controllers are attached matters. Indeed, if controllers C1, C2 and C3 were attached in this order and C1 and C3 both listen to the `OnEpisodeEnd` signal, the `OnEpisodeEnd()` method of C1 will be called before the `OnEpisodeEnd()` method of C3, whenever an episode ends.
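The dispatch order can be demonstrated with a small standalone sketch (the `MiniAgent` and `RecordingController` classes below are illustrative stand-ins, not the library's own classes): an agent that stores controllers in a list and notifies them in attach order.

```python
class RecordingController:
    """Appends its own name to a shared log when an episode ends."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def OnEpisodeEnd(self, agent, terminalReached, reward):
        self.log.append(self.name)


class MiniAgent:
    """Toy agent: keeps controllers in a list, preserving attach order."""

    def __init__(self):
        self._controllers = []

    def attach(self, controller):
        self._controllers.append(controller)

    def end_episode(self):
        # Signals are dispatched in the order controllers were attached.
        for controller in self._controllers:
            controller.OnEpisodeEnd(self, True, 0.0)


log = []
agent = MiniAgent()
for name in ("C1", "C2", "C3"):
    agent.attach(RecordingController(name, log))
agent.end_episode()
# log is now ["C1", "C2", "C3"]: C1 was notified before C3.
```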
Examples of controllers can be found in the `experiment.base_controllers` module. This module gathers all the controllers used for the experiments carried out on the demo environments.
Examples of how controllers can be attached to an agent in order to carry out an experiment can be found in the launchers of the demo environments, `run_MG_two_storages.py` and `run_toy_env.py`. Let's review `run_MG_two_storages.py` in detail to understand how it works.
```python
VALIDATION_MODE = 0
TEST_MODE = 1
fname = hash(vars(parameters), hash_name="sha1")
print("The parameters hash is: {}".format(fname))
print("The parameters are: {}".format(parameters))

agent.attach(bc.VerboseController())
agent.attach(bc.TrainerController(periodicity=parameters.update_frequency))
agent.attach(bc.LearningRateController(parameters.learning_rate, parameters.learning_rate_decay))
agent.attach(bc.DiscountFactorController(parameters.discount, parameters.discount_inc, parameters.discount_max))
agent.attach(bc.EpsilonController(parameters.epsilon_start, parameters.epsilon_decay, parameters.epsilon_min))
agent.attach(bc.FindBestController(VALIDATION_MODE, unique_fname=fname, testID=TEST_MODE))
agent.attach(bc.InterleavedTestEpochController(VALIDATION_MODE, parameters.steps_per_test, [0, 1, 2, 3, 4, 7], periodicity=2, summarizeEvery=-1))
agent.attach(bc.InterleavedTestEpochController(TEST_MODE, parameters.steps_per_test, [0, 1, 2, 3, 4, 6], periodicity=2, summarizeEvery=parameters.period_btw_summary_perfs))
```
We want to train a deep neural network whenever an action is taken, and evaluate its validation score every epoch on a validation set. The goal of the experiment is to find the neural network that maximizes the validation score over all training epochs. We also want to evaluate how well the neural network generalizes to new data (i.e. to a test set), so that at the end we can plot the evolution of the validation and test scores over the training epochs (and check that we are not overfitting the validation data).