
Controllers

All controllers should inherit from the base Controller class (which does nothing when receiving the various signals emitted by an agent). The following methods are defined in this base controller class:

  • __init__(self): Activate the controller. All controllers inheriting this class should call this method in their own __init__() using super(self.__class__, self).__init__().

  • setActive(self, active): Activate or deactivate this controller. A controller should not react to any signal it receives as long as it is deactivated. For instance, if a controller maintains a counter of how many episodes it has seen, this counter should not be updated while the controller is deactivated (see the sketch after this list).

  • OnStart(self, agent): Called when the agent is going to start working (before anything else). This corresponds to the moment where the agent's run() method is called.

  • OnEpisodeEnd(self, agent, terminalReached, reward): Called whenever the agent ends an episode, just after this episode ended and before any OnEpochEnd() signal could be sent.

  • OnEpochEnd(self, agent): Called whenever the agent ends an epoch, just after the last episode of this epoch was ended and after any OnEpisodeEnd() signal was processed.

  • OnActionChosen(self, agent, action): Called whenever the agent has chosen an action. This occurs after the agent's state was updated with the new observation it made, but before it applies this action to the environment and before the total reward is updated.

  • OnActionTaken(self, agent): Called whenever the agent has taken an action on its environment. This occurs after the agent applied this action to the environment and before terminality is evaluated. This is called only once, even when the agent skips frames by taking the same action several times. In other words, this occurs just before the next observation of the environment.

  • OnEnd(self, agent): Called when the agent has finished processing all its epochs, just before returning from its run() method.
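
To make this concrete, here is a minimal sketch of a custom controller that counts the episodes the agent has completed (the counter example mentioned for setActive()). The class name EpisodeCounterController is hypothetical, and the import assumes the base Controller class is exposed by the experiment.base_controllers module mentioned below; adapt both to your setup.

    from experiment.base_controllers import Controller  # assumed import path

    class EpisodeCounterController(Controller):
        """Hypothetical controller counting the episodes the agent has completed."""

        def __init__(self):
            # Call the base constructor, as recommended above.
            super(EpisodeCounterController, self).__init__()
            self._nEpisodes = 0
            self._enabled = True

        def setActive(self, active):
            # Keep the base class behaviour and remember the state locally.
            super(EpisodeCounterController, self).setActive(active)
            self._enabled = active

        def OnEpisodeEnd(self, agent, terminalReached, reward):
            # A deactivated controller must not react to the signals it receives.
            if not self._enabled:
                return
            self._nEpisodes += 1

        def OnEnd(self, agent):
            if self._enabled:
                print("Episodes seen: {}".format(self._nEpisodes))

Such a controller can then be attached to an agent with agent.attach(EpisodeCounterController()), exactly like the built-in controllers shown further down.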

The order in which controllers are attached matters. For instance, if controllers C1, C2 and C3 were attached in this order and C1 and C3 both listen to the OnEpisodeEnd signal, then whenever an episode ends, the OnEpisodeEnd() method of C1 is called before that of C3.
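
In code, this ordering is simply the order of the attach() calls; in the hypothetical snippet below, C1, C2 and C3 stand for any controller subclasses:

    agent.attach(C1())  # listens to OnEpisodeEnd: notified first
    agent.attach(C2())  # does not listen to OnEpisodeEnd
    agent.attach(C3())  # listens to OnEpisodeEnd: notified after C1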

Examples of controllers can be found in the experiment.base_controllers module. This module gathers all the controllers used for the experiments carried out on the demo environments.

Attaching controllers to carry out an experiment

Examples of how controllers can be attached to an agent in order to carry out an experiment can be found in the launchers of the demo environments, run_MG_two_storages.py and run_toy_env.py. Let's review run_MG_two_storages.py in detail to understand how it works.

    VALIDATION_MODE = 0
    TEST_MODE = 1
    fname = hash(vars(parameters), hash_name="sha1")
    print("The parameters hash is: {}".format(fname))
    print("The parameters are: {}".format(parameters))
    agent.attach(bc.VerboseController())
    agent.attach(bc.TrainerController(periodicity=parameters.update_frequency))
    agent.attach(bc.LearningRateController(parameters.learning_rate, parameters.learning_rate_decay))
    agent.attach(bc.DiscountFactorController(parameters.discount, parameters.discount_inc, parameters.discount_max))
    agent.attach(bc.EpsilonController(parameters.epsilon_start, parameters.epsilon_decay, parameters.epsilon_min))
    agent.attach(bc.FindBestController(VALIDATION_MODE, unique_fname=fname, testID=TEST_MODE))
    agent.attach(bc.InterleavedTestEpochController(VALIDATION_MODE, parameters.steps_per_test, [0, 1, 2, 3, 4, 7], periodicity=2, summarizeEvery=-1))
    agent.attach(bc.InterleavedTestEpochController(TEST_MODE, parameters.steps_per_test, [0, 1, 2, 3, 4, 6], periodicity=2, summarizeEvery=parameters.period_btw_summary_perfs))

We want to train a deep neural network whenever an action is taken, and to evaluate its validation score at the end of every epoch using a validation set. The goal of the experiment is to find the neural network that maximizes the validation score over all training epochs. We also want to evaluate how well the neural network generalizes to new data (i.e. to a test set), so that at the end we can plot the evolution of the validation and test scores after every training epoch (and check that we do not overfit the validation data).
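
Once all controllers are attached, the experiment is launched by calling the agent's run() method, which triggers the OnStart() signal and, at the very end, the OnEnd() signal described above. The parameter names epochs and steps_per_epoch below are assumptions for illustration:

    # Assumed parameter names; run() drives the epochs and episodes and
    # dispatches the controller signals described on this page.
    agent.run(parameters.epochs, parameters.steps_per_epoch)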