
action and state offsets? #2

Closed
drozzy opened this issue Sep 29, 2019 · 4 comments

drozzy commented Sep 29, 2019

I'm just curious: what do the action/state offset values mean?

https://github.com/nikhilbarhate99/Hierarchical-Actor-Critic-HAC-PyTorch/blob/master/train.py#L36

I can't seem to figure it out. How do you determine them, for example, for a new environment?

Similarly, what about the clip low/high values for both actions and states? If you could explain those as well, I would appreciate it.

Thank you.

@nikhilbarhate99 (Owner)

The action and state spaces of many environments are NOT normalised to (-1, 1), but we still need to bound the output values of the neural network somehow. A Tanh activation at the end of the network is not sufficient on its own, because it bounds the output to (-1, 1) while the spaces themselves are not normalised to that range.

So the actions given to the environment are modified accordingly:
action = ( network output (Tanh) * bounds ) + offset

For example, in the mountain car continuous env:

the action space is between (-1, 1); since its mean value [ (1 + (-1)) / 2 ] is 0, we do not require an offset, and the bound is 1 [ (1 - (-1)) / 2 ], since our network only outputs values in (-1, 1), so,

action = ( network output (Tanh) * bounds ) + offset
i.e. action = (network output * 1) + 0
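
A minimal sketch of that rescaling, assuming a Gym Box action space (the names `action_bounds`, `action_offset` and the helper `scale_action` are illustrative here, not necessarily the repo's exact code):

```python
import numpy as np
import gym

# Derive bound/offset from a Gym Box action space so that
# tanh_output * bounds + offset spans exactly that space.
env = gym.make("MountainCarContinuous-v0")

action_bounds = (env.action_space.high - env.action_space.low) / 2.0  # [1.0]
action_offset = (env.action_space.high + env.action_space.low) / 2.0  # [0.0]

def scale_action(tanh_output):
    # tanh_output is in (-1, 1); the result lies inside the env's action space
    return tanh_output * action_bounds + action_offset
```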

In HAC, the higher-level policy also needs to output a goal state, so we bound that in a similar way
(here the output goal state is treated as the action of the higher-level policy).

But the state space of the mountain car continuous env is defined as [position, velocity] between min value = [-1.2, -0.07] and max value = [0.6, 0.07].

here the position variable (-1.2, 0.6) is NOT normalised to (-1, 1); its mean value [ (0.6 + (-1.2)) / 2 ] is -0.3 and its half-range [ (0.6 - (-1.2)) / 2 ] is 0.9

action = ( network output (Tanh) * bounds ) + offset

for the position variable:
action = (network output * 0.9) + (-0.3)
this bounds the value of the action to (-1.2, 0.6)

similarly, the velocity variable (-0.07, 0.07) is NOT normalised to (-1, 1), but its mean value [ (0.07 + (-0.07)) / 2 ] is 0, so,

for velocity variable:
action = (network output * 0.07) + 0
this bounds the value of the action to (-0.07, 0.07)

So, the net action is bound between min value = [-1.2, -0.07] and max value = [0.6, 0.07]
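
As a quick sanity check of those numbers (an illustrative sketch only; `state_bounds` / `state_offset` are names chosen here, and the limits are the ones quoted above):

```python
import numpy as np

# MountainCarContinuous-v0 state space: [position, velocity]
state_low = np.array([-1.2, -0.07])
state_high = np.array([0.6, 0.07])

state_bounds = (state_high - state_low) / 2.0   # [0.9, 0.07]  (half-range)
state_offset = (state_high + state_low) / 2.0   # [-0.3, 0.0]  (mean)

# any tanh output in (-1, 1), rescaled per dimension, stays inside the limits
tanh_out = np.array([0.5, -0.5])
goal = tanh_out * state_bounds + state_offset    # [0.15, -0.035]
assert np.all(goal >= state_low) and np.all(goal <= state_high)
```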

The clip high/low values are simply the max and min values of the action space. We use them to clip the output after adding noise, to ensure that the action values do not exceed the environment bounds. These can be obtained easily from the documentation of the environment.
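
A hedged sketch of that clipping step (the noise scale 0.1 is made up for illustration; only the clip values come from the action space):

```python
import numpy as np
import gym

env = gym.make("MountainCarContinuous-v0")
action_clip_low = env.action_space.low     # [-1.0]
action_clip_high = env.action_space.high   # [ 1.0]

action = np.array([0.95])                  # already-scaled network output
noisy = action + np.random.normal(0.0, 0.1, size=action.shape)  # exploration noise
noisy = np.clip(noisy, action_clip_low, action_clip_high)       # keep within env bounds
```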

@drozzy drozzy closed this as completed Sep 30, 2019

drozzy commented Sep 30, 2019

Thanks, that helps a lot!

What about the exploration_action_noise and exploration_state_noise values?
Are they derived from action/state spaces somehow?

@drozzy drozzy reopened this Sep 30, 2019
@nikhilbarhate99 (Owner)

No, the exploration_action_noise and exploration_state_noise are hyperparameters that need to be tuned by experimentation.


drozzy commented Oct 2, 2019

Thanks.

@drozzy drozzy closed this as completed Oct 2, 2019