
Goal for pendulum? #3

Closed
drozzy opened this issue Oct 20, 2019 · 3 comments

Comments

drozzy commented Oct 20, 2019

No description provided.

drozzy (Author) commented Oct 20, 2019

I’m just curious: how did you define a goal for the pendulum task, since it is a continuous task by nature?

drozzy changed the title from "Goal for pendum" to "Goal for pendulum?" Oct 20, 2019
nikhilbarhate99 (Owner) commented Oct 21, 2019

Hierarchical Actor Critic is ONLY for continuous tasks (Hier-Q, as described in the paper, is for discrete tasks, but is NOT implemented in this repo).

Coming back to your question of defining a goal:
The state space of the Pendulum task in the official gym implementation includes the variables [angular velocity, sine theta, cosine theta]. This state is difficult for the hierarchical policies to predict, since they also need to learn the relation between sine and cosine to predict the right goal state. So I have modified the state space to include only [angular velocity, normalized theta]. This gives reasonable performance, although it's not consistent. (The modified file is available in the gym folder.)
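For reference, here is a minimal sketch of that state modification written as a gym.ObservationWrapper (the repo actually ships a modified pendulum file in its gym folder, so the wrapper class below, and the [theta, angular velocity] ordering chosen to match the state_bounds further down, are only illustrative assumptions):

import gym
import numpy as np

class PendulumThetaObservation(gym.ObservationWrapper):
    """Replace Pendulum's [cos(theta), sin(theta), theta_dot] observation
    with [normalized theta, theta_dot], where theta is wrapped to [-pi, pi]."""

    def __init__(self, env):
        super().__init__(env)
        high = np.array([np.pi, 8.0], dtype=np.float32)    # 8.0 is Pendulum's max angular speed
        self.observation_space = gym.spaces.Box(low=-high, high=high, dtype=np.float32)

    def observation(self, obs):
        cos_theta, sin_theta, theta_dot = obs
        theta = np.arctan2(sin_theta, cos_theta)            # normalized angle in [-pi, pi]
        return np.array([theta, theta_dot], dtype=np.float32)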

I tried with the following hyperparameters:

import gym
import numpy as np
import torch

# device used by the tensors below (CUDA if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#################### Hyperparameters ####################
env_name = "Pendulum-v0"
save_episode = 10               # keep saving every n episodes
max_episodes = 1000             # max num of training episodes
random_seed = 0
render = True

env = gym.make(env_name)
state_dim = 2
action_dim = env.action_space.shape[0]


# primitive action bounds and offset
action_bounds = env.action_space.high[0]
action_offset = np.array([0.0])
action_offset = torch.FloatTensor(action_offset.reshape(1, -1)).to(device)
action_clip_low = np.array([-1.0 * action_bounds])
action_clip_high = np.array([action_bounds])

# state bounds and offset
state_bounds_np = np.array([np.pi, 8.0])
state_bounds = torch.FloatTensor(state_bounds_np.reshape(1, -1)).to(device)
state_offset =  np.array([0.0, 0.0])
state_offset = torch.FloatTensor(state_offset.reshape(1, -1)).to(device)
state_clip_low = np.array([-np.pi, -8.0])
state_clip_high = np.array([np.pi, 8.0])

# exploration noise std for primitive action and subgoals
exploration_action_noise = np.array([0.1])        
exploration_state_noise = np.array([np.deg2rad(10), 0.4]) 

goal_state = np.array([0, 0])        # final goal state to be achieved
threshold = np.array([np.deg2rad(10), 0.05])         # threshold value to check if goal state is achieved

# HAC parameters:
k_level = 2                 # num of levels in hierarchy
H = 20                      # time horizon to achieve subgoal
lamda = 0.3                 # subgoal testing parameter

# DDPG parameters:
gamma = 0.95                # discount factor for future rewards
n_iter = 100                # update policy n_iter times in one DDPG update
batch_size = 100            # num of transitions sampled from replay buffer
lr = 0.001

# save trained models
directory = "./"
filename = "HAC_{}".format(env_name)
#########################################################
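
In case it helps: the goal_state / threshold pair above is what makes a "goal" well defined for this continuous task. A subgoal or the final goal counts as achieved when every state dimension is within its threshold of the goal, roughly like the simplified sketch below (not the exact code from the repo):

import numpy as np

def goal_achieved(state, goal, threshold):
    """True if each state dimension is within `threshold` of `goal`."""
    return bool(np.all(np.abs(state - goal) <= threshold))

# With the values above, the pendulum is "at the goal" once it is within
# ~10 degrees of upright and its angular velocity is within 0.05 of zero:
# goal_achieved(np.array([0.05, 0.02]), goal_state, threshold)  ->  True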

drozzy (Author) commented Oct 21, 2019

Thanks, exactly what I was looking for.

drozzy closed this as completed Oct 21, 2019