
Goal for pendulum? #3

Closed
drozzy opened this issue Oct 20, 2019 · 3 comments

Comments

drozzy commented Oct 20, 2019

No description provided.

drozzy (Author) commented Oct 20, 2019

I’m just curious: how did you define a goal for the pendulum task, since it is a continuous task by nature?

drozzy changed the title from "Goal for pendum" to "Goal for pendulum?" Oct 20, 2019
nikhilbarhate99 (Owner) commented Oct 21, 2019

Hierarchical Actor Critic is ONLY for continuous tasks (Hier-Q, as described in the paper, is for discrete tasks, but is NOT implemented in this repo).

Coming back to your question of defining a goal:
The state space of the Pendulum task in the official gym implementation includes the variables [angular velocity, sine theta, cosine theta]. This state is difficult for the hierarchical policies to predict, since they also need to learn the relation between sine and cosine to predict the right goal state. So I have modified the state space to include only [angular velocity, normalized theta]. This gives reasonable performance, although it's not consistent. (The modified file is available in the gym folder.)
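For reference, here is a minimal sketch of that state modification written as a gym.ObservationWrapper (the repo actually ships a modified pendulum file in its gym folder, so the wrapper class below, and the [theta, angular velocity] ordering chosen to match the state_bounds further down, are only illustrative assumptions):

import gym
import numpy as np

class PendulumThetaObservation(gym.ObservationWrapper):
    """Replace Pendulum's [cos(theta), sin(theta), theta_dot] observation
    with [normalized theta, theta_dot], where theta is wrapped to [-pi, pi]."""

    def __init__(self, env):
        super().__init__(env)
        high = np.array([np.pi, 8.0], dtype=np.float32)    # 8.0 is Pendulum's max angular speed
        self.observation_space = gym.spaces.Box(low=-high, high=high, dtype=np.float32)

    def observation(self, obs):
        cos_theta, sin_theta, theta_dot = obs
        theta = np.arctan2(sin_theta, cos_theta)            # normalized angle in [-pi, pi]
        return np.array([theta, theta_dot], dtype=np.float32)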

I tried with the following hyperparameters:

import gym
import numpy as np
import torch

# device used by the tensors below (CUDA if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#################### Hyperparameters ####################
env_name = "Pendulum-v0"
save_episode = 10               # keep saving every n episodes
max_episodes = 1000             # max num of training episodes
random_seed = 0
render = True

env = gym.make(env_name)
state_dim = 2
action_dim = env.action_space.shape[0]


# primitive action bounds and offset
action_bounds = env.action_space.high[0]
action_offset = np.array([0.0])
action_offset = torch.FloatTensor(action_offset.reshape(1, -1)).to(device)
action_clip_low = np.array([-1.0 * action_bounds])
action_clip_high = np.array([action_bounds])

# state bounds and offset
state_bounds_np = np.array([np.pi, 8.0])
state_bounds = torch.FloatTensor(state_bounds_np.reshape(1, -1)).to(device)
state_offset =  np.array([0.0, 0.0])
state_offset = torch.FloatTensor(state_offset.reshape(1, -1)).to(device)
state_clip_low = np.array([-np.pi, -8.0])
state_clip_high = np.array([np.pi, 8.0])

# exploration noise std for primitive action and subgoals
exploration_action_noise = np.array([0.1])        
exploration_state_noise = np.array([np.deg2rad(10), 0.4]) 

goal_state = np.array([0, 0])        # final goal state to be achieved
threshold = np.array([np.deg2rad(10), 0.05])         # threshold value to check if goal state is achieved

# HAC parameters:
k_level = 2                 # num of levels in hierarchy
H = 20                      # time horizon to achieve subgoal
lamda = 0.3                 # subgoal testing parameter

# DDPG parameters:
gamma = 0.95                # discount factor for future rewards
n_iter = 100                # update policy n_iter times in one DDPG update
batch_size = 100            # num of transitions sampled from replay buffer
lr = 0.001

# save trained models
directory = "./"
filename = "HAC_{}".format(env_name)
#########################################################
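
In case it helps: the goal_state / threshold pair above is what makes a "goal" well defined for this continuous task. A subgoal or the final goal counts as achieved when every state dimension is within its threshold of the goal, roughly like the simplified sketch below (not the exact code from the repo):

import numpy as np

def goal_achieved(state, goal, threshold):
    """True if each state dimension is within `threshold` of `goal`."""
    return bool(np.all(np.abs(state - goal) <= threshold))

# With the values above, the pendulum is "at the goal" once it is within
# ~10 degrees of upright and its angular velocity is within 0.05 of zero:
# goal_achieved(np.array([0.05, 0.02]), goal_state, threshold)  ->  True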

drozzy (Author) commented Oct 21, 2019

Thanks, exactly what I was looking for.

drozzy closed this as completed Oct 21, 2019