Descriptions of action spaces & observation spaces #585

Closed
econti opened this issue May 9, 2017 · 29 comments

@econti

econti commented May 9, 2017

Do descriptions of the different environments' action spaces and observation spaces exist anywhere? For example, with Humanoid-V1 the action space is a 17-D vector that presumably maps to different body parts, but are these numbers torques, angles, etc.? The same goes for the observation space: a brief description of what the 376 dimensions correspond to would be incredibly useful.

@olegklimov
Contributor

You don't want to know this, unless you're engineering (as opposed to learning) a solution.

If you still need to know which is which, just try those actions one by one, watch the robot.

@stevenschmatz
Contributor

A description could still be useful to understand what the RL algorithm learned.

@econti
Author

econti commented May 12, 2017

I see your point @olegklimov, but to @stevenschmatz's point, just for the sake of understanding what your network has learned, it helps to know some detail about the action space and observation space. I do agree with you though that you technically don't need to know these details if you're learning a solution.

@filmo

filmo commented Jul 2, 2017

I'm with @econti. I think a description would be good. There may be a case where we want to neutralize certain actions or parts of the observed state space and it's far easier to zero them out if we could consult a description table than to iterate through them all to manually figure them out.

@olegklimov is of course right in that it's not needed for RL learning, but I disagree that there isn't a valid use-case.
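As a rough illustration of the zeroing-out use case described above, here is a minimal sketch. It assumes the indices to hide are already known (for example from a description table); the function name `mask_observation` and the example values are hypothetical and not part of gym.

```python
def mask_observation(obs, hidden_indices):
    """Return a new list equal to obs, with the hidden indices set to 0.0."""
    hidden = set(hidden_indices)
    return [0.0 if i in hidden else float(x) for i, x in enumerate(obs)]


# Stand-in for an observation vector; a real one would come from env.step().
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(mask_observation(obs, [0, 3]))
```

The same idea applies to actions: zero the entries of the action vector that should stay neutral before passing it to the environment.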

@erwincoumans

erwincoumans commented Mar 2, 2018

I needed similar info, so here is what I have found out so far:

For the Ant, the observation is:

    def _get_obs(self):
        return np.concatenate([
            self.sim.data.qpos.flat[2:],
            self.sim.data.qvel.flat,
            np.clip(self.sim.data.cfrc_ext, -1, 1).flat,
        ])

self.sim.data.qpos holds the positions: the first 7 elements are the 3D position (x,y,z) and orientation (quaternion x,y,z,w) of the torso, and the remaining 8 are the joint angles.

The [2:] operation removes the first 2 elements of qpos, which are the X and Y position of the agent's torso.

self.sim.data.qvel holds the velocities: the first 6 elements are the 3D velocity (x,y,z) and 3D angular velocity (x,y,z) of the torso, and the remaining 8 are the joint velocities.

cfrc_ext holds the external forces (force x,y,z and torque x,y,z) applied to each of the links at the center of mass. For the Ant, this is 14*6 values: the ground link, the torso link, and 12 links for all legs (3 links for each leg).
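To make the index ranges concrete, here is a small sketch that turns the block sizes described above into slice ranges of the concatenated vector. The sizes (15 positions, 14 velocities, 14 links with 6 values each) are taken from this description rather than read from a live simulation, and the helper name `observation_layout` is hypothetical.

```python
def observation_layout(blocks):
    """Map each block name to its (start, stop) range in the concatenated vector."""
    layout, start = {}, 0
    for name, size in blocks:
        layout[name] = (start, start + size)
        start += size
    return layout, start


# Assumed sizes: nq = 7 + 8, nv = 6 + 8, and 14 links with 6 values per link.
NQ, NV, NBODY = 15, 14, 14
ant_blocks = [
    ("qpos[2:]", NQ - 2),    # torso x and y are dropped
    ("qvel", NV),
    ("cfrc_ext", NBODY * 6),
]

layout, total = observation_layout(ant_blocks)
for name, (start, stop) in layout.items():
    print(f"{name:10s} obs[{start}:{stop}]")
print("total:", total)
```

Under these assumptions the three blocks add up to 13 + 14 + 84 = 111 entries; without the cfrc_ext block, 27 remain.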

For the Humanoid, the observation adds some more fields:

    def _get_obs(self):
        data = self.sim.data
        return np.concatenate([data.qpos.flat[2:],
                               data.qvel.flat,
                               data.cinert.flat,
                               data.cvel.flat,
                               data.qfrc_actuator.flat,
                               data.cfrc_ext.flat])

qfrc_actuator is likely the actuator forces. cinert seems to be the center-of-mass-based inertia and cvel the center-of-mass-based velocity.
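As a sanity check, the sizes of these fields can be tallied against the 376 dimensions mentioned in the opening post. This is only a sketch: the counts NQ = 24, NV = 23 and NBODY = 14 are assumptions chosen to be consistent with a 17-D action space (7 + 17 positions, 6 + 17 velocities), and the per-body widths (10 for cinert, 6 for cvel and cfrc_ext) follow MuJoCo's array shapes.

```python
# Assumed model sizes for the gym Humanoid (not read from a live simulation).
NQ, NV, NBODY = 24, 23, 14

sizes = {
    "qpos[2:]": NQ - 2,        # torso x and y are dropped
    "qvel": NV,
    "cinert": NBODY * 10,      # 10 inertia values per body
    "cvel": NBODY * 6,         # 6 velocity values per body
    "qfrc_actuator": NV,       # one generalized force per degree of freedom
    "cfrc_ext": NBODY * 6,     # 6 external force values per body
}

total = sum(sizes.values())
for name, size in sizes.items():
    print(f"{name:14s} {size}")
print("total:", total)
```

With these assumed counts the tally is 22 + 23 + 140 + 84 + 23 + 84 = 376.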

You can track the meaning of the actual joints from the xml file, but it requires some effort. For the humanoid, my PyBullet script that reads the MuJoCo XML file gives:

b'abdomen_z'
b'abdomen_y'
b'abdomen_x'
b'right_hip_x'
b'right_hip_z'
b'right_hip_y'
b'right_knee'
b'right_ankle_y'
b'right_ankle_x'
b'left_hip_x'
b'left_hip_z'
b'left_hip_y'
b'left_knee'
b'left_ankle_y'
b'left_ankle_x'
b'right_shoulder1'
b'right_shoulder2'
b'right_elbow'
b'left_shoulder1'
b'left_shoulder2'
b'left_elbow'

@benelot

benelot commented May 23, 2018

Maybe this helps some of you:
- https://github.com/openai/mujoco-py/blob/master/mujoco_py/pxd/mjdata.pxd

There is a short description next to each field, and it tells you what the field is computed by. Unfortunately, it might only help people working intensively with physics engines (like @erwincoumans).

@xinghua-qu

Is there any document that describes the physical meaning of the dimensions for other environments (e.g., Walker2D, Hopper and HalfCheetah)? The humanoid in PyBullet has 44 dimensions in its state space, but the physical meaning is explained for only 21 of them. Does that mean the physical meaning of the remaining dimensions is unknown? Additionally, the XML file for the URDF is a little hard to read.

@ajithvcoder

Is there an explanation for Walker2D like this one: https://github.com/openai/gym/wiki/Humanoid-V1 ? The XML file only contains 9 of the parameters; the others are not there. In total there are 22. @QuXinghuaNTU, were you able to find it?

@xinghua-qu

> Is there an explanation for Walker2D like this one: https://github.com/openai/gym/wiki/Humanoid-V1 ? The XML file only contains 9 of the parameters; the others are not there. In total there are 22. @QuXinghuaNTU, were you able to find it?

Hi,
Indeed, the physical meaning of some parameters is unknown.
I tried to find such documents but failed.
One possible way (from my side) to find the exact meaning of each dimension is to run the agent while changing the value of one dimension at a time. That, however, is quite time-consuming. Since I have moved to Atari for a while, it's hard for me to help you with that right now.
If I come back to MuJoCo someday, I will post the results here.
Stay safe and good luck with your research.

@ryanmaxwell96

ryanmaxwell96 commented Jul 7, 2020

@erwincoumans, can you clarify your Ant breakdown above? I believe there are 29 observations, but when I add up the numbers you gave (7+8 for qpos, minus 2 for the [2:], plus 6+8 for qvel) I get 27, with no contact forces included yet, so the total does not seem to add up. For instance, what do you mean by 14*6?

@ryanmaxwell96

ryanmaxwell96 commented Jul 7, 2020

Specifically, I'd really like to know what state[2] refers to, since it is the value that triggers the "notdone" signal for Ant-v2. Is it the z level of the torso? That would be confusing, because new episodes are not triggered when the Ant flips over.

@ryanmaxwell96

I think that is because of the parameters 0.2 and 1.0 in this line in ant.py:

    notdone = np.isfinite(state).all() \
        and state[2] >= 0.2 and state[2] <= 1.0

If I change 0.2 to 0.4 and 1.0 to 2.0, it starts to clearly die if it flips over.
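The check quoted above can be written as a small standalone function, which makes it easy to try different thresholds. This is a sketch that assumes state[2] is the torso height; the function name `ant_not_done`, its keyword arguments, and the example states are hypothetical and not gym's API.

```python
import math


def ant_not_done(state, z_min=0.2, z_max=1.0):
    """True while every state entry is finite and state[2] lies in [z_min, z_max]."""
    finite = all(math.isfinite(float(x)) for x in state)
    return finite and z_min <= state[2] <= z_max


# Made-up states: only the third entry (assumed torso height) differs.
print(ant_not_done([0.0, 0.0, 0.55, 1.0]))                       # True
print(ant_not_done([0.0, 0.0, 0.10, 1.0]))                       # False
print(ant_not_done([0.0, 0.0, 0.30, 1.0], z_min=0.4, z_max=2.0)) # False
```

Raising z_min, as in the experiment above, ends the episode at torso heights that the default range would still accept.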

@ryanmaxwell96

Regarding my question above about the observation count: my bad. The obs just include qpos and qvel, which normally add up to 29, but the [2:] brings that down to 27. So apparently the contact forces are not actually a part of the observations, I think.

@ryanmaxwell96

Can someone tell me where each observation sits in the length-27 vector? Specifically, which entries are the observations for each of the joints (i.e. which ones are the left back leg joints, left front leg joints, etc.)?

@ryanmaxwell96

Ok, so I was not able to figure it out through the code, but I locked each leg in succession and identified which part of the observation was moving. Here are my results:

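(Front looking to the right)

| Index | Group | Joint # | Quantity |
|---|---|---|---|
| 0 | Pos (Torso) | | x |
| 1 | Pos (Torso) | | y |
| 2 | Pos (Torso) | | z |
| 3 | Orient (Torso) | | x |
| 4 | Orient (Torso) | | y |
| 5 | Orient (Torso) | | z |
| 6 | Orient (Torso) | | w |
| 7 | Joint angles | 1 | Front left leg hip angle |
| 8 | Joint angles | 2 | Front left leg ankle angle |
| 9 | Joint angles | 3 | Back left leg hip angle |
| 10 | Joint angles | 4 | Back left leg ankle angle |
| 11 | Joint angles | 5 | Back right leg hip angle |
| 12 | Joint angles | 6 | Back right leg ankle angle |
| 13 | Joint angles | 7 | Front right leg hip angle |
| 14 | Joint angles | 8 | Front right leg ankle angle |
| 15 | Vel (Torso) | | x |
| 16 | Vel (Torso) | | y |
| 17 | Vel (Torso) | | z |
| 18 | Angular Vel (Torso) | | x |
| 19 | Angular Vel (Torso) | | y |
| 20 | Angular Vel (Torso) | | z |
| 21 | Joint vel | 1 | Front left leg hip joint velocity |
| 22 | Joint vel | 2 | Front left leg ankle joint velocity |
| 23 | Joint vel | 3 | Back left leg hip joint velocity |
| 24 | Joint vel | 4 | Back left leg ankle joint velocity |
| 25 | Joint vel | 5 | Back right leg hip joint velocity |
| 26 | Joint vel | 6 | Back right leg ankle joint velocity |
| 27 | Joint vel | 7 | Front right leg hip joint velocity |
| 28 | Joint vel | 8 | Front right leg ankle joint velocity |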

@ghost

ghost commented Aug 3, 2020

Why are there 29 variables...

@xinghua-qu

Your index starts from 0 and runs to 28. That's why there are 29 variables in total.
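One way to reconcile the 29 listed variables with the length-27 vector is sketched below. It assumes the list indexes the full qpos + qvel state (indices 0 to 28), while the observation drops the first two entries through qpos[2:]. The helper name `state_to_obs_index` is hypothetical.

```python
DROPPED = 2  # qpos[2:] removes the torso x and y positions


def state_to_obs_index(state_index, dropped=DROPPED):
    """Return the observation index for a full-state index, or None if it is dropped."""
    if state_index < dropped:
        return None
    return state_index - dropped


state_len = 29
obs_len = sum(state_to_obs_index(i) is not None for i in range(state_len))
print(obs_len)                 # 27
print(state_to_obs_index(2))   # 0, the torso z position
print(state_to_obs_index(28))  # 26, the last joint velocity
```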

@ghost

ghost commented Aug 4, 2020

https://atomscott.github.io/football/ruby/2019/06/28/Understanding-the-Environment.html This might be useful.

@ryanmaxwell96

I disagree that the input action is torque between +1 and -1. For one, I just did a test and I had one value over 1 (1.7). Plus, I can lock legs by setting them to 0.2. If this was torque, a value of 0.2 should not be locking anything. So it seems like it is position instead. Can someone confirm?

@ryanmaxwell96

Nevermind. I agree that it is torque now. It was getting stuck at the boundaries specified in ant.xml

@fhln

fhln commented Sep 14, 2020

> I disagree that the input action is torque between +1 and -1. For one, I just did a test and I had one value over 1 (1.7). Plus, I can lock legs by setting them to 0.2. If this was torque, a value of 0.2 should not be locking anything. So it seems like it is position instead. Can someone confirm?

The ctrl range is limited to between -1 and +1, whatever the input is. And strictly speaking, the input action is not the torque itself: the applied torque is the action multiplied by 150, because of the gear definition in ant.xml. More information on these actuators' definitions can be found at http://mujoco.org/book/XMLreference.html#actuator
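A simplified sketch of that relationship, assuming a plain motor with the control range of [-1, 1] and the gear of 150 mentioned above. It ignores everything else the simulator does, and the function name `applied_torque` is hypothetical rather than a MuJoCo call.

```python
GEAR = 150.0                     # gear value from ant.xml, as stated above
CTRL_MIN, CTRL_MAX = -1.0, 1.0   # control range


def applied_torque(action, gear=GEAR, lo=CTRL_MIN, hi=CTRL_MAX):
    """Clamp the action to the control range, then scale it by the gear."""
    clamped = max(lo, min(hi, action))
    return gear * clamped


print(applied_torque(1.7))   # 150.0, since 1.7 is clamped to 1.0 first
print(applied_torque(0.5))   # 75.0
```

Under this model an action of 1.7 has the same effect as 1.0.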

@ferreirafabio

> You don't want to know this, unless you're engineering (as opposed to learning) a solution.
>
> If you still need to know which is which, just try those actions one by one, watch the robot.

Are you completely serious or are you trolling? If you’re serious, let me be serious and straightforward: your response is complete BS. Of course the thread opener wants to know this if he/she is asking for it. Notice that we lack even an overview of some meta features of the Mujoco environments such as „is the action space discrete or continuous?“, „how many dimensions does it have?“, „is it one-hot-encoded?“, or „which joint am I controlling with dimension x?“ and so on.

@bozic-djordje

> ... Is it the z level of the torso, but this is confusing because new episodes are not triggered when the Ant flips over?

Hi! I am having trouble understanding what you meant by new episodes not being triggered when the ant flips over. I unfortunately do not have access to MuJoCo so I cannot test this myself, but the answer to the following question is critical for me: Does the episode terminate immediately as the ant flips over, or is there a certain period when the ant is flipped over and we are waiting for it to recover? Judging by your very next response it seems to be the latter. Could you please confirm this?

@ryanmaxwell96

Hey! I believe that when the ant torso gets closer to the ground than some threshold, the episode immediately restarts. I think you can see this in the ant.py file, where z = state[2].

@bozic-djordje

Got it, thanks!

@bara-bba

Hi everyone! I'm kinda stuck on the same problem: I would like to control a robotic EE in position only, so I wrote the actuator part of the XML code and all the magic stuff needed. The problem I encounter is that during RL the action is sampled within the ctrlrange, but I would like to cover the whole joint space while keeping a limited action sample. Is there any way to solve this? Thanks!

@PBarde

PBarde commented Mar 14, 2022

Hi @erwincoumans, does your script work for all the gym-mujoco envs? If yes I'd be really interested in it, is it publicly available?
Thanks!

@jkterry1
Collaborator

Hey, we have detailed docs for this now here (https://www.gymlibrary.ml/), so I'm going to close this issue.

@BenGravell

In case anyone from the future stumbles on this thread.

As of December 29, 2023 the following are true:

https://www.gymlibrary.ml/ is broken, presumably deprecated in favor of https://gymnasium.farama.org.

The new gymnasium docs include tables for the complete action and observation spaces. e.g. Ant https://gymnasium.farama.org/environments/mujoco/ant/ looks something like this:

Action Space

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the rotor between the torso and back right hip | -1 | 1 | hip_4 (right_back_leg) | hinge | torque (N m) |
| 1 | Torque applied on the rotor between the back right two links | -1 | 1 | ankle_4 (right_back_leg) | hinge | torque (N m) |
| 2 | Torque applied on the rotor between the torso and front left hip | -1 | 1 | hip_1 (front_left_leg) | hinge | torque (N m) |
| 3 | Torque applied on the rotor between the front left two links | -1 | 1 | ankle_1 (front_left_leg) | hinge | torque (N m) |
| 4 | Torque applied on the rotor between the torso and front right hip | -1 | 1 | hip_2 (front_right_leg) | hinge | torque (N m) |
| 5 | Torque applied on the rotor between the front right two links | -1 | 1 | ankle_2 (front_right_leg) | hinge | torque (N m) |
| 6 | Torque applied on the rotor between the torso and back left hip | -1 | 1 | hip_3 (back_leg) | hinge | torque (N m) |
| 7 | Torque applied on the rotor between the back left two links | -1 | 1 | ankle_3 (back_leg) | hinge | torque (N m) |

Observation Space

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|---|---|---|---|---|---|---|
| 0 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
| 1 | x-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 2 | y-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 3 | z-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 4 | w-orientation of the torso (centre) | -Inf | Inf | torso | free | angle (rad) |
| 5 | angle between torso and first link on front left | -Inf | Inf | hip_1 (front_left_leg) | hinge | angle (rad) |
| 6 | angle between the two links on the front left | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angle (rad) |
| 7 | angle between torso and first link on front right | -Inf | Inf | hip_2 (front_right_leg) | hinge | angle (rad) |
| 8 | angle between the two links on the front right | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angle (rad) |
| 9 | angle between torso and first link on back left | -Inf | Inf | hip_3 (back_leg) | hinge | angle (rad) |
| 10 | angle between the two links on the back left | -Inf | Inf | ankle_3 (back_leg) | hinge | angle (rad) |
| 11 | angle between torso and first link on back right | -Inf | Inf | hip_4 (right_back_leg) | hinge | angle (rad) |
| 12 | angle between the two links on the back right | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angle (rad) |
| 13 | x-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 14 | y-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 15 | z-coordinate velocity of the torso | -Inf | Inf | torso | free | velocity (m/s) |
| 16 | x-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 17 | y-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 18 | z-coordinate angular velocity of the torso | -Inf | Inf | torso | free | angular velocity (rad/s) |
| 19 | angular velocity of angle between torso and front left link | -Inf | Inf | hip_1 (front_left_leg) | hinge | angular velocity (rad/s) |
| 20 | angular velocity of the angle between front left links | -Inf | Inf | ankle_1 (front_left_leg) | hinge | angular velocity (rad/s) |
| 21 | angular velocity of angle between torso and front right link | -Inf | Inf | hip_2 (front_right_leg) | hinge | angular velocity (rad/s) |
| 22 | angular velocity of the angle between front right links | -Inf | Inf | ankle_2 (front_right_leg) | hinge | angular velocity (rad/s) |
| 23 | angular velocity of angle between torso and back left link | -Inf | Inf | hip_3 (back_leg) | hinge | angular velocity (rad/s) |
| 24 | angular velocity of the angle between back left links | -Inf | Inf | ankle_3 (back_leg) | hinge | angular velocity (rad/s) |
| 25 | angular velocity of angle between torso and back right link | -Inf | Inf | hip_4 (right_back_leg) | hinge | angular velocity (rad/s) |
| 26 | angular velocity of the angle between back right links | -Inf | Inf | ankle_4 (right_back_leg) | hinge | angular velocity (rad/s) |
| excluded | x-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
| excluded | y-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |

I say "something like" because a straight copy-paste of the tables from https://gymnasium.farama.org/environments/mujoco/ant/ into the GitHub comment editor did not work, so I asked ChatGPT to convert from HTML to a Markdown compatible table, and that's what I pasted here. There could be translation errors, but I'm not inclined to waste time poring over it, it's close enough to get the main idea that the details that were missing years ago have been filled in.
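Assuming the action ordering in the table above survived the conversion, a lookup from joint name to action index answers the "which joint am I controlling with dimension x?" question directly. The sketch below uses ankle_N for the second joint of each leg, as in the observation table; the names `ANT_ACTION_JOINTS` and `make_action` are hypothetical helpers, not part of Gymnasium.

```python
# Action index -> joint name, in the order given by the action table (assumed correct).
ANT_ACTION_JOINTS = [
    "hip_4", "ankle_4",   # back right
    "hip_1", "ankle_1",   # front left
    "hip_2", "ankle_2",   # front right
    "hip_3", "ankle_3",   # back left
]


def make_action(values_by_joint):
    """Build an 8-D action with the named joints set and all others at zero."""
    action = [0.0] * len(ANT_ACTION_JOINTS)
    for joint, value in values_by_joint.items():
        index = ANT_ACTION_JOINTS.index(joint)   # raises ValueError for unknown names
        action[index] = max(-1.0, min(1.0, float(value)))  # keep within [-1, 1]
    return action


print(make_action({"hip_1": 0.5, "ankle_3": -2.0}))
```

Driving one joint at a time this way is also a quick method to check the table against what the simulated robot actually does.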

@jkterry1 Thanks to the maintainers for keeping gym alive & documented!
