Descriptions of action spaces & observation spaces #585
Comments
You don't want to know this, unless you're engineering (as opposed to learning) a solution. If you still need to know which is which, just try those actions one by one and watch the robot. |
A description could still be useful to understand what the RL algorithm learned. |
I see your point @olegklimov, but to @stevenschmatz's point, just for the sake of understanding what your network has learned, it helps to know some detail about the action space and observation space. I do agree with you though that you technically don't need to know these details if you're learning a solution. |
I'm with @econti. I think a description would be good. There may be a case where we want to neutralize certain actions or parts of the observed state space and it's far easier to zero them out if we could consult a description table than to iterate through them all to manually figure them out. @olegklimov is of course right in that it's not needed for RL learning, but I disagree that there isn't a valid use-case. |
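The "zero them out" idea above can be sketched in a few lines. This is only an illustration: `mask_vector` is a hypothetical helper, the action values are made up, and the blocked indices (2 and 3) are placeholders, since which index maps to which joint is exactly what a description table would have to tell you.

```python
def mask_vector(values, blocked_indices):
    """Return a copy of `values` with the entries at `blocked_indices` set to 0.0."""
    blocked = set(blocked_indices)
    return [0.0 if i in blocked else v for i, v in enumerate(values)]


# Hypothetical 8-dimensional action; indices 2 and 3 are placeholders,
# not a statement about which joints they control.
action = [0.5, -0.3, 0.9, 0.1, -0.7, 0.2, 0.0, 0.4]
masked = mask_vector(action, blocked_indices=[2, 3])
print(masked)
```

The same helper works on an observation vector before it is fed to the policy; the original list is left unchanged.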
I needed similar info, so here is what I have found so far. For the Ant, the observation is:
self.sim.data.qpos are the positions: the first 7 elements are the 3D position (x,y,z) and orientation (quaternion x,y,z,w) of the torso, and the remaining 8 positions are the joint angles. The [2:] slice removes the first 2 elements from the position, which are the X and Y position of the agent's torso. self.sim.data.qvel are the velocities: the first 6 elements are the 3D velocity (x,y,z) and 3D angular velocity (x,y,z) of the torso, and the remaining 8 are the joint velocities. The cfrc_ext are the external forces (force x,y,z and torque x,y,z) applied to each of the links at the center of mass. For the Ant, this is 14*6: the ground link, the torso link, and 12 links for all legs (3 links for each leg). For the Humanoid, the observation adds some more fields:
The qfrc_actuator are likely the actuator forces. cinert seems to be the center-of-mass-based inertia and cvel the center-of-mass-based velocity. You can track the meaning of the actual joints from the XML file, but it requires some effort. For the humanoid, my PyBullet script that reads the MuJoCo XML file gives:
|
Maybe this helps some of you: there is a short description next to each field that tells you what it is computed from. Unfortunately, it might only help people working intensively with physics engines (like @erwincoumans). |
Is there any document that describes such physical meaning of other environments (e.g., Walker2D, Hopper and HalfCheetah)? The humanoid in pybullet has 44 dimensions in state space, but only 21 of them are explained in physical meaning. Does that mean some dimensions are unknown with respect to the physical meaning? Additionally, the xml file for urdf is a little bit hard to read. |
Is there an explanation for Walker2D like this one: https://github.com/openai/gym/wiki/Humanoid-V1 ? The XML file lists only 9; the other parameters are not there. There are 22 in total. @QuXinghuaNTU, were you able to find it? |
Hi, |
Can you clarify this? There are 29 observations I believe, but I tried adding up the number you were referring to (7+8 for qpos -2 for the [2:], then 6+8 for qvel = 27 -> no contact forces included yet) and the number does not seem to add up correctly. For instance, what do you mean by 14*6? |
Specifically, I'd really like to know what state[2] refers to, since that is the one that triggers the "notdone" signal for Ant-v2. Is it the z level of the torso? That is confusing, because new episodes are not triggered when the Ant flips over. |
I think that is because of the parameters 0.2 and 1 in the line: |
My bad. The obs just include qpos and qvel, which would normally add up to 29, but the [2:] brings it down to 27. So apparently the contact forces are not actually a part of the observations, I think. |
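The counting above can be checked with a small sketch. It assumes only what the thread states: qpos has 7 torso entries plus 8 joint angles, qvel has 6 torso entries plus 8 joint velocities, and the observation drops the first two qpos entries. `ant_observation` is a hypothetical stand-in, not the function from ant.py, and the zeros are dummy data.

```python
QPOS_TORSO, QPOS_JOINTS = 7, 8   # position + quaternion, then joint angles
QVEL_TORSO, QVEL_JOINTS = 6, 8   # linear + angular velocity, then joint velocities


def ant_observation(qpos, qvel):
    """Concatenate qpos without its first two entries (torso x, y) and qvel."""
    return list(qpos[2:]) + list(qvel)


# Dummy state filled with zeros; only the lengths matter here.
qpos = [0.0] * (QPOS_TORSO + QPOS_JOINTS)
qvel = [0.0] * (QVEL_TORSO + QVEL_JOINTS)
obs = ant_observation(qpos, qvel)

print(len(qpos) + len(qvel))  # full state length: 29
print(len(obs))               # observation length: 27
```

Note the index shift this implies: the torso z value sits at index 2 of the full state but at index 0 of the sliced observation.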
Can someone tell me where each observation is in the 27 length vector? Specifically, which are the observations for each of the joints? (i.e. Which ones are the left back leg joints, left front leg joints, etc.) |
Ok, so I was not able to figure it out through the code, but I locked each leg in succession and identified which part of the observation was moving. Here are my results (front looking to the right):
1 y
2 z
Orient (Torso)
4 y
5 z
6 w
joint angles
8 2 Front left leg ankle angle <---
9 3 Back left leg hip angle <---
10 4 Back left leg ankle angle <---
11 5 Back left leg hip angle <---
12 6 Back right leg ankle angle <---
13 7 Front right leg hip angle <---
14 8 Front right leg ankle angle <---
Vel (Torso)
15 x
16 y
17 z
Angular Vel (Torso)
19 y
20 z
joint vel
22 2 Front left leg ankle angle <---
23 3 Back left leg hip angle <---
24 4 Back left leg ankle angle <---
25 5 Back right leg hip angle <---
26 6 Back right leg ankle angle <---
27 7 Front right leg hip angle <---
28 8 Front right leg ankle angle <--- |
Why are there 29 variables... |
Your index starts from 0. That's why there are 29 variables in total. |
https://atomscott.github.io/football/ruby/2019/06/28/Understanding-the-Environment.html This might be useful. |
I disagree that the input action is torque between +1 and -1. For one, I just did a test and I had one value over 1 (1.7). Plus, I can lock legs by setting them to 0.2. If this was torque, a value of 0.2 should not be locking anything. So it seems like it is position instead. Can someone confirm? |
Nevermind. I agree that it is torque now. It was getting stuck at the boundaries specified in ant.xml |
The ctrl range is limited to between -1 and +1, whatever the input is. And strictly speaking, the input action is not the torque itself: the applied torque is the action times 150, because of the gear definition in ant.xml. More information on these actuators' definitions can be found at http://mujoco.org/book/XMLreference.html#actuator |
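A rough sketch of the clamp-then-scale behaviour described above, assuming the two numbers quoted in this thread (control range -1 to +1, gear 150). `applied_torque` is a hypothetical helper for illustration; the real computation happens inside MuJoCo, not in Python.

```python
GEAR = 150.0                     # gear value quoted above for ant.xml
CTRL_MIN, CTRL_MAX = -1.0, 1.0   # control range quoted above


def applied_torque(action_value, gear=GEAR):
    """Clamp the raw action to the control range, then scale it by the gear."""
    ctrl = max(CTRL_MIN, min(CTRL_MAX, action_value))
    return gear * ctrl


for a in (0.5, 1.7, -2.0):
    print(a, "->", applied_torque(a))
```

Under these assumptions an out-of-range input such as 1.7 gives the same torque as 1.0, which matches the earlier observation that values above 1 have no extra effect.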
Are you completely serious or are you trolling? If you’re serious, let me be serious and straightforward: your response is complete BS. Of course the thread opener wants to know this if he/she is asking for it. Notice that we lack even an overview of some meta features of the Mujoco environments such as „is the action space discrete or continuous?“, „how many dimensions does it have?“, „is it one-hot-encoded?“, or „which joint am I controlling with dimension x?“ and so on. |
Hi! I am having trouble understanding what you meant by new episodes not being triggered when the ant flips over. I unfortunately do not have access to MuJoCo so I cannot test this myself, but the answer to the following question is critical for me: Does the episode terminate immediately as the ant flips over, or is there a certain period when the ant is flipped over and we are waiting for it to recover? Judging by your very next response it seems to be the latter. Could you please confirm this? |
Hey! I believe that when the ant torso gets closer than some threshold to the ground, the episode immediately restarts. I think you can see this in the ant.py file, where z=state[2]. |
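A minimal sketch of the kind of check being discussed, assuming the bounds are the "0.2 and 1" mentioned earlier in the thread and that state[2] is the torso height. The finiteness test and the name `episode_done` are assumptions for illustration; check ant.py for the actual line.

```python
import math

Z_MIN, Z_MAX = 0.2, 1.0  # assumed bounds, taken from the ".2 and 1" quoted above


def episode_done(state):
    """Return True when the torso height leaves [Z_MIN, Z_MAX] or the state is not finite."""
    z = state[2]  # torso height in the full state vector (before the [2:] slice)
    notdone = all(math.isfinite(v) for v in state) and Z_MIN <= z <= Z_MAX
    return not notdone


upright = [0.0, 0.0, 0.55] + [0.0] * 26
too_low = [0.0, 0.0, 0.10] + [0.0] * 26
print(episode_done(upright), episode_done(too_low))
```

Under this check a flipped Ant whose torso height stays inside the range would not end the episode, which would be consistent with the behaviour reported above.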
Got it, thanks! |
Hi everyone! I'm kinda stuck on the same problem: I would like to control a robotic EE in position only, so I wrote in the actuator part of XML code and all the magic stuff needed. The problem I encounter is that during RL the action is sampled in the ctrlrange, but I would like to cover the whole joint space while keeping a limited action sample. Is there any way to solve this? Thanks! |
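One common approach to this (not something confirmed in this thread) is to let the policy sample in a fixed normalized range and map that range affinely onto the joint limits before sending the command. A sketch, where `scale_action` and the joint limits are hypothetical:

```python
def scale_action(a, low, high):
    """Map a normalized action in [-1, 1] onto the interval [low, high]."""
    a = max(-1.0, min(1.0, a))  # keep the sample inside the normalized range
    return low + 0.5 * (a + 1.0) * (high - low)


# Hypothetical joint limits in radians, for illustration only.
JOINT_LOW, JOINT_HIGH = -3.0, 1.0
for a in (-1.0, 0.0, 1.0):
    print(a, "->", scale_action(a, JOINT_LOW, JOINT_HIGH))
```

The policy still outputs values in [-1, 1], while the position target spans the full joint range; the ctrlrange in the XML would then need to cover the joint range as well.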
Hi @erwincoumans, does your script work for all the gym-mujoco envs? If yes I'd be really interested in it, is it publicly available? |
Hey, we have detailed docs for this now here (https://www.gymlibrary.ml/), so I'm going to close this issue |
In case anyone from the future stumbles on this thread. As of December 29, 2023 the following are true: https://www.gymlibrary.ml/ is broken, presumably deprecated in favor of https://gymnasium.farama.org. The new gymnasium docs include tables for the complete action and observation spaces. e.g. Action Space
Observation Space
I say "something like" because a straight copy-paste of the tables from https://gymnasium.farama.org/environments/mujoco/ant/ into the GitHub comment editor did not work, so I asked ChatGPT to convert from HTML to a Markdown compatible table, and that's what I pasted here. There could be translation errors, but I'm not inclined to waste time poring over it, it's close enough to get the main idea that the details that were missing years ago have been filled in. @jkterry1 Thanks to the maintainers for keeping gym alive & documented! |
Do descriptions of different environments' action spaces & observation spaces exist anywhere? For example, with Humanoid-V1 the action space is a 17-D vector that presumably maps to different body parts, but are these numbers torques, angles, etc.? The same goes for the observation space: a brief description of what the 376 dimensions correspond to would be incredibly useful to know.