-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fixed new gym API related to step() and reset() #87
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, Im one of the developers of Gym so thanks for starting this PR
I have added a couple of changes for updating the API
This page includes a migration guide, https://gymnasium.farama.org/content/migration-guide/
I couldn't see a function like this, but I would add a testing function like this in Gym to test that environment follow the API.
action = env.action_space.sample() | ||
_, reward, done, info = env.step(action) | ||
_, reward, done, _, info = env.step(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace with
_, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
@@ -243,7 +255,7 @@ def seed(self, seed=None): | |||
# return the list of seeds used by RNG(s) in the environment | |||
return [seed] | |||
|
|||
def reset(self, seed=None, options=None, return_info=None): | |||
def reset(self, seed=None, options=None, return_info=None) -> Tuple[ObsType, dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove return_info
parameter
@@ -352,7 +371,7 @@ def close(self): | |||
if self.viewer is not None: | |||
self.viewer.close() | |||
|
|||
def render(self, mode='human'): | |||
def render(self, mode='human') -> Optional[Union[RenderFrame, List[RenderFrame]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove mode parameter and add render_mode
to __init__
for specifying the type of rendering
action = env.action_space.sample() | ||
_, _, done, _ = env.step(action) | ||
_, _, done, _, _ = env.step(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace done with terminated and truncated as in the comment before
action = envs[idx].action_space.sample() | ||
_, _, dones[idx], _ = envs[idx].step(action) | ||
_, _, dones[idx], _, _ = envs[idx].step(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
# check each output | ||
state, reward, done, info = output | ||
state, reward, done, truncated, info = output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terminated, truncated
, as terminated != done
done = False | ||
state, _, done, _ = env.step(0) | ||
state, _, done, _, _ = env.step(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on terminated and truncated
@@ -120,9 +121,9 @@ def test(self): | |||
if done: | |||
state = env.reset() | |||
done = False | |||
state, _, done, _ = env.step(0) | |||
state, _, done, _, _ = env.step(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on terminated and truncated
done = False | ||
else: | ||
state, reward, done, info = env.step(env.action_space.sample()) | ||
state, reward, done, truncated, info = env.step(env.action_space.sample()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on terminated and truncated
@@ -37,7 +37,7 @@ | |||
|
|||
setup( | |||
name='nes_py', | |||
version='8.2.1', | |||
version='8.2.2', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor point, but I would make this a minor or major release due to the significant code changes
Any reason why this shouldn't be merged with the suggested edits? |
Description
There are several issues related to changes in API. I re-adjusted all code, so it returned truncated in step and readjusted reset to match what the gym expected. In parallel, fixed Mario env. Note old code won't work with this since GYM now return five elements in a tuple vs. 4. Same for reset. I also re-adjusted all the unit tests and examples, so it matched and passed all tests.
Type of change
Please select all relevant options:
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Test Configuration
Target: x86_64-apple-darwin21.6.0
Thread model: posix
Checklist