Fix and Update Basic Usage's and Core page (#41)
vairodp authored Oct 10, 2022
1 parent 6273299 commit c2e2df2
Showing 2 changed files with 27 additions and 27 deletions.
30 changes: 19 additions & 11 deletions docs/content/basic_usage.md
@@ -11,7 +11,7 @@ Initializing environments is very easy in Gymnasium and can be done via:

```python
import gymnasium as gym
env = gym.make('CartPole-v0')
env = gym.make('CartPole-v1')
```
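
`gym.make` also accepts additional keyword arguments; for example, a render mode can be requested when the environment is created. A minimal sketch (the `"human"` mode is just one common choice):

```python
import gymnasium as gym

# "human" opens a window; "rgb_array" instead returns frames from render()
env = gym.make("CartPole-v1", render_mode="human")
```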

## Interacting with the Environment
@@ -38,7 +38,7 @@ alongside the observation for this timestep. The reward may also be negative or
The agent will then be trained to maximize the reward it accumulates over many timesteps.

After some timesteps, the environment may enter a terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task. In that case, we want to reset the environment to a new initial state. The environment issues a terminated signal to the agent if it enters such a terminal state. Sometimes we also want to end the episode after a fixed number of timesteps, in this case, the environment issues a truncated signal.
This is a new change in API (v0.26 onwards). Earlier a common done signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.
This is a new change in the API (v0.26 onwards). Earlier, a single `done` signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.

Let's see what the agent-environment loop looks like in Gymnasium.
This example will run an instance of the `LunarLander-v2` environment for 1000 timesteps. Since we pass `render_mode="human"`, you should see a window pop up rendering the environment.
@@ -60,7 +60,7 @@ for _ in range(1000):
env.close()
```
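
Only a fragment of the snippet is visible in this diff, so here is a hedged sketch of what the full v0.26-style loop might look like, with a random policy standing in for a trained agent:

```python
import gymnasium as gym

env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()  # random action in place of a real agent
    observation, reward, terminated, truncated, info = env.step(action)

    # Under the v0.26+ API the episode ends when either signal is True
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```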

The output should look something like this
The output should look something like this:

```{figure} https://user-images.githubusercontent.com/15806078/153222406-af5ce6f0-4696-4a24-a683-46ad4939170c.gif
:width: 50%
@@ -93,8 +93,8 @@ env = gym.make("CartPole-v1", apply_api_compatibility=True)
```
This can also be done explicitly through a wrapper:
```python
from gymasium.wrappers import StepCompatibility
env = StepCompatibility(CustomEnv(), output_truncation_bool=False)
from gymnasium.wrappers import StepAPICompatibility
env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)
```
For more details see the wrappers section.
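
As a rough illustration of what such a compatibility layer has to do, the old 4-tuple `(obs, reward, done, info)` maps onto the new 5-tuple roughly as follows (a simplified sketch, not the wrapper's actual implementation; the `"TimeLimit.truncated"` info key is an assumption carried over from older Gym versions):

```python
def convert_step_result(obs, reward, done, info):
    # Sketch only: treat `done` as termination unless the info dict says the
    # episode was cut short by a time limit rather than ending naturally.
    truncated = info.get("TimeLimit.truncated", False)
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info
```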

@@ -131,7 +131,8 @@ There are multiple `Space` types available in Gymnasium:

```python
>>> from gymnasium.spaces import Box, Discrete, Dict, Tuple, MultiBinary, MultiDiscrete
>>>
>>> import numpy as np
>>>
>>> observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
>>> observation_space.sample()
[ 1.6952509 -0.4399011 -0.7981693]
@@ -217,7 +218,7 @@ play(gymnasium.make('Pong-v0'))
This opens a window of the environment and allows you to control the agent using your keyboard.

Playing using the keyboard requires a key-action map. This map should have type `dict[tuple[int], int | None]`, which maps the keys pressed to the action performed.
For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like:
For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like this:
```python
{
    # ...
@@ -230,16 +231,23 @@ As a more complete example, let's say we wish to play with `CartPole-v0` using o
import gymnasium as gym
import pygame
from gymnasium.utils.play import play

mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
play(gymnasium.make("CartPole-v0"), keys_to_action=mapping)
play(gym.make("CartPole-v1",render_mode="rgb_array"), keys_to_action=mapping)
```
where we obtain the corresponding key ID constants from pygame. If the `keys_to_action` argument is not specified, then the default `keys_to_action` mapping for that environment is used, if provided.

Furthermore, if you wish to plot real-time statistics as you play, you can use `gymnasium.utils.play.PlayPlot`. Here's some sample code for plotting the reward for the last 5 seconds of gameplay:
```python
import gymnasium as gym
import pygame
from gymnasium.utils.play import PlayPlot, play

def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
    return [rew,]
    return [rew, ]

plotter = PlayPlot(callback, 30 * 5, ["reward"])
env = gymnasium.make("Pong-v0")
play(env, callback=plotter.callback)
mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
env = gym.make("CartPole-v1", render_mode="rgb_array")
play(env, callback=plotter.callback, keys_to_action=mapping)
```
24 changes: 8 additions & 16 deletions gymnasium/core.py
@@ -51,7 +51,7 @@ class Env(Generic[ObsType, ActType]):
- :attr:`action_space` - The Space object corresponding to valid actions
- :attr:`observation_space` - The Space object corresponding to valid observations
- :attr:`reward_range` - A tuple corresponding to the minimum and maximum possible rewards
- :attr:`spec` - An environment spec that contains the information used to initialise the environment from `gym.make`
- :attr:`spec` - An environment spec that contains the information used to initialize the environment from `gymnasium.make`
- :attr:`metadata` - The metadata of the environment, i.e. render modes
- :attr:`np_random` - The random number generator for the environment
@@ -74,7 +74,7 @@ class Env(Generic[ObsType, ActType]):

@property
def np_random(self) -> np.random.Generator:
"""Returns the environment's internal :attr:`_np_random` that if not set will initialise with a random seed."""
"""Returns the environment's internal :attr:`_np_random` that if not set will initialize with a random seed."""
if self._np_random is None:
self._np_random, seed = seeding.np_random()
return self._np_random
@@ -99,17 +99,13 @@ def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
In this case further step() calls could return undefined results.
truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
Typically a timelimit, but could also be used to indicate an agent physically going out of bounds.
Can be used to end the episode prematurely before a `terminal state` is reached.
info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
This might, for instance, contain: metrics that describe the agent's performance state, variables that are
hidden from observations, or individual reward terms that are combined to produce the total reward.
It also can contain information that distinguishes truncation and termination, however this is deprecated in favour
It also can contain information that distinguishes truncation and termination, however, this is deprecated in favor
of returning two booleans, and will be removed in a future version.
done (bool): (Deprecated) A boolean value for if the episode has ended, in which case further :meth:`step` calls will
return undefined results.
A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully,
a certain timelimit was exceeded, or the physics simulation has entered an invalid state.
"""
raise NotImplementedError
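
To make the five return values concrete, a minimal custom environment implementing this interface might look like the following (an illustrative sketch, not part of this commit):

```python
import gymnasium as gym
from gymnasium import spaces


class CountdownEnv(gym.Env):
    """Toy environment: the episode terminates once an internal counter reaches zero."""

    def __init__(self, start=10):
        self.observation_space = spaces.Discrete(start + 1)
        self.action_space = spaces.Discrete(2)
        self._start = start
        self._count = start

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._count = self._start
        return self._count, {}  # observation, info

    def step(self, action):
        self._count -= 1
        observation = self._count
        reward = float(action)         # arbitrary reward, for illustration only
        terminated = self._count <= 0  # a terminal state of the underlying MDP
        truncated = False              # no time limit in this toy example
        return observation, reward, terminated, truncated, {}
```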

@@ -175,19 +171,15 @@ def render(self) -> Optional[Union[RenderFrame, List[RenderFrame]]]:
raise NotImplementedError

def close(self):
"""Override close in your subclass to perform any necessary cleanup.
Environments will automatically :meth:`close()` themselves when
garbage collected or when the program exits.
"""
"""Override close in your subclass to perform any necessary cleanup."""
pass

@property
def unwrapped(self) -> "Env":
"""Returns the base non-wrapped environment.
Returns:
Env: The base non-wrapped gym.Env instance
Env: The base non-wrapped gymnasium.Env instance
"""
return self

@@ -349,7 +341,7 @@ class ObservationWrapper(Wrapper):
"""Superclass of wrappers that can modify observations using :meth:`observation` for :meth:`reset` and :meth:`step`.
If you would like to apply a function to the observation that is returned by the base environment before
passing it to learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
passing it to the learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
:meth:`observation` to implement that transformation. The transformation defined in that method must be
defined on the base environment’s observation space. However, it may take values in a different space.
In that case, you need to specify the new observation space of the wrapper by setting :attr:`self.observation_space`
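
For instance, an observation wrapper along these lines could replace a dict observation with a single vector (a sketch that assumes the base environment returns a dict with hypothetical `"agent"` and `"target"` entries):

```python
import numpy as np
import gymnasium as gym
from gymnasium.spaces import Box


class RelativePosition(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        # The transformed observation lives in a different space than the base one
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)

    def observation(self, obs):
        # Assumes the wrapped observation is a dict with "agent" and "target" arrays
        return (obs["target"] - obs["agent"]).astype(np.float32)
```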
@@ -401,7 +393,7 @@ class RewardWrapper(Wrapper):
because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
To do that, we could, for instance, implement the following wrapper::
class ClipReward(gymnasium.RewardWrapper):
class ClipReward(gym.RewardWrapper):
def __init__(self, env, min_reward, max_reward):
super().__init__(env)
self.min_reward = min_reward
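
The rest of the example is elided in this diff; a complete, self-contained sketch of such a reward-clipping wrapper might look like this (illustrative only, not the file's actual contents):

```python
import numpy as np
import gymnasium as gym


class ClipReward(gym.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Clip every reward into [min_reward, max_reward]
        return float(np.clip(reward, self.min_reward, self.max_reward))
```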
