Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Made readout of seed possible in env #889

Merged
merged 15 commits into from
Feb 26, 2024

Conversation

MischaPanch
Copy link
Contributor

@MischaPanch MischaPanch commented Jan 23, 2024

Description

The random seed is fundamentally important for various evaluation protocols. Given the flakyness of RL, one
often wants to evaluate or even train per-seed and then aggregate and present the results with custom logic.

The current design of the environment makes it almost impossible to extract the seed. Numpy's Generator object created
by Gymnasium's np_random does not expose it, and the seed is not saved anywhere in the env, despite being passed from the user.

This forces users to save seeds somewhere else, which is especially cumbersome when env rollouts are parallelized (which happens all the time).

With this small change we add a hidden attribute _seed and a read-only property. It's entirely backwards compatible

Fixes # (issue)

Seed of env can't be read out

Type of change

Please delete options that are not relevant.

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I think this is a helpful addition for the particular cases that you are talking about.

Does it make sense to change the name to make it more explicit? i.e., _np_random_seed?
As seed feels like a common variable name that we don't want to be accidentally writing over

You need to add a forwarding property in Wrapper with, otherwise, Wrappers will use their internal _seed variable

@property
def seed(self) -> int | None:
     return self.env.seed

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 23, 2024

Thanks for the quick response, I added the property to Wrapper.

A slightly related issue that I noticed is the fact that Env.np_random has a setter. This can cause some unexpected behavior that is even amplified with the new seed property like

Overwriting a user setting np_random to None

env.np_random = None
env.np_random
>>> <Generator>

Overwriting user provided RNG on reset

env.np_random = my_rng
env.reset(seed=3)
env.np_random
>>> `No longer my_rng`

Generator and seed can now be out of sync

env.reset(seed=3)
env.np_random = MyRNG(seed=4)
env.seed
>>> 3

One can surely cook up more buggy behavior related to this. The examples are a bit contrived, but I believe so is setting an rng that would be easily overwritten at reset.

I would propose to remove the setter. It's a breaking change, of course. Intuitively, I wouldn't expect many (if any) users to be affected by this change, but I'm not too informed about this. Let me know what you think, I could look for usages of the setter in Gymnasium code and prepare a follow-up PR

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 23, 2024

As to the naming, personally I think seed is fine :). The kwarg in reset is also called seed, and users might want to use the attr for other seeding, e.g. if they use torch in custom envs. But if you prefer np_random_seed, I could rename it of course.

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Jan 24, 2024

As to the naming, personally I think seed is fine :). The kwarg in reset is also called seed, and users might want to use the attr for other seeding, e.g. if they use torch in custom envs. But if you prefer np_random_seed, I could rename it of course.

Ahh, yes we are going to need to change the variable name as there use to be an important function called seed was removed. This is why the tests are failing
Therefore, could you change all relevant parts to _np_random_seed and np_random_seed

As for the consistency in what np_random_seed is, I think it should be possible to add code to Env.reset to save the seed and in np_random setter to have np_random_seed as None. This should fix the two cases that you are talking about

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 24, 2024

Ahh, yes we are going to need to change the variable name as there use to be an important function called seed was removed. This is why the tests are failing Therefore, could you change all relevant parts to _np_random_seed and np_random_seed

I renamed the property to np_random_seed

As for the consistency in what np_random_seed is, I think it should be possible to add code to Env.reset to save the seed and in np_random setter to have np_random_seed as None. This should fix the two cases that you are talking about

I think this would only change the inconsistency but not resolve it. Say the user has created a generator with seeding.np_random(3) and passed it explicitly using the setter. Then setting the (possibly already set) seed to None is also incorrect. Unfortunately, there seems to be no reliable way to extract a seed from a generator. If I'm not wrong, there might not even be a seed in the general case.

Since in gymnasium the generators are supposed to always be based on seeds, I'm doubting the utility of the setter.

@pseudo-rnd-thoughts
Copy link
Member

I think this would only change the inconsistency but not resolve it. Say the user has created a generator with seeding.np_random(3) and passed it explicitly using the setter. Then setting the (possibly already set) seed to None is also incorrect. Unfortunately, there seems to be no reliable way to extract a seed from a generator. If I'm not wrong, there might not even be a seed in the general case.

Since in gymnasium the generators are supposed to always be based on seeds, I'm doubting the utility of the setter.

If we define np_random_seed with three states

  1. None = np_random is None and not set yet
  2. positive int = The known seed for np_random set through reset
  3. -1 = Unknown seed for np_random updated using the setter

I think that the setter is critical for the forwarding problem that most of the time users are interfacing with a top level wrapper and not the base environments. Therefore, be able to define the environment's np_random and not the top level is important

Also provided separate setter that exposes the standard mechanism for setting RNGs and seeds (used in reset)
@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 25, 2024

Done, just that instead of -1 I used "unknown", which I think is more telling.

I also added more explicit docstrings.

Since you want to keep the direct setting of np_random, which is probably not recommended, I added a new public setter method set_np_random_through_seed (being a method, it's also forwarded through wrappers). This keeps the code backwards compatible.

If backwards compatibility is not important here, I would still recommend getting rid of the property-based setter in favor of the new one :). If it is important to let the users pass custom generators, the interface could be changed such that they provide a factory/callable mapping a seed to a generator. This factory could then also be used in reset, solving the issue of reset overwriting the user-provided rng in the current code version.

This breaking change could be done in a separate PR, if you think it is suitable. Personally, I would see it as an improvement of the Env interfaces

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Jan 29, 2024

@MischaPanch Coming back to this after a couple of days, I'm no longer so certain of the foundation for the PR.

For the two cases where you discussed knowing the seeding is important, evaluation and training using a known seed, I'm not sure why this PR is needed.

If a known seed is wanted, then this can be done with reset(seed=seed). I know for a fact that cleanrl can achieve 100% reproducibility of training / evaluation because they seed all the relevant generators with a known seed at the beginning, e.g., https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py see lines 141 and 170 for examples

Therefore, coming back to this, I'm uncertain of the use cases where the seed is unknown for good reason, i.e., reset(seed) cannot be used.

Secondly, I remembered that some environments, e.g., Atari, have second random number generators (internal to the games) that cannot be accessed. Therefore, knowing the np_random generator seed is not helpful, and only knowing the reset(seed) is the possible method of knowing the seed. Therefore, this PR could cause some users to have unreliable confidence in np_random_seed being true.

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 29, 2024

@pseudo-rnd-thoughts

Thanks for the feedback, I'll dive a bit deeper into why I believe this to be important/useful. The main point is not reproducibility but supporting seed-dependent evaluation protocols, which I believe to be very important. My colleague @bordeauxred recently wrote a short review-style article on some of them (see also references therein). Typical plots that need to be created for these evaluation protocols are something like this

image

Note that I was not in favor of calling the property np_random_seed, instead the seed should control all randomness (and likely does that in atari). The fact that the seed cannot be read off the environment after it has been set in reset (potentially explicitly by a user) is a problem, I believe. Even if you consider the particular use case of evaluation protocols irrelevant, it's unusual that something user-provided cannot be read off at all after being set.

Going further into evaluation protocols - a typical thing would be to collect trajectories for each seed and aggregate them per-seed. Now, without the seed being there, one needs to carefully control how seeds are set for each of the parallel rollout workers, then think about how to map the seeds to the collected logs and to disaggregate them before performing per-seed analysis. Even with a single file approach like in cleanRL that's not entirely trivial. If you use an RL framework that handles parallelization for you, it might be entirely infeasible.

However, if the seed is part of the environment, it would be usually very simple to add it either to the collected trajectories themselves, or at least to easily associate seeds to trajectories in other ways. Think of a run where the seeds are chosen randomly from a pool for each call to reset (for every parallel worker). Without the information of the seed being persisted in the env, one not even has a seed-worker association to disaggregate results, and it's becoming even more cumbersome. All the things I described above are not particularly exotic. It's just the case that currently high-quality evaluation protocols are not yet ubiquituious in RL (which I believe makes applications of RL in practice harder than they need to be)

@MischaPanch
Copy link
Contributor Author

If you disagree with these points, of course the PR can be dropped. I would then probably have to write something like a custom wrapper that performs this functionality.

Btw, another thing along the same lines was that the TimeLimit wrapper (and maybe also other wrappers) equally doesn't expose user-set properties like max_episode_steps, which also made my life harder. I wanted to make a small separate PR that makes it easier for people to read off essential properties of the env after this one, the seed was just the most urgent one :)

Even though I don't like arguments from authority - an example of readout of user-set stuff being seen as important is sklearn, where their interfaces enforce that all things passed to init must be saved as public fields

@pseudo-rnd-thoughts
Copy link
Member

Thanks for the detail and the post, very interesting and important point that is often forgotten in RL as the author says.

Even if you consider the particular use case of evaluation protocols irrelevant, it's unusual that something user-provided cannot be read off at all after being set.

My point is that the evaluation protocols are relevant, rather than the seed should be known before it is set, not after.
For example

# current approach
env.reset(seed=seed_value)

# Proposed approach
env.reset()
seed_value = env.np_random_seed

This is how I understand you wishing to use the variable, which, given the current method of use, I do not understand why you would want/need the second.

An important detail about seeding for numpy is that you do not know the seed after you have sampled the generator. I note that Jax does not have this problem.
For example

import numpy as np 

rng = np.random.RandomState(1)
rng.random()
rng_2 = np.random.RandomState(??)
assert rng == rng_2
# I can't know the seed value to create a new rng that has the same seed as rng after sampling with random

@MischaPanch
Copy link
Contributor Author

The thing is that reset typically happens in many parallel processes, often deep inside an RL framework, where a user won't have access to. So while accessing env.seed in some rollout process and using it is usually easy, controlling inside a framework how exactly the seed is set at reset might not be possible at all

@MischaPanch
Copy link
Contributor Author

So a more explicit example, imagine the framework can give you access to workers. You can typically call something like

trajectory_seeds = [w.env.seed for w in rollout_workers]

But often you have no control over what seed was passed and how it was done when the env was instantiated or reset inside a worker process

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 29, 2024

One more example, very close to home. I am working on the RL library tianshou. There, when a user sets training_seed=k, what will actually happen in the n rollout workers is that the workers will be assigned seeds k, k+1, ..., k+n. The user cannot influence this. Now fortunately, I am admin of tianshou and will be changing that soon. But a typical user can't do this, and other frameworks will have the same or similar problems. All this goes to saying that your proposed method number 1, which is "just remember how the seed was set in the user code" may often be infeasible, whereas the method number 2 is essentially always possible

@pseudo-rnd-thoughts
Copy link
Member

@RedTachyon Would you be able to check this thread to see if I'm missing anything

All this goes to saying that your proposed method number 1, which is "just remember how the seed was set in the user code" may often be infeasible, whereas the method number 2 is essentially always possible

@MischaPanch I think this is a good point, that this PR should be secondary and a backup to reset(seed=seed). Therefore I would want to simplify the PR personally.

As -1 is not a valid seed for numpy, then I think it is fine to have np_random_seed: int | None and remove the Literal["unknown"] otherwise for type checking this is significantly more annoying checking each time is none.

Second, for the getter, I'm not sure we need to set the variable if it hasn't already and we should just return int | None

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 29, 2024

I can change to -1, I just thought "unknown" would be more explicit. Pls confirm that you're sure about the -1, then I adjust it.

As to the getter - I did it to be consistent with the getter of np_random. If I don't set it when it's called, you would weirdly get

env = MyEnv(...)
env.np_random_seed
>>> None
env.np_random
>>> <Generator>
env.np_random_seed
>>> Some not-None seed

I usually don't expect attribute access to modify the state, so with the current getter for the seed it seems more benign to me

env = MyEnv(...)
env.np_random_seed
>>> Some seed
env.np_random
>>> <Generator> for that seed

If you want the getter to deliver None, I will adjust the docstring accordingly. Happy to hear that you want to see this merge, let me know how you decide about the final adjustments

@pseudo-rnd-thoughts
Copy link
Member

I can change to -1, I just thought "unknown" would be more explicit. Pls confirm that you're sure about the -1, then I adjust it.

import numpy as np

np.random.RandomState(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 185, in numpy.random.mtrand.RandomState.__init__
  File "_mt19937.pyx", line 166, in numpy.random._mt19937.MT19937._legacy_seeding
  File "_mt19937.pyx", line 180, in numpy.random._mt19937.MT19937._legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1

For the getter, I get what you mean, I guess it depends on the order of get np_random_seed or np_random first for the info.
Maybe it is ok to keep it then

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 29, 2024

I meant that you prefer it to "unknown", I knew that numpy would fail. The error is just a bit less telling, with "unknown" the user would likely get a better error message and it would be clearer that there are three possibilities (set to int, not set, and set somehow but unknown) IMO

ValueError: Seed must be between 0 and 2**32 - 1

But given that you prefer -1, I'm adjusting it now. If you are convinced by my "explicitness" argument, feel free to drop the last commit before merging :)

I'm leaving the getter

@MischaPanch
Copy link
Contributor Author

Done

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes

Could you remove set_np_random_through_seed and in docs/api/core could you add np_random_seed as a method such that it shows up in the documentation website. Then for me, the PR is good

Then @RedTachyon is going to check that I haven't missed anything, then we can merge

Also added the new property to docs
@MischaPanch
Copy link
Contributor Author

MischaPanch commented Jan 29, 2024

Done. There is no docs/api/core but I think I added it to the right places.

IMPORTANT:

I also added it to VectorEnv to make things consistent. However, I believe some things there are broken

  1. The reset in VectorEnv reads
    def reset(
        self,
        *,
        seed: int | list[int] | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[ObsType, dict[str, Any]]:  # type: ignore
        """Reset all parallel environments and return a batch of initial observations and info.

        Args:
            seed: The environment reset seeds
            options: If to return the options

But the implementation calls seeding.np_random(seed) which doesn't support list[int]

  1. The VectorWrapper does not implement np_random, contrary to Wrapper. Maybe it's not necessary, but I guess then it's also not necessary in Wrapper. It's probably a separate issue

Also added the new property to docs
@MischaPanch
Copy link
Contributor Author

I fixed the docstring and type of VectorEnv and added properties to Sync/Async VectorEnvs. I also slightly improved the assert messages and fixed minor typing problems.

NB: just an observation - there's quite some code duplication between the sync and async versions. I guess it's by design?

Let me know if there are more changes to be made :)

@MischaPanch
Copy link
Contributor Author

@pseudo-rnd-thoughts what's the status here? :)

Is there anything preventing a merge? Do you have a timeline on when @RedTachyon could review it, in case that's necessary?

@pseudo-rnd-thoughts
Copy link
Member

@MischaPanch I will poke @RedTachyon again but the code looks good imo

Just for the future, could you add several tests for each of the cases (-1, pos int and None) in tests/test_core.py and very basic tests in the respective vector tests

@MischaPanch
Copy link
Contributor Author

Just for the future, could you add several tests for each of the cases (-1, pos int and None) in tests/test_core.py and very basic tests in the respective vector tests

Will do :)

@MischaPanch
Copy link
Contributor Author

Sorry, was swamped in the last weeks. But now I added the tests (and a minor fix)

@pseudo-rnd-thoughts
Copy link
Member

@MischaPanch I can't edit the PR, could you run pre-commit, pip install pre-commit then pre-commit run --all-files
Run it until all of the hooks pass.
Also, could you remove the testing class in test_core.py as this is not the style of testing that we use for the rest of the project

@Kallinteris-Andreas
Copy link
Collaborator

Should check_reset_determinism be updated to check the seed?

@MischaPanch
Copy link
Contributor Author

MischaPanch commented Feb 23, 2024

I had some cryptic errors in pre-commit hooks (pkg_resources not found). Could you share the development setup or Gymnasium with me? Or maybe put it in contributing.md
@pseudo-rnd-thoughts

I will remove the test class, though personally I really like them - you can bundle related tests together and the test logging output looks nicer

@Kallinteris-Andreas
Copy link
Collaborator

@MischaPanch you can use GitHub codespaces to run pre-commit in, (that is what I do when I encounter problems locally)

Michael Panchenko added 2 commits February 23, 2024 17:50
@MischaPanch
Copy link
Contributor Author

check_reset_determinism

I made a fulltext search for this and didn't find anything @Kallinteris-Andreas

@MischaPanch
Copy link
Contributor Author

@pseudo-rnd-thoughts Reformatted and removed the test class, should be good to go

Copy link
Member

@RedTachyon RedTachyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, left a few comments.

Overall I had some doubts about adding this at all, but I managed to covince myself in the meantime, so I don't have to bother you with it.

At the same time, as this goes into the very core API, we need to be careful even with tiny details, so that's mostly what I addressed. Might have some more thoughts later.

@@ -66,6 +66,8 @@ class Env(Generic[ObsType, ActType]):

# Created
_np_random: np.random.Generator | None = None
# will be set to the "invalid" value -1 if the seed of the currently set rng is unknown
_np_random_seed: int | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be better with one consistent way of specifying an invalid state - either None or -1, not both. Unless there's a clear distinction between what None and -1 mean?

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, -1 means unknown seed, while None means that the seed hasn't been set, we could use -2 for one of the cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those two semantically meaningfully different, keeping in mind the lazy initialization? Like is there a situation where the disambiguation provides useful information?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way for np_random_seed to be -1 is if the users sets the env.np_random = new_generator, the new generator's seed is unknown.
The problem is that we can't set the return type of np_random_seed to int as it might be called before np_random then it would be None.
However, this case should never happen in practice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type is int, like np_random itself, the seed is set if it was None before.

The user will never get None, but we need to instantiate to None in the constructor. So the semantics of None is "not yet set", just like for np_random itself

return self.get_attr("np_random_seed")

@property
def np_random(self) -> tuple[np.random.Generator, ...]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a good idea to expose, since it will (I think) be a copy of the generator, not the same object. For example, you might be tempted to do envs.np_random.do_something() and expect that it will affect the original vector env - but it won't.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick testing showed that sync has the same variable ids but async did not as you guessed.
Could we add a warning to users? But this is true of all variables gathered from call

Copy link
Contributor Author

@MischaPanch MischaPanch Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not good to expose it, especially the setter - I argued for it in a previous thread. But removing them would be a breaking change, rather outside if the scope of this PR. I just made the vector env interfaces consistent with the base env ones

Concerning other fixes to the vector envy - I'd prefer to do them in a separate PR.

Generally, it seems like the asynchronous discussion is going into a bit of a circle. If you want, we could have a short call to finalize this and discuss possible follow-ups in the vector envs.

I'd also be happy to discuss further integration of Gymnasium and other farama things with tianshou, if you want

@@ -430,6 +446,19 @@ def render_mode(self) -> tuple[RenderFrame, ...] | None:
"""Returns the `render_mode` from the base environment."""
return self.env.render_mode

@property
def np_random(self) -> np.random.Generator:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need/want this? I don't remember what the general philosophy was regarding wrappers having their own np_random, but quietly (and unavoidably) passing on the unwrapped env's generator seems rather risky and unnecessarily opaque to me

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current approach for Env, such that wrappers technically have no np_random, it is always the unwrapped envs.
Otherwise, you can get the same confusion when setting a np_random as this doesn't update the unwrapped environnments
This has been partially fixed with removing __getattr__ but this could still cause confusion I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name shadowing in the wrappers is among the most confusing things I've ever seen, I remember spending hours searching for bugs due to this in early gym versions. I'm happy it's gone, the wrappers should never have name-shadowing attributes

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RedTachyon Has brought up good points to consider but I don't view any as major that prevent this PR being merged.
However if the conversation continues and we spot issues we can modify the implementation

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit b3f0361 into Farama-Foundation:main Feb 26, 2024
11 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants