-
-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Made readout of seed possible in env #889
Made readout of seed possible in env #889
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I think this is a helpful addition for the particular cases that you are talking about.
Does it make sense to change the name to make it more explicit? i.e., _np_random_seed
?
As seed
feels like a common variable name that we don't want to be accidentally writing over
You need to add a forwarding property in Wrapper
with, otherwise, Wrappers will use their internal _seed
variable
@property
def seed(self) -> int | None:
return self.env.seed
Thanks for the quick response, I added the property to Wrapper. A slightly related issue that I noticed is the fact that Overwriting a user setting np_random to Noneenv.np_random = None
env.np_random
>>> <Generator> Overwriting user provided RNG on resetenv.np_random = my_rng
env.reset(seed=3)
env.np_random
>>> `No longer my_rng` Generator and seed can now be out of syncenv.reset(seed=3)
env.np_random = MyRNG(seed=4)
env.seed
>>> 3 One can surely cook up more buggy behavior related to this. The examples are a bit contrived, but I believe so is setting an rng that would be easily overwritten at reset. I would propose to remove the setter. It's a breaking change, of course. Intuitively, I wouldn't expect many (if any) users to be affected by this change, but I'm not too informed about this. Let me know what you think, I could look for usages of the setter in Gymnasium code and prepare a follow-up PR |
As to the naming, personally I think seed is fine :). The kwarg in reset is also called seed, and users might want to use the attr for other seeding, e.g. if they use torch in custom envs. But if you prefer |
Ahh, yes we are going to need to change the variable name as there use to be an important function called As for the consistency in what |
I renamed the property to
I think this would only change the inconsistency but not resolve it. Say the user has created a generator with Since in gymnasium the generators are supposed to always be based on seeds, I'm doubting the utility of the setter. |
If we define
I think that the setter is critical for the forwarding problem that most of the time users are interfacing with a top level wrapper and not the base environments. Therefore, be able to define the environment's |
Also provided separate setter that exposes the standard mechanism for setting RNGs and seeds (used in reset)
Done, just that instead of I also added more explicit docstrings. Since you want to keep the direct setting of If backwards compatibility is not important here, I would still recommend getting rid of the property-based setter in favor of the new one :). If it is important to let the users pass custom generators, the interface could be changed such that they provide a factory/callable mapping a seed to a generator. This factory could then also be used in reset, solving the issue of reset overwriting the user-provided rng in the current code version. This breaking change could be done in a separate PR, if you think it is suitable. Personally, I would see it as an improvement of the Env interfaces |
@MischaPanch Coming back to this after a couple of days, I'm no longer so certain of the foundation for the PR. For the two cases where you discussed knowing the seeding is important, evaluation and training using a known seed, I'm not sure why this PR is needed. If a known seed is wanted, then this can be done with Therefore, coming back to this, I'm uncertain of the use cases where the seed is unknown for good reason, i.e., Secondly, I remembered that some environments, e.g., Atari, have second random number generators (internal to the games) that cannot be accessed. Therefore, knowing the |
Thanks for the feedback, I'll dive a bit deeper into why I believe this to be important/useful. The main point is not reproducibility but supporting seed-dependent evaluation protocols, which I believe to be very important. My colleague @bordeauxred recently wrote a short review-style article on some of them (see also references therein). Typical plots that need to be created for these evaluation protocols are something like this Note that I was not in favor of calling the property Going further into evaluation protocols - a typical thing would be to collect trajectories for each seed and aggregate them per-seed. Now, without the seed being there, one needs to carefully control how seeds are set for each of the parallel rollout workers, then think about how to map the seeds to the collected logs and to disaggregate them before performing per-seed analysis. Even with a single file approach like in cleanRL that's not entirely trivial. If you use an RL framework that handles parallelization for you, it might be entirely infeasible. However, if the seed is part of the environment, it would be usually very simple to add it either to the collected trajectories themselves, or at least to easily associate seeds to trajectories in other ways. Think of a run where the seeds are chosen randomly from a pool for each call to reset (for every parallel worker). Without the information of the seed being persisted in the env, one not even has a seed-worker association to disaggregate results, and it's becoming even more cumbersome. All the things I described above are not particularly exotic. It's just the case that currently high-quality evaluation protocols are not yet ubiquituious in RL (which I believe makes applications of RL in practice harder than they need to be) |
If you disagree with these points, of course the PR can be dropped. I would then probably have to write something like a custom wrapper that performs this functionality. Btw, another thing along the same lines was that the TimeLimit wrapper (and maybe also other wrappers) equally doesn't expose user-set properties like Even though I don't like arguments from authority - an example of readout of user-set stuff being seen as important is sklearn, where their interfaces enforce that all things passed to init must be saved as public fields |
Thanks for the detail and the post, very interesting and important point that is often forgotten in RL as the author says.
My point is that the evaluation protocols are relevant, rather than the seed should be known before it is set, not after. # current approach
env.reset(seed=seed_value)
# Proposed approach
env.reset()
seed_value = env.np_random_seed This is how I understand you wishing to use the variable, which, given the current method of use, I do not understand why you would want/need the second. An important detail about seeding for numpy is that you do not know the seed after you have sampled the generator. I note that Jax does not have this problem. import numpy as np
rng = np.random.RandomState(1)
rng.random()
rng_2 = np.random.RandomState(??)
assert rng == rng_2
# I can't know the seed value to create a new rng that has the same seed as rng after sampling with random |
The thing is that reset typically happens in many parallel processes, often deep inside an RL framework, where a user won't have access to. So while accessing |
So a more explicit example, imagine the framework can give you access to workers. You can typically call something like
But often you have no control over what seed was passed and how it was done when the env was instantiated or reset inside a worker process |
One more example, very close to home. I am working on the RL library tianshou. There, when a user sets |
@RedTachyon Would you be able to check this thread to see if I'm missing anything
@MischaPanch I think this is a good point, that this PR should be secondary and a backup to As Second, for the getter, I'm not sure we need to set the variable if it hasn't already and we should just return |
I can change to -1, I just thought "unknown" would be more explicit. Pls confirm that you're sure about the -1, then I adjust it. As to the getter - I did it to be consistent with the getter of env = MyEnv(...)
env.np_random_seed
>>> None
env.np_random
>>> <Generator>
env.np_random_seed
>>> Some not-None seed I usually don't expect attribute access to modify the state, so with the current getter for the seed it seems more benign to me env = MyEnv(...)
env.np_random_seed
>>> Some seed
env.np_random
>>> <Generator> for that seed If you want the getter to deliver None, I will adjust the docstring accordingly. Happy to hear that you want to see this merge, let me know how you decide about the final adjustments |
import numpy as np
np.random.RandomState(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "mtrand.pyx", line 185, in numpy.random.mtrand.RandomState.__init__
File "_mt19937.pyx", line 166, in numpy.random._mt19937.MT19937._legacy_seeding
File "_mt19937.pyx", line 180, in numpy.random._mt19937.MT19937._legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1 For the getter, I get what you mean, I guess it depends on the order of get |
I meant that you prefer it to "unknown", I knew that numpy would fail. The error is just a bit less telling, with "unknown" the user would likely get a better error message and it would be clearer that there are three possibilities (set to int, not set, and set somehow but unknown) IMO
But given that you prefer -1, I'm adjusting it now. If you are convinced by my "explicitness" argument, feel free to drop the last commit before merging :) I'm leaving the getter |
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes
Could you remove set_np_random_through_seed
and in docs/api/core
could you add np_random_seed
as a method such that it shows up in the documentation website. Then for me, the PR is good
Then @RedTachyon is going to check that I haven't missed anything, then we can merge
Also added the new property to docs
Done. There is no IMPORTANT: I also added it to
def reset(
self,
*,
seed: int | list[int] | None = None,
options: dict[str, Any] | None = None,
) -> tuple[ObsType, dict[str, Any]]: # type: ignore
"""Reset all parallel environments and return a batch of initial observations and info.
Args:
seed: The environment reset seeds
options: If to return the options But the implementation calls
|
Also added the new property to docs
I fixed the docstring and type of VectorEnv and added properties to Sync/Async VectorEnvs. I also slightly improved the assert messages and fixed minor typing problems. NB: just an observation - there's quite some code duplication between the sync and async versions. I guess it's by design? Let me know if there are more changes to be made :) |
@pseudo-rnd-thoughts what's the status here? :) Is there anything preventing a merge? Do you have a timeline on when @RedTachyon could review it, in case that's necessary? |
@MischaPanch I will poke @RedTachyon again but the code looks good imo Just for the future, could you add several tests for each of the cases (-1, pos int and None) in |
Will do :) |
Sorry, was swamped in the last weeks. But now I added the tests (and a minor fix) |
@MischaPanch I can't edit the PR, could you run |
Should |
I had some cryptic errors in pre-commit hooks (pkg_resources not found). Could you share the development setup or Gymnasium with me? Or maybe put it in contributing.md I will remove the test class, though personally I really like them - you can bundle related tests together and the test logging output looks nicer |
@MischaPanch you can use GitHub codespaces to run |
I made a fulltext search for this and didn't find anything @Kallinteris-Andreas |
@pseudo-rnd-thoughts Reformatted and removed the test class, should be good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, left a few comments.
Overall I had some doubts about adding this at all, but I managed to covince myself in the meantime, so I don't have to bother you with it.
At the same time, as this goes into the very core API, we need to be careful even with tiny details, so that's mostly what I addressed. Might have some more thoughts later.
@@ -66,6 +66,8 @@ class Env(Generic[ObsType, ActType]): | |||
|
|||
# Created | |||
_np_random: np.random.Generator | None = None | |||
# will be set to the "invalid" value -1 if the seed of the currently set rng is unknown | |||
_np_random_seed: int | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably be better with one consistent way of specifying an invalid state - either None or -1, not both. Unless there's a clear distinction between what None and -1 mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, -1
means unknown seed, while None
means that the seed hasn't been set, we could use -2
for one of the cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those two semantically meaningfully different, keeping in mind the lazy initialization? Like is there a situation where the disambiguation provides useful information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way for np_random_seed
to be -1 is if the users sets the env.np_random = new_generator
, the new generator's seed is unknown.
The problem is that we can't set the return type of np_random_seed
to int
as it might be called before np_random
then it would be None
.
However, this case should never happen in practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type is int, like np_random itself, the seed is set if it was None before.
The user will never get None, but we need to instantiate to None in the constructor. So the semantics of None is "not yet set", just like for np_random itself
return self.get_attr("np_random_seed") | ||
|
||
@property | ||
def np_random(self) -> tuple[np.random.Generator, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a good idea to expose, since it will (I think) be a copy of the generator, not the same object. For example, you might be tempted to do envs.np_random.do_something()
and expect that it will affect the original vector env - but it won't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick testing showed that sync has the same variable ids but async did not as you guessed.
Could we add a warning to users? But this is true of all variables gathered from call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not good to expose it, especially the setter - I argued for it in a previous thread. But removing them would be a breaking change, rather outside if the scope of this PR. I just made the vector env interfaces consistent with the base env ones
Concerning other fixes to the vector envy - I'd prefer to do them in a separate PR.
Generally, it seems like the asynchronous discussion is going into a bit of a circle. If you want, we could have a short call to finalize this and discuss possible follow-ups in the vector envs.
I'd also be happy to discuss further integration of Gymnasium and other farama things with tianshou, if you want
@@ -430,6 +446,19 @@ def render_mode(self) -> tuple[RenderFrame, ...] | None: | |||
"""Returns the `render_mode` from the base environment.""" | |||
return self.env.render_mode | |||
|
|||
@property | |||
def np_random(self) -> np.random.Generator: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need/want this? I don't remember what the general philosophy was regarding wrappers having their own np_random, but quietly (and unavoidably) passing on the unwrapped env's generator seems rather risky and unnecessarily opaque to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the current approach for Env
, such that wrappers technically have no np_random
, it is always the unwrapped envs.
Otherwise, you can get the same confusion when setting a np_random
as this doesn't update the unwrapped environnments
This has been partially fixed with removing __getattr__
but this could still cause confusion I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name shadowing in the wrappers is among the most confusing things I've ever seen, I remember spending hours searching for bugs due to this in early gym versions. I'm happy it's gone, the wrappers should never have name-shadowing attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RedTachyon Has brought up good points to consider but I don't view any as major that prevent this PR being merged.
However if the conversation continues and we spot issues we can modify the implementation
Description
The random seed is fundamentally important for various evaluation protocols. Given the flakyness of RL, one
often wants to evaluate or even train per-seed and then aggregate and present the results with custom logic.
The current design of the environment makes it almost impossible to extract the seed. Numpy's
Generator
object createdby Gymnasium's
np_random
does not expose it, and the seed is not saved anywhere in the env, despite being passed from the user.This forces users to save seeds somewhere else, which is especially cumbersome when env rollouts are parallelized (which happens all the time).
With this small change we add a hidden attribute
_seed
and a read-only property. It's entirely backwards compatibleFixes # (issue)
Seed of env can't be read out
Type of change
Please delete options that are not relevant.
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)