[Question] Is the length of trajectory (episode) controlled by the done in step() function? #814

Closed
2 tasks done
YimengZhang94 opened this issue Mar 9, 2022 · 2 comments
Labels
question Further information is requested

Comments

YimengZhang94 commented Mar 9, 2022

Question

In RL, a trajectory (or episode) is a sequence of states and actions in the world; see the following link for more explanation:
https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
In SB3, is the length of a trajectory (episode) controlled by the done value returned by the step() function?
If I have an infinite horizon, then done is always False for all total_timesteps in model.learn(), right? If I have a finite horizon, I just need to set done = True when a trajectory (episode) ends, and model.learn() will detect it automatically, right? Thanks in advance.

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)
@YimengZhang94 YimengZhang94 added the question Further information is requested label Mar 9, 2022
Collaborator

Miffyli commented Mar 10, 2022

Yes, exactly! SB3 follows the Gym definitions, where done=True indeed marks the end of an episode. Note that rollout collection (gathering samples) does not depend on episode length, unless you explicitly configure it that way in some of the algorithms.
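
To illustrate the point about done=True and rollouts, here is a minimal sketch in plain Python. It assumes the older Gym step() convention of returning (obs, reward, done, info). FiniteHorizonEnv, its horizon parameter, and the collect() helper are hypothetical names made up for this example; it does not subclass gym.Env and is not SB3's actual collection code.

```python
class FiniteHorizonEnv:
    """Toy stand-in for an environment (hypothetical, not a real gym.Env)."""

    def __init__(self, horizon=5):
        self.horizon = horizon  # assumed fixed episode length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        # done=True is what marks the end of the episode.
        done = self.t >= self.horizon
        return obs, reward, done, {}


def collect(env, n_steps):
    """Collect a fixed number of steps, resetting whenever done=True.

    Returns the lengths of the episodes that finished during collection.
    """
    episode_lengths = []
    current_length = 0
    env.reset()
    for _ in range(n_steps):
        _, _, done, _ = env.step(0)
        current_length += 1
        if done:
            episode_lengths.append(current_length)
            current_length = 0
            env.reset()
    return episode_lengths


# 12 steps span two full 5-step episodes plus 2 steps of a third one.
lengths = collect(FiniteHorizonEnv(horizon=5), n_steps=12)
print(lengths)  # [5, 5]
```

The number of collected steps (12 here) is chosen independently of the episode length, so a rollout can start or stop in the middle of an episode.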

Closing as resolved. Please refer to the docs for further information :)

@Miffyli Miffyli closed this as completed Mar 10, 2022
Member

araffin commented Mar 10, 2022

Hello,

For the infinite-horizon case, I recommend taking a look at #284 and #633.
One way to deal with it is to use a timeout (setting done=True) while telling the agent to ignore that termination and treat the problem as infinite horizon, by providing info["TimeLimit.truncated"] = True (the TimeLimit wrapper does this automatically).
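
A rough sketch of the timeout idea, in plain Python with the older Gym (obs, reward, done, info) convention. SimpleTimeLimit, EndlessEnv, and td_target are hypothetical names written for this example. The wrapper only imitates what a time-limit wrapper does, and the target function shows one common way an agent could use the flag; neither is taken from Gym's or SB3's source.

```python
class EndlessEnv:
    """Toy infinite-horizon task: it never terminates on its own (hypothetical)."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 1.0, False, {}


class SimpleTimeLimit:
    """Hand-rolled stand-in for a time-limit wrapper (assumption, not gym's class)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            # Flag a timeout only if the task did not genuinely terminate.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


def td_target(reward, next_value, done, info, gamma=0.99):
    """One-step target that keeps bootstrapping when the episode merely timed out."""
    truncated = info.get("TimeLimit.truncated", False)
    terminal = done and not truncated
    return reward if terminal else reward + gamma * next_value


env = SimpleTimeLimit(EndlessEnv(), max_episode_steps=3)
env.reset()
for _ in range(3):
    obs, reward, done, info = env.step(0)
print(done, info)  # True {'TimeLimit.truncated': True}
```

With the flag set, the timeout still resets the environment, but the value estimate of the next state is kept in the target, so the agent is not taught that the task ends after max_episode_steps.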
