Running Demos #19
Hello! Apologies for the late reply, this slipped under the radar somehow. The demos should be reproducible, since the function hasn't changed, apart from some issues related to the tests, which are being fixed at the moment. One other point: as we are currently incorporating the Ivy compiler API into this repo as well, an API key may be required to use certain functions. More information on this will be shared a bit later, when it is ready in the very near future. Thanks!
I wasn't able to set up the environment, so I attempted to reproduce it in gym, along with similar training code written in PyTorch. The model does seem to improve, but it is nowhere close to the results shown in the README (please find the results attached for reference). It looks like the model is doing roughly the right thing, just not well enough to hold the pole upright for any length of time.
How many iterations is this?
10,000 iterations. I used all the default parameters from the repository. A separate question I wanted to clarify: are you using the same initial state during training? https://github.com/unifyai/gym/blob/master/ivy_gym_demos/optimization/optimize_policy.py#L26
The initial states will be randomised, because we need to run an episode of n steps (in this case n=100) in each iteration (a run-through for collecting data on the movement of the cartpole). When the iteration starts, this is also the start of the episode, so we begin from the initial state, and the state is then updated as the run progresses. After stepping through the n=100 steps in a training iteration, the loss is computed and an optimisation step is applied to the model. We then progress to the next iteration, and the procedure repeats.
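For reference, here is a rough sketch of the loop described above. This is my own illustration rather than the repo's actual code, and it assumes a hypothetical differentiable `env` exposing `reset()`/`step()` and a `policy` network (all names are placeholders):

```python
import torch

def train(env, policy, num_iters=10_000, steps_per_episode=100, lr=1e-2):
    # Assumed setup: `env.reset()` returns the initial state tensor, and
    # `env.step(action)` returns (next_state, reward) in a differentiable way.
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for it in range(num_iters):
        state = env.reset()                   # start of the episode for this iteration
        total_reward = torch.zeros(())
        for _ in range(steps_per_episode):    # n = 100 steps per episode
            action = policy(state)
            state, reward = env.step(action)  # state is updated as the run progresses
            total_reward = total_reward + reward
        loss = -total_reward                  # maximise reward == minimise its negative
        optimizer.zero_grad()
        loss.backward()                       # one optimisation step per iteration
        optimizer.step()
```

So there is exactly one gradient update per iteration, computed from the full n-step episode collected in that iteration.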
Basically, the score increases to 40-43 very quickly (in fewer than 1,000 steps), then saturates and does not increase any further. Any help would be greatly appreciated.
Hey! Apologies for the delayed response; I took some time to investigate the environments. I need to correct my last comment (which I also confused myself about): I had overlooked that the initial state has to be randomly sampled at the start of each episode, so it is drawn uniformly from the same range rather than being exactly identical each time. I've also pushed a small fix, and I hope it's working as expected now (looking at the scores there are improvements, though I'm unable to visualise it at the moment). Let me know if any further help is needed. Apologies again for the confusion!
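Purely for illustration, a minimal way to sample the initial state uniformly within a fixed range at the start of each episode might look like the following. The range, state dimension, and function name here are placeholder values of my own, not necessarily those used in the repo:

```python
import numpy as np

def sample_initial_state(low=-0.05, high=0.05, dim=4):
    # Hypothetical: draw each state component (e.g. cart position/velocity,
    # pole angle/angular velocity) uniformly from [low, high] so every
    # episode starts from a state in the same range but not an identical one.
    return np.random.uniform(low, high, size=dim)
```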
Hi @juliagsy
I wanted to check whether optimize_policy can reproduce the results shown in the output GIF in the README.
Thanks