Running Demos #19

Open
MukundVarmaT opened this issue Mar 11, 2023 · 7 comments

Comments

@MukundVarmaT

Hi @juliagsy

I wanted to check whether optimize_policy can reproduce the results shown in the output GIF in the README.

Thanks

@juliagsy
Contributor

Hello! Apologies for the late reply! This slipped under the radar somehow!

They should be reproducible, since the function hasn't changed apart from some test-related issues that are being fixed at the moment!

Another point: we are currently incorporating the ivy compiler API into this repo as well, so an API key may be required to use certain functions. More information on this will be made available once it is ready, in the very near future!

Thanks!
Julia

@MukundVarmaT
Author

MukundVarmaT commented Mar 21, 2023

I wasn't able to set up the environment, so I tried to reproduce it in gym, with similar training code written in PyTorch. The model does seem to improve, but it is nowhere close to the results shown in the README. (Please find attached for your reference)

[Attached GIF: ezgif com-video-to-gif (3)]

The model seems to be doing the right thing, but not well enough to hold the pole upright for any length of time.

@juliagsy
Contributor

How many iterations is this?

@MukundVarmaT
Author

10,000 iterations. I used all the default parameters from the repository.

A separate question I wanted to clarify: are you using the same initial state during training? https://github.com/unifyai/gym/blob/master/ivy_gym_demos/optimization/optimize_policy.py#L26

@juliagsy
Contributor

juliagsy commented Mar 21, 2023

They will be randomised initial states. In each iteration we need to run an episode of n steps (in this case n=100), i.e. one run-through that collects data on the movement of the cartpole. The start of an iteration is therefore also the start of an episode, so we use the initial state there, and the state is then updated as the run progresses. After stepping through n=100 steps in a training iteration, the loss is obtained and an optimisation step is applied to the model. We then progress to the next iteration, and the procedure repeats.
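
The per-iteration procedure described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code in optimize_policy.py: the 1-D toy system, the single-gain policy, the names `rollout_loss` and `train`, the finite-difference gradient (standing in for backpropagation through the rollout), and all hyperparameters other than n=100 are assumptions made so the example runs with the standard library alone.

```python
import random

N_STEPS = 100  # steps per episode, matching n=100 in the comment above


def rollout_loss(gain, initial_state, n_steps=N_STEPS, dt=0.05):
    """Run one episode of a toy 1-D system and return its accumulated loss.

    Hypothetical stand-in for the cartpole: the state is a single offset x,
    and the policy is the linear controller u = -gain * x.
    """
    x = initial_state
    loss = 0.0
    for _ in range(n_steps):
        u = -gain * x        # policy maps the current state to an action
        x = x + dt * u       # environment step updates the state
        loss += x * x        # penalise distance from the target x = 0
    return loss


def train(n_iterations=200, lr=0.5, eps=1e-4, seed=0):
    """One episode and one optimisation step per training iteration."""
    rng = random.Random(seed)
    gain = 0.0
    history = []
    for _ in range(n_iterations):
        # The start of an iteration is also the start of an episode.
        # The sampling range here is an assumption for illustration.
        x0 = rng.uniform(-0.05, 0.05)
        loss = rollout_loss(gain, x0)
        # Central finite difference, assumed here in place of autograd.
        grad = (rollout_loss(gain + eps, x0)
                - rollout_loss(gain - eps, x0)) / (2 * eps)
        gain -= lr * grad    # single update after the full rollout
        history.append(loss)
    return gain, history


if __name__ == "__main__":
    final_gain, losses = train()
    print("final gain:", final_gain)
    print("loss from x0=0.05 before training:", rollout_loss(0.0, 0.05))
    print("loss from x0=0.05 after training:", rollout_loss(final_gain, 0.05))
```

Comparing the loss from a fixed start state before and after training, as in the last two lines, separates genuine policy improvement from the noise introduced by varying initial states.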

@MukundVarmaT
Author

Basically, the score rises to 40-43 very quickly (in fewer than 1000 steps), then saturates and does not increase any further.

Any help would be greatly appreciated.

@juliagsy
Contributor

juliagsy commented Apr 8, 2023

Hey! Apologies for the delayed response! I took some time to investigate the environments. I need to correct my last comment (which I also confused myself about): I had overlooked that the initial state has to be randomly sampled at the start of each episode. It is drawn uniformly from the same range each time, so the initial states are not exactly identical. I've also pushed a small fix and hope it's working as expected now (the scores I'm looking at show improvement, but unfortunately I'm unable to visualise it at the moment). Let me know if any further help is needed! Apologies again for the confusion!
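
As a rough illustration of what sampling the initial state uniformly from a fixed range at the start of each episode could look like, here is a minimal sketch. The function name `sample_initial_state`, the four-component state layout, and the ±0.05 range are assumptions for illustration only and are not taken from the repository or from the fix mentioned above.

```python
import random

STATE_DIM = 4      # assumed layout: cart position, cart velocity, pole angle, pole angular velocity
INIT_RANGE = 0.05  # assumed half-width of the sampling range


def sample_initial_state(rng, half_width=INIT_RANGE, dim=STATE_DIM):
    """Draw each state component uniformly from [-half_width, half_width]."""
    return [rng.uniform(-half_width, half_width) for _ in range(dim)]


# Seeded generator so the sequence of start states is repeatable across runs.
rng = random.Random(0)

# One fresh draw per episode: same range every time, different values.
episode_starts = [sample_initial_state(rng) for _ in range(3)]
for start in episode_starts:
    print([round(v, 4) for v in start])
```

Seeding the generator keeps a run repeatable while still giving each episode a different start, which helps when comparing score curves between two implementations.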

2 participants