Running Demos #19
Hello! Apologies for the late reply, this slipped under the radar somehow. The demos should be reproducible, since the function hasn't changed, apart from some issues related to the tests, which are being fixed at the moment. One other point: as we are currently incorporating the Ivy compiler API into this repo as well, an API key may be required to use certain functions. More information on this will be shared a bit later, when it is ready in the very near future. Thanks!
I wasn't able to set up the environment, so I attempted to reproduce it in gym, along with similar training code written in PyTorch. The model does seem to improve, but it is nowhere close to the results shown in the README (please find the results attached for reference). It looks like the model is doing roughly the right thing, just not well enough to hold the pole upright for any length of time.
How many iterations is this?
10,000 iterations. I used all the default parameters from the repository. A separate question I wanted to clarify: are you using the same initial state during training? https://github.com/unifyai/gym/blob/master/ivy_gym_demos/optimization/optimize_policy.py#L26
The initial states will be randomised, because we need to run an episode of n steps (in this case n=100) in each iteration (a run-through for collecting data on the movement of the cartpole). When the iteration starts, this is also the start of the episode, so we begin from the initial state, and the state is then updated as the run progresses. After stepping through the n=100 steps in a training iteration, the loss is computed and an optimisation step is applied to the model. We then progress to the next iteration, and the procedure repeats.
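For reference, here is a rough sketch of the loop described above. This is my own illustration rather than the repo's actual code, and it assumes a hypothetical differentiable `env` exposing `reset()`/`step()` and a `policy` network (all names are placeholders):

```python
import torch

def train(env, policy, num_iters=10_000, steps_per_episode=100, lr=1e-2):
    # Assumed setup: `env.reset()` returns the initial state tensor, and
    # `env.step(action)` returns (next_state, reward) in a differentiable way.
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for it in range(num_iters):
        state = env.reset()                   # start of the episode for this iteration
        total_reward = torch.zeros(())
        for _ in range(steps_per_episode):    # n = 100 steps per episode
            action = policy(state)
            state, reward = env.step(action)  # state is updated as the run progresses
            total_reward = total_reward + reward
        loss = -total_reward                  # maximise reward == minimise its negative
        optimizer.zero_grad()
        loss.backward()                       # one optimisation step per iteration
        optimizer.step()
```

So there is exactly one gradient update per iteration, computed from the full n-step episode collected in that iteration.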
Basically, the score increases to 40-43 very quickly (in fewer than 1,000 steps), then saturates and does not increase any further. Any help would be greatly appreciated.
Hey! Apologies for the delayed response; I took some time to investigate the environments. I need to correct my last comment (which I also confused myself about): I had overlooked that the initial state has to be randomly sampled at the start of each episode, so it is drawn uniformly from the same range rather than being exactly identical each time. I've also pushed a small fix, and I hope it's working as expected now (looking at the scores there are improvements, though I'm unable to visualise it at the moment). Let me know if any further help is needed. Apologies again for the confusion!
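Purely for illustration, a minimal way to sample the initial state uniformly within a fixed range at the start of each episode might look like the following. The range, state dimension, and function name here are placeholder values of my own, not necessarily those used in the repo:

```python
import numpy as np

def sample_initial_state(low=-0.05, high=0.05, dim=4):
    # Hypothetical: draw each state component (e.g. cart position/velocity,
    # pole angle/angular velocity) uniformly from [low, high] so every
    # episode starts from a state in the same range but not an identical one.
    return np.random.uniform(low, high, size=dim)
```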
Hi @juliagsy
I wanted to check whether optimize_policy can reproduce the results shown in the output GIF in the README.
Thanks