pre train LSTM policy [question] #253

Closed
XMaster96 opened this issue Mar 29, 2019 · 5 comments · May be fixed by #315
Labels
enhancement New feature or request question Further information is requested

Comments

@XMaster96

I want to pre-train an LSTM policy with some example data. My current approach is to train it like a normal feed-forward network (feeding the observations in at one end and comparing the output with my ground truth) and hope that your LSTM implementation handles the rest (hidden state management) for me. But before I find out that it is not that easy and spend the next two weeks of my life digging through code and managing hidden states, I thought I would simply ask you. Is there anything I need to keep in mind when I train the LSTM policy directly?

@XMaster96 XMaster96 changed the title pre train LSTM policy [question] [question] pre train LSTM policy Mar 29, 2019
@XMaster96 XMaster96 changed the title [question] pre train LSTM policy pre train LSTM policy [question] Mar 29, 2019
@araffin araffin added the question Further information is requested label Mar 29, 2019
@araffin
Collaborator

araffin commented Mar 29, 2019

Hello,

Good question. I haven't tried to pre-train an LSTM yet, and there is currently no test for that, so I cannot assure you it will work out of the box. But I'm interested in your results ;)

> Is there anything I need to keep in mind when I train the LSTM policy directly?

Pre-training here just means supervised learning, so if you are training on a sequence of (observation, action) pairs, you should keep track of the hidden state between steps. I don't have any other advice beyond that for now.
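
To illustrate what "keeping track of the hidden state between steps" could look like, here is a minimal, library-free sketch. It is not the stable-baselines code: `toy_step` is a hypothetical stand-in for one LSTM step, and `run_in_order` is an illustrative helper name. The point is only that samples are consumed in their original order, the state is carried from step to step, and it is reset at the start of each episode.

```python
from typing import Callable, List, Sequence, Tuple

# (cell, hidden): a toy scalar stand-in for the real LSTM state (assumption).
State = Tuple[float, float]


def toy_step(obs: float, state: State) -> Tuple[float, State]:
    """Hypothetical recurrent step: returns (prediction, new_state)."""
    cell, hidden = state
    cell = 0.5 * cell + obs
    hidden = 0.5 * hidden + cell
    return hidden, (cell, hidden)


def run_in_order(
    observations: Sequence[float],
    episode_starts: Sequence[bool],
    step_fn: Callable[[float, State], Tuple[float, State]] = toy_step,
    zero_state: State = (0.0, 0.0),
) -> List[float]:
    """Run a recurrent model over a sequence without shuffling.

    The state is passed from one step to the next and reset to
    zero_state whenever a new episode starts.
    """
    state = zero_state
    predictions = []
    for obs, start in zip(observations, episode_starts):
        if start:
            state = zero_state  # new episode: forget the previous one
        pred, state = step_fn(obs, state)
        predictions.append(pred)
    return predictions


# Two episodes of two steps each; the predictions would then be compared
# with the expert actions to compute the supervised loss.
preds = run_in_order([1.0, 1.0, 1.0, 1.0], [True, False, True, False])
```

In contrast to feed-forward pre-training, shuffling individual (observation, action) pairs would break this, since each prediction depends on the steps before it.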

Btw, you should be aware that we are currently refactoring the way recurrent policies are defined, see PR #244.

@XMaster96
Author

XMaster96 commented Mar 30, 2019

Thanks for the answer.
I have looked into it a bit more and I have some more questions.

  1. So there are two special placeholders I need to feed data into when using an LSTM policy: the initial hidden state tensor and a mask tensor.
    The states have the shape (envs_per_batch, n_hidden * 2), and masks is a boolean list with the shape (envs_per_batch * self.n_steps) that resets the hidden state whenever an entry is True. What I don't understand is how the one shape is translated into the other.

  2. In the docs there are some pre-training examples using a pretrain function, but none of the learners seem to have this function. So was it removed?
    EDIT:
    Stupid me, I forgot to check which version I am currently on.

@araffin
Collaborator

araffin commented Mar 31, 2019

> What I don't understand is how the one shape is translated into the other.

The LSTM code is quite misleading; I think you can get some hints by reading @erniejunior's issue: #158

Side note: state_shape = [n_lstm * 2] because the state holds both the cell state and the hidden state of the LSTM.
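
As a rough illustration of how the two shapes could relate, here is a small plain-Python sketch. It rests on assumptions that should be checked against the actual code: the flat mask list is taken to be env-major (index = env * n_steps + step), and the cell state is taken to sit in the first half of each state row. The helper names `split_state`, `masks_per_step` and `apply_mask` are made up for this example and are not part of the library.

```python
from typing import List, Sequence, Tuple


def split_state(state_row: Sequence[float], n_lstm: int) -> Tuple[List[float], List[float]]:
    """Split one state row of length 2 * n_lstm into (cell, hidden) halves.

    Which half holds the cell state is an assumption here.
    """
    assert len(state_row) == 2 * n_lstm
    return list(state_row[:n_lstm]), list(state_row[n_lstm:])


def masks_per_step(flat_masks: Sequence[bool], n_envs: int, n_steps: int) -> List[List[bool]]:
    """Regroup a flat mask list of length n_envs * n_steps into n_steps rows,
    each with one entry per environment (env-major input is assumed)."""
    assert len(flat_masks) == n_envs * n_steps
    return [
        [flat_masks[env * n_steps + t] for env in range(n_envs)]
        for t in range(n_steps)
    ]


def apply_mask(state_rows: Sequence[Sequence[float]], step_mask: Sequence[bool]) -> List[List[float]]:
    """Zero the whole state row of every environment whose mask is True."""
    return [
        [0.0 if reset else value for value in row]
        for row, reset in zip(state_rows, step_mask)
    ]


# 2 environments, 3 steps: env 0 resets at step 1, env 1 resets at step 2.
flat = [False, True, False, False, False, True]
rows = masks_per_step(flat, n_envs=2, n_steps=3)
states = apply_mask([[1.0, 2.0], [3.0, 4.0]], rows[1])
```

Read this way, the states tensor has one row per environment for the start of the batch only, while the masks have one entry per environment per step, applied to the state before each step.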

> In the docs there are some pre-training examples using a pretrain function, but none of the learners seem to have this function. So was it removed?

Yes, the online docs correspond to the "master" version. Pre-training was added in v2.5.0, which was released last week.

@araffin
Collaborator

araffin commented Apr 8, 2019

@XMaster96 Did you get pretrain() to work with LSTM policies? Or did you have to tweak the code? (Here I'm not talking about recording expert data, which is your current PR ;))

@XMaster96
Author

XMaster96 commented Apr 12, 2019

@araffin Sorry, I hadn't seen your reply; apparently I was under the illusion that I would get a notification.

Yes, I am still working on it. The problem is just that the original solution is quite hacky and not something you put in a PR. Unfortunately, I am also a bit stretched for time at the moment, but I should have an early PR done in the next couple of days.

Edit:
I had the wrong notification settings; now I should get notified.
