Question regarding RNN-MDN #6
Hi @vipinpillai,

Please also refer to the RNN description in the Appendix:

> For the M model, we use an LSTM recurrent neural network combined with a Mixture Density Network as the output layer. We use this network to model the probability distribution of z in the next time step as a mixture of Gaussians (NB: this answers @vipinpillai's 1st question). This approach is very similar to Graves' Generating Sequences with RNNs, in the Unconditional Handwriting Generation section, and also to the decoder-only section of SketchRNN. The only difference in the approach used is that we did not model the correlation parameter between the elements of z, and instead had the MDN-RNN output a diagonal covariance matrix of a factored Gaussian distribution. (This answers part of @vipinpillai's 2nd question, regarding making training stable and keeping the loss calculation tractable.) Unlike the handwriting and sketch generation works, rather than using the MDN-RNN to model the pdf of the next pen stroke, we instead model the pdf of the next latent vector z.
>
> The MDN-RNNs were trained for 20 epochs on the data collected from a random policy agent. In the Car Racing task, the LSTM used 256 hidden units, while the Doom task used 512 hidden units. In both tasks, we used 5 Gaussian mixtures and did not model the correlation ρ parameter, hence z is sampled from a factored mixture of Gaussians. (@vipinpillai: please keep this sampling in mind, since it helps safeguard against overfitting to a stored set of z's.) When training the MDN-RNN using teacher forcing on the recorded data, we store a pre-computed set of μ and σ for each of the frames, and sample an input z ∼ N(μ, σ) each time we construct a training batch, to prevent overfitting our MDN-RNN to a specific sampled z.

I have put pointers to your questions in the quoted passages above. Please let me know if you have any more questions!
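As a rough illustration of that last point, here is a minimal sketch of building a training batch by re-sampling z from the stored μ and σ (function and array names are hypothetical, not from the paper's code):

```python
import numpy as np

def build_training_batch(mu, sigma, batch_size, seq_len):
    # mu, sigma: pre-computed VAE encoder outputs for every recorded
    # frame, e.g. arrays of shape (num_frames, 32)
    starts = np.random.randint(0, len(mu) - seq_len - 1, size=batch_size)
    idx = starts[:, None] + np.arange(seq_len + 1)
    # Draw a fresh z ~ N(mu, sigma) every time the batch is built, so the
    # MDN-RNN never overfits to one fixed set of sampled z's
    noise = np.random.randn(batch_size, seq_len + 1, mu.shape[-1])
    z = mu[idx] + sigma[idx] * noise
    return z[:, :-1], z[:, 1:]  # inputs and next-step targets
```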
Thanks a lot @hardmaru for providing detailed pointers. One last implementation-specific question: did you treat each dimension of the latent z as a mixture of univariate Gaussians, or the entire z as a mixture of multivariate Gaussians, while training the MDN-RNN? I am asking because of the numerical issues of computing the NLL for a GMM where each Gaussian is 32-dimensional.
Hi @vipinpillai, z is modelled as a factored Gaussian, so the correlation ρ parameter is assumed to be zero, unlike Graves (2013), which modelled a correlation term. This decreases the model complexity, and also makes training much more numerically stable, since you can operate in log-space and avoid having to take the logarithm of exponentials in your computational graph.
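To make the log-space point concrete, here is a minimal sketch (my own, not from the World Models repo) of the factored-Gaussian MDN negative log-likelihood, assuming `log_pi` holds already-normalized mixture log-weights:

```python
import numpy as np
from scipy.special import logsumexp

def mdn_nll(z, log_pi, mu, log_sigma):
    # z: (..., d) target latent; log_pi: (..., K) mixture log-weights;
    # mu, log_sigma: (..., K, d) diagonal Gaussian params per component
    z = z[..., None, :]  # broadcast against the K components
    # Per-dimension Gaussian log-density, summed over d (factored model)
    log_prob = -0.5 * (((z - mu) / np.exp(log_sigma)) ** 2
                       + 2.0 * log_sigma + np.log(2.0 * np.pi))
    log_prob = log_prob.sum(axis=-1)  # (..., K)
    # logsumexp over components keeps the whole computation in log-space
    return -logsumexp(log_pi + log_prob, axis=-1).mean()
```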
Thanks @hardmaru for the quick response. I understand that you have not included the correlation parameter.
No, I have not tried that. Feel free to try it out yourself if you think it is interesting.
Hi, I'd like to jump in the discussion. What sequence lengths did you use when training the MDN-RNN, and what value of the temperature tau did you use during training?
Thanks in advance.
Hi @dariocazzani, I used sequence lengths of 1000 timesteps. These long lengths are possible since we don't need to train the VAE at the same time as the MDN-RNN, enabling us to save lots of GPU memory and learn all of the long-term dependencies. During training, tau is 1.0.
Thanks @hardmaru for the answers. We first have an agent act randomly to explore the environment multiple times, and record the random actions aₜ taken and the resulting observations from the environment. With random actions and such long sequences, in Car Racing the car ends up spending most of the time in the grass. This seems like a waste, or am I missing something?
Hi @dariocazzani, good question! I also had to think about this in the experiments. Another person who reproduced the CarRacing task used a method to encourage more diversity. Have you thought about potential approaches that might help overcome this issue? I'd be curious to know what other people come up with, so feel free to list some here. I'll let you know what I did to generate random actions that encourage a more diverse set, and I think my approach is more elegant than simply hitting the accelerate pedal :)
The person who wrote that blog post did a few things:
This works well if the car starts at the same spot and if we limit the rollouts to 300. What I did was, first of all, make the car start at a random point on the track for each rollout:

```python
import numpy as np
from gym.envs.box2d.car_dynamics import Car
from gym.envs.box2d import CarRacing

# [...]
# Place the car at a random position along the track
position = np.random.randint(len(env.track))
env.car = Car(env.world, *env.track[position][1:4])
```

I made a PR to OpenAI gym to make this possible without "tricks": Link to PR

And this is my policy for generating random actions:

```python
def generate_action(prev_action):
    # With probability 2/3, repeat the previous action
    if np.random.randint(3) % 3:
        return prev_action

    # Pick one of the three action dimensions at random,
    # favoring acceleration over steering and braking
    index = np.random.randn(3)
    index[1] = np.abs(index[1])
    index = np.argmax(index)
    mask = np.zeros(3)
    mask[index] = 1

    # Sample a random action and squash it into the valid ranges:
    # steering in [-1, 1], gas and brake in [0, 1]
    action = np.random.randn(3)
    action = np.tanh(action)
    action[1] = (action[1] + 1) / 2
    action[2] = (action[2] + 1) / 2

    return action * mask
```

When I run the prediction, I assume that the car never has to brake and accelerate at the same time, so I could reduce the number of actions to 2. This was beneficial, reducing the number of parameters of the Controller by ~33%:

```python
# [...]
action[0] = prediction[0]  # steering
if prediction[1] < 0:
    action[1] = np.abs(prediction[1])  # gas
    action[2] = 0
else:
    action[2] = prediction[1]  # brake
    action[1] = 0
return action
```

I am documenting everything in a series of blog posts (still WIP, since I need to work on it in my spare time). Suggestions for improvements are always welcome :) Thanks again for the feedback @hardmaru
That's a nice strategy, starting at a random place on the track. However, I would be careful not to change the "official" CarRacing-v0 to have this setting, since it should be kept the same for evaluation purposes against published results using this environment. Perhaps making a fork, a separate environment (for training an agent only, not for evaluation), might make more sense? You would still need to evaluate on the original CarRacing-v0 to compare results with previously published methods.

Reducing the action space to 2 is also clever. I did a similar thing for the DoomTakeCover scenario and reduced the action space to 1 real-valued action.

To generate random episodes that are more diverse, rather than using a simple random policy that samples the action space uniformly, what I did was initialize V, M, and C with random weights sampled from a normal distribution with zero mean and a small standard deviation. This way, the random agent uses its randomized policy to drive around the track in a way that is more diverse, and that also represents a natural prior over what the agent can do, since in the end the agent has to learn a set of parameters for V, M, and C from this sampled parameter space anyway.

Good luck!
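A minimal sketch of that idea, using a single random linear controller as a simplified stand-in for the full V/M/C stack (the function name and the sigma value are assumptions, not from the released code):

```python
import numpy as np

def make_random_policy(obs_dim, action_dim, sigma=0.1):
    # One random controller per rollout: weights ~ N(0, sigma^2).
    # Acting with a fixed random policy for a whole episode gives more
    # coherent, diverse trajectories than i.i.d. uniform actions per step.
    W = np.random.randn(action_dim, obs_dim) * sigma
    b = np.random.randn(action_dim) * sigma
    return lambda obs: np.tanh(W @ obs + b)
```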
Hey @hardmaru, I'll respond point by point.
An extra bit, a variation over using MDNs for M: I would compute the KL divergence in closed form. The Monte Carlo method might not be suitable for small vectors like these.
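Assuming this refers to the usual KL term between a diagonal Gaussian and the standard normal prior (as in a VAE), that divergence does have a well-known closed form, so no Monte Carlo estimate is needed; a minimal sketch:

```python
import numpy as np

def kl_diag_gaussian_vs_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) )
    #   = 0.5 * sum( mu^2 + sigma^2 - log(sigma^2) - 1 )
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)
```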
Thanks for sharing the demo. I had some questions regarding the RNN-MDN module used to sample zₜ₊₁:
1. Since the latent sample is 32-dimensional for the Car Racing experiment, does the MDN model each distribution within the mixture as a multivariate normal distribution?
2. Could you please share some details regarding the explicit separation between the RNN and the MDN modules, and the tweaks needed to stabilize training of the MDN and to keep the NLL loss calculation tractable for the multivariate MDN?