
LSTM training is super slow on GPU #34

Open
phgilde opened this issue Aug 6, 2020 · 7 comments

phgilde commented Aug 6, 2020

This training loop takes more than a second per epoch using tensorflow-directml, but only a fraction of a second with standard tensorflow.
It actually doesn't work at all (the loss is NaN after a couple of iterations), but I already opened a separate issue for that.

Code:

import tensorflow as tf
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt
import time
from datetime import timedelta

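# Target function to fit: a plain sine wave.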
def fn(x):
    return tf.sin(x)

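# Sample the target at 200 evenly spaced points on [0, 50].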
seq_length = 200
x = tf.linspace(tf.constant(0, dtype=tf.float32), 50, seq_length)
y = fn(x)

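# A single LSTM layer is used directly as the model.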
n_outputs = 50
model = keras.layers.LSTM(n_outputs, return_sequences=True)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MSE

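# Plain eager training loop: one forward/backward pass per epoch on a zero input.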
loss_history = []
epochs = 2_000
out_epochs = 10
start = time.time()
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_pred = model(tf.zeros(shape=(1, seq_length, 1)))
        y_pred_data = y_pred[0, :, 0]
        loss = loss_fn(y, y_pred_data)
    loss_history.append(loss.numpy())
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if epoch % out_epochs == 0:
        print(f"Epoch {epoch}: Loss = {loss} ({timedelta(seconds=time.time()-start)})")

System: Intel i5-7200U with Intel HD graphics 620

@PatriceVignola (Contributor)

Thank you for reporting this, @phgilde. Are you running this script on Windows or WSL?

phgilde (Author) commented Aug 7, 2020

@PatriceVignola I'm running this on Windows.

jstoecker transferred this issue from microsoft/DirectML on Sep 17, 2020
@jstoecker (Contributor)

We've implemented the single-step/block-based LSTM/GRU/RNN ops, but these are really better suited to CPU architectures. Models typically use the multi-step cuDNN ops when executing on a GPU device. It's not surprising that there's some more work to do here to make DML perform better with recurrent networks.
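To see where the recurrent ops actually land, TensorFlow's standard device-placement logging can help. A minimal sketch (not from this thread; device names differ between the DirectML fork and stock builds):

import tensorflow as tf
from tensorflow import keras

# Log the device each op is placed on; with tensorflow-directml the GPU
# is expected to show up as a DML device, with stock TF as /device:GPU:0.
tf.debugging.set_log_device_placement(True)

lstm = keras.layers.LSTM(50, return_sequences=True)
_ = lstm(tf.zeros(shape=(1, 200, 1)))  # one forward pass triggers the logs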

@wchao1115

@phgilde What GPU are you running this with? You mentioned standard tensorflow, and your config lists Intel HD graphics. Is this training script running on the CPU?
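One way to check is to time the same forward pass pinned explicitly to the CPU. A rough sketch (not from this thread; assumes eager mode, and that the GPU device string would be looked up via tf.config.list_physical_devices() rather than hard-coded):

import time
import tensorflow as tf
from tensorflow import keras

def time_forward(device, steps=10):
    # Build and run the layer entirely on one device so the comparison is fair.
    with tf.device(device):
        lstm = keras.layers.LSTM(50, return_sequences=True)
        x = tf.zeros(shape=(1, 200, 1))
        _ = lstm(x)  # the first call creates the weights; keep it out of the timing
        start = time.time()
        for _ in range(steps):
            _ = lstm(x)
        return (time.time() - start) / steps

print("CPU seconds per forward pass:", time_forward("/CPU:0"))

The same helper can then be pointed at whichever GPU device name the install reports.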

@ghostlypi

I've had the same issue on an RX 560. In Task Manager, neither the GPU nor the CPU seems to take on any load.
(Screenshot: Task Manager showing near-zero GPU and CPU utilization.)

@onurberkay

I have the same problem with a 4750U AMD APU; GPU load doesn't even reach 1-2%.

@PatriceVignola (Contributor)

@onurberkay What does tf.config.list_physical_devices() give you?
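For anyone following along, the check is a one-liner (output varies by machine; with tensorflow-directml the GPU should appear alongside the CPU):

import tensorflow as tf

# Print every physical device TensorFlow can see on this machine.
print(tf.config.list_physical_devices())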
