
RNN example #246

Closed
mratsim opened this issue Jul 2, 2019 · 1 comment

Comments

@mratsim
Contributor

mratsim commented Jul 2, 2019

I would be very interested in an RNN implementation example using Tiramisu.

Unfortunately due to #217 I cannot explore that myself at the moment.

Also, on the tiramisu.github.io website, you claim that Halide cannot represent RNNs, but time is just another loop/tensor dimension. Case in point, this seems to be an actual implementation of LSTM in Halide: https://github.com/misaka-10032/Halstm/blob/master/src/layer/lstm.cpp.
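To make the "time is just another dimension" point concrete, here is a minimal pure-Python sketch (not Halide or Tiramisu code) of a plain Elman RNN, assuming the textbook update h[t] = tanh(W·x[t] + U·h[t-1] + b). The function name `rnn_forward` and its arguments are illustrative only. The hidden state is stored as a 2-D tensor indexed by time, and the recurrence is simply the outer loop reading the previous row of that tensor:

```python
import math

def rnn_forward(xs, W, U, b, h0):
    """Illustrative Elman RNN: h[t+1] = tanh(W @ xs[t] + U @ h[t] + b).

    The hidden state is a 2-D tensor h[t][j], so time is just the outer
    loop dimension; step t reads row t of the same tensor to produce row t+1.
    """
    T = len(xs)
    H = len(h0)
    # h[0] is the initial state; h[t + 1] is the state after consuming xs[t]
    h = [list(h0)] + [[0.0] * H for _ in range(T)]
    for t in range(T):            # time loop: carries the recurrence
        for j in range(H):        # hidden units: independent within one step
            acc = b[j]
            acc += sum(W[j][k] * xs[t][k] for k in range(len(xs[t])))
            acc += sum(U[j][k] * h[t][k] for k in range(H))
            h[t + 1][j] = math.tanh(acc)
    return h
```

Only the time loop carries a dependence; the loop over hidden units (and the matrix-vector products inside it) can be scheduled freely, which is what makes the time dimension look like any other loop to a scheduler.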

One thing I would be very interested in is the wavefront optimisation on stacked RNNs, described as optimization 3 in Nvidia's blog post.

They even provide the CUDA source code, which can serve as a reference benchmark.
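For reference, the wavefront idea can be sketched as follows, assuming a stack of simple scalar RNN layers in which cell (l, t) needs the output of layer l-1 at step t and the state of layer l at step t-1. All cells on one anti-diagonal l + t = w are then independent, so a wavefront schedule can run them concurrently instead of finishing each layer before starting the next. The names `cell`, `stacked_layer_major`, `wavefronts` and `stacked_wavefront` are hypothetical, and this sketch runs each front sequentially; it only shows the traversal order and that it produces the same result as layer-by-layer execution:

```python
import math

def cell(l, x, h_prev, weights):
    # Hypothetical scalar RNN cell for layer l; weights[l] = (w_in, w_rec, bias)
    w_in, w_rec, bias = weights[l]
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def _compute(l, t, xs, h, weights):
    # Input comes from the layer below at the same step (or the sequence itself)
    x = xs[t] if l == 0 else h[l - 1][t]
    h_prev = h[l][t - 1] if t > 0 else 0.0
    h[l][t] = cell(l, x, h_prev, weights)

def stacked_layer_major(xs, weights):
    # Baseline: finish all T steps of layer l before starting layer l + 1
    L, T = len(weights), len(xs)
    h = [[0.0] * T for _ in range(L)]
    for l in range(L):
        for t in range(T):
            _compute(l, t, xs, h, weights)
    return h

def wavefronts(L, T):
    # Cells with the same l + t form one anti-diagonal ("wavefront")
    for w in range(L + T - 1):
        yield [(l, w - l) for l in range(max(0, w - T + 1), min(L, w + 1))]

def stacked_wavefront(xs, weights):
    L, T = len(weights), len(xs)
    h = [[0.0] * T for _ in range(L)]
    for front in wavefronts(L, T):
        # Every cell in `front` depends only on cells of the previous front,
        # so these iterations could run concurrently (sequential here)
        for l, t in front:
            _compute(l, t, xs, h, weights)
    return h
```

With L layers and T steps there are L + T - 1 fronts of at most min(L, T) cells each, which is where the parallelism across layers comes from.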

@rbaghdadi
Collaborator

Hi @mratsim,

Here is an example of LSTM implemented in Tiramisu with the GPU schedule: https://github.com/Tiramisu-Compiler/tiramisu/tree/master/benchmarks/DNN/blocks/LSTM/gpu

This code does iteration space skewing and implements most of the optimizations mentioned in that blog post (we used it as a reference).
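As a generic sketch of what iteration space skewing does for a stacked RNN (the usual polyhedral view, not a description of the exact schedule in the linked code), take iterations (l, t) with dependence vectors d_1 = (1, 0) (layer below, same step) and d_2 = (0, 1) (same layer, previous step), and map each iteration to a wavefront index w = l + t followed by l:

```latex
\begin{pmatrix} w \\ l \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
  \begin{pmatrix} l \\ t \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \end{pmatrix}
```

Both transformed dependences have a positive w component, so the outer w loop carries every dependence and the inner l loop over one wavefront carries none; its iterations can run in parallel. The matrix has determinant -1, so the transformation is unimodular and maps integer iterations one-to-one.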

You can also find a GPU GEMM implementation (the equivalent of cuBLAS SGEMM) in https://github.com/Tiramisu-Compiler/tiramisu/tree/master/benchmarks/linear_algebra/blas/level3/sgemm/gpu in case you are interested.

More generally, the DNN and BLAS benchmarks are implemented in https://github.com/Tiramisu-Compiler/tiramisu/tree/master/benchmarks

As for Halide, I remember discussing this with the Halide developers. If I remember correctly, Halide can support LSTMs if you know the number of steps over which to unroll the LSTM (i.e., the extent of your time loop is known at compile time), but the trick of using a time loop will not work if you don't know that extent at compile time.
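A toy Python analogy of the distinction above (this is not Halide's API, only an illustration under the assumption that "unrolling" means generating one stage per time step when the pipeline is built): `build_unrolled` emits straight-line code with T baked in, whereas `run_loop` takes the number of steps from its input at run time. `step`, `run_loop` and `build_unrolled` are hypothetical names:

```python
import math

def step(x, h, w_in=0.5, w_rec=0.9):
    # Hypothetical scalar RNN step, used only for illustration
    return math.tanh(w_in * x + w_rec * h)

def run_loop(xs):
    # Time loop whose extent (len(xs)) is only known at run time
    h = 0.0
    for x in xs:
        h = step(x, h)
    return h

def build_unrolled(T):
    # Generate straight-line code with exactly T calls to `step`;
    # T must be known before the generated function exists
    lines = ["def run(xs):",
             f"    if len(xs) != {T}:",
             f"        raise ValueError('pipeline was built for {T} steps')",
             "    h = 0.0"]
    lines += [f"    h = step(xs[{t}], h)" for t in range(T)]
    lines.append("    return h")
    namespace = {"step": step}
    exec("\n".join(lines), namespace)
    return namespace["run"]
```

In this analogy the unrolled version needs a fresh build for every sequence length, while the loop version handles any length with a single definition.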

@mratsim mratsim closed this as completed Jul 3, 2019