
Distributed Deep Learning on Spark (using Yahoo’s Caffe-on-Spark)

Read the article "Large Scale Distributed Deep Learning on Hadoop Clusters" to learn about distributed deep learning using Caffe-on-Spark:

To enable deep learning on these enhanced Hadoop clusters, we developed a comprehensive distributed solution based upon open source software libraries, Apache Spark and Caffe. One can now submit deep learning jobs onto a (Hadoop YARN) cluster of GPU nodes (using spark-submit).

Caffe-on-Spark is a result of Yahoo’s early steps in bringing the Apache Hadoop ecosystem and deep learning together on the same heterogeneous (GPU+CPU) cluster. It may be open sourced depending on interest from the community.

In the comments on the article, some readers announced plans to use it with an AWS GPU cluster.