Sparkling - A Clojure API for Apache Spark

Sparkling is a Clojure API for Apache Spark.

Show me a small sample

(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context sc (-> (conf/spark-conf)              ; this creates a spark context from the given context
                             (conf/app-name "sparkling-test")
                             (conf/master "local"))
                      (let [lines-rdd (spark/into-rdd sc ["This is a firest line"   ;; here we provide data from a clojure collection.
                                                          "Testing spark"           ;; You could also read from a text file, or avro file.
                                                          "and sparkling"           ;; You could even approach a JDBC datasource
                                                          "Happy hacking!"])]
                        (spark/collect                      ;; get every element from the filtered RDD
                          (spark/filter                     ;; filter elements in the given RDD (lines-rdd)
                            #(.contains % "spark")          ;; a pure clojure function as filter predicate
                            lines-rdd)))))

Where to find more info

Check out our site for information about Gorillalabs Sparkling and a getting started guide.

Sample Project repo available

Just clone our getting-started repo and get going right now.

But note: There's one thing you need to be aware of: Certain namespaces need to be AOT-compiled, e.g. because the classes are referenced in the startup process by name. I'm doing this in my project.clj using the :aotdirective like this

            :aot [#".*" sparkling.serialization sparkling.destructuring]

Availabilty from Clojars

Sparkling is available from Clojars. To use with Leiningen, add

to your dependencies.

Release Notes

1.2.3 - more developer friendly

added @/deref support for broadcasts Making it easier to work with broadcasts by using Clojure mechanisms. This is especially true for unit tests, as you could test without actual broadcasts, but with anything deref-able.
added RDD autonaming from fn metadata, eases navigation in SparkUI
added lookup functionality. Make sure the key to your Tuples is Serializable (Java serialization), as it will be serialized as part of your task definition, not only as part of your data. These are handled differently in Spark.

1.2.2 - added `whole-text-files` in sparkling.core.

(thanks to Jase Bell)

1.2.1 - improved Kryo Registration, AVRO reader + new Accumulator feature

feature: added accumulators (Thanks to Oleg Smirnov for that)
change: overhaul of Kryo Registration: Deprecated defregistrator macro, added Registrator type (see sparkling.serialization), with basic support of required types. This introduced a breaking change (sorry!): You need to aot-compile (or require) sparkling.serialization to run stuff in the REPL.
feature: added support for your own avro readers, making it possible to read types/records instead of maps. Major improvement on memory consumption.

1.1.1 - cleaned dependencies

No more spilling of unwanted stuff in your application. You only need to refer to sparkling to get a proper environment with Spark 1.2.1. In order to deploy to a cluster with Spark pre-installed, you need to set Spark dependency to provided in your project, though.

1.1.0 - Added a more clojuresque API

Use sparkling.core instead of sparkling.api for parameter orders similar to Clojure. Easier currying using partial.
Made it possible to use Keywords as Functions by serializing IFn instead of AFunction.
Tested with Spark 1.1.0 and Spark 1.2.1.

1.0.0 - Added value to the existing libraries (clj-spark and flambo)

It's about twice as fast by getting rid of a reflection call (thanks to David Jacot for his take on this).
Get rid of mapping/remapping inside the api functions, which
- bloated the execution plan (mine shrinked to a third) and
- (more importantly) allowed me to keep partitioner information.
adding more -values functions (e.g. map-values), againt to keep partitioner information.
Additional Sources for RDDs:
- JdbcRDD: Reading Data from your JDBC source.
- Hadoop-Avro-Reader: Reading AVRO Files from HDFS

Contributing

Feel free to fork the Sparkling repository, improve stuff and open up a pull request against our "develop" branch. However, we'll only add features with tests, so make sure everything is green ;)

Acknowledgements

Thanks to The Climate Corporation and their open source clj-spark project, and to Yieldbot for yieldbot/flambo which served as the starting point for this project.

License

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

Name		Name	Last commit message	Last commit date
Latest commit History 520 Commits
_layouts		_layouts
_plugins		_plugins
articles		articles
assets		assets
data/avro		data/avro
dev-resources		dev-resources
src		src
test		test
tmp		tmp
.env		.env
.gitignore		.gitignore
.travis.yml		.travis.yml
Gemfile		Gemfile
LICENSE		LICENSE
README.md		README.md
_config.yml		_config.yml
index.html		index.html
project.clj		project.clj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sparkling - A Clojure API for Apache Spark

Show me a small sample

Where to find more info

Sample Project repo available

Availabilty from Clojars

Release Notes

1.2.3 - more developer friendly

1.2.2 - added `whole-text-files` in sparkling.core.

1.2.1 - improved Kryo Registration, AVRO reader + new Accumulator feature

1.1.1 - cleaned dependencies

1.1.0 - Added a more clojuresque API

1.0.0 - Added value to the existing libraries (clj-spark and flambo)

Contributing

Acknowledgements

License

About

Releases

Packages

Languages

License

antonoal/sparkling

Folders and files

Latest commit

History

Repository files navigation

Sparkling - A Clojure API for Apache Spark

Show me a small sample

Where to find more info

Sample Project repo available

Availabilty from Clojars

Release Notes

1.2.3 - more developer friendly

1.2.2 - added whole-text-files in sparkling.core.

1.2.1 - improved Kryo Registration, AVRO reader + new Accumulator feature

1.1.1 - cleaned dependencies

1.1.0 - Added a more clojuresque API

1.0.0 - Added value to the existing libraries (clj-spark and flambo)

Contributing

Acknowledgements

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

1.2.2 - added `whole-text-files` in sparkling.core.

Packages