Getting Started
To get started with Scalding, first clone the Scalding repository from GitHub:
git clone https://github.com/twitter/scalding.git
Next, build the code using sbt (a standard Scala build tool). Make sure you have Scala installed (download here; see scalaVersion in project/Build.scala for the correct version to download), then run the following commands:
./sbt update
./sbt test # runs the tests; they take a long time, and 'sbt assembly' below runs them again
./sbt assembly # creates a fat jar with all dependencies, which is useful when using the scald.rb script
Now you're good to go!
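If you are unsure which Scala version the build expects, you can read it from the build definition rather than opening the file by hand. The following is a minimal sketch, not part of the Scalding tooling: the helper name `scala_version_of` is hypothetical, and it assumes the setting is written on a single line in the usual sbt form `scalaVersion := "x.y.z"`.

```shell
# Hypothetical helper: print the value of the scalaVersion setting in an
# sbt build file. Assumes a single-line setting of the form
#   scalaVersion := "x.y.z"
# Defaults to project/Build.scala when no path is given.
scala_version_of() {
  build_file="${1:-project/Build.scala}"
  # -n suppresses default output; the trailing 'p' prints only lines where
  # the substitution matched, leaving just the quoted version string.
  sed -nE 's/.*scalaVersion[[:space:]]*:=[[:space:]]*"([^"]+)".*/\1/p' "$build_file"
}
```

Run it from the root of the cloned repository as `scala_version_of project/Build.scala`. If it prints nothing, the setting is probably written in a different form, and you will need to check the file directly.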
Scalding works with Scala 2.10 and 2.11. Scala 2.11 is recommended, though a few configuration files must be changed for this to work. In project/Build.scala, ensure that the proper scalaVersion value is set. You will also need to set the matching version of specs in the same file by changing the following line:
libraryDependencies += "org.scala-tools.testing" % "specs_2.10" % "1.6.9" % "test"
You can find the published versions here.
Scala's IDE support is generally not as strong as Java's, but there are several options. Both Eclipse and IntelliJ have plugins that support Scala syntax. To generate project files for Scalding in Eclipse, refer to this project; for IntelliJ files, refer to this one (note that with the latter, the 1.1 snapshot is recommended).
For a quick introduction to Scalding, design patterns, TDD, and connecting with external systems, refer to the book Programming MapReduce with Scalding. You can find the code examples presented in the book here.
- Scaladocs
- Getting Started
- Type-safe API Reference
- SQL to Scalding
- Building Bigger Platforms With Scalding
- Scalding Sources
- Scalding-Commons
- Rosetta Code
- Fields-based API Reference (deprecated)
- Scalding: Powerful & Concise MapReduce Programming
- Scalding lecture for UC Berkeley's Analyzing Big Data with Twitter class
- Scalding REPL with Eclipse Scala Worksheets
- Scalding with CDH3U2 in a Maven project
- Running your Scalding jobs in Eclipse
- Running your Scalding jobs in IDEA intellij
- Running Scalding jobs on EMR
- Running Scalding with HBase support: Scalding HBase wiki
- Using the distributed cache
- Unit Testing Scalding Jobs
- TDD for Scalding
- Using counters
- Scalding for the impatient
- Movie Recommendations and more in MapReduce and Scalding
- Generating Recommendations with MapReduce and Scalding
- Poker collusion detection with Mahout and Scalding
- Portfolio Management in Scalding
- Find the Fastest Growing County in US, 1969-2011, using Scalding
- Mod-4 matrix arithmetic with Scalding and Algebird
- Dean Wampler's Scalding Workshop
- Typesafe's Activator for Scalding