This repository has been archived by the owner on Jan 20, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 265
Home
Sam Ritchie edited this page Aug 26, 2013
·
29 revisions
Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations. So, while a word-counting aggregation in pure Scala might look like this:
def wordCount(source: Iterable[String], store: MutableMap[String, Long]) =
source.flatMap { sentence =>
toWords(sentence).map(_ -> 1L)
}.foreach { case (k, v) => store.update(k, store.get(k) + v) }
Counting words in Summingbird looks like this:
def wordCount[P <: Platform[P]]
(source: Producer[P, String], store: P#Store[String, Long]) =
source.flatMap { sentence =>
toWords(sentence).map(_ -> 1L)
}.sumByKey(store)
The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in “batch mode” (using Scalding), in “realtime mode” (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.
Summingbird provides you with the primitives you need to build rock solid production systems.
- LambdaJam 2013 Summingbird Workshop
- Summingbird: StreamingMapReduce at Twitter (Sam’s talk from the AK Data Science Summit)