Distributed real-time computation/job/work queue using JavaScript. A JavaScript reimagining of the fabulous Apache Spark and Storm projects.
If you know underscore.js or lodash.js, you can use JS-Spark as a distributed version of them. If you know distributed RPC systems like Storm, you will feel at home. If you've ever worked with distributed work queues such as Celery, you will find JS-Spark easy to use.
Until now there have been no JS tools that could offload your processing to 1000+ CPUs. Existing tools in other languages, such as SETI@home or Gearman, require time, an expensive server setup, and then setting up and supervising client machines. We want to do better: with JS-Spark your clients just need to click a URL, and the server side is a one-line installation (less than 5 minutes).
Hadoop is quite slow and requires maintaining a cluster - we can do better. Imagine there is no need to set up expensive cluster or cloud solutions: use web browsers! Easily scale to multiple clients, who do not need to install anything like Java or other plugins. No expensive clusters to set up: the setup takes about 5 minutes and you are good to go. You can do it on one machine, even on a Raspberry Pi.
Some of the things you can do with it:
- Use it as an ML tool to process huge streams of data in real time... while all clients still browse their favorite websites.
- Use it for big data analytics. Connect to Hadoop HDFS and process even terabytes of data.
- Use it to safely transfer huge amounts of data to remote computers.
- Use it as a CDN... Today most websites run slower the more clients use them; with JS-Spark you can totally reverse this trend and build websites that run FASTER the more people use them.
- Synchronize data between multiple smartphones... even in Africa.
- No expensive cluster setup required!
- Free to use.
To add a distributed job queue to any Node app, run:

```sh
npm i --save js-spark
```

Then see Usage with npm below.
Or clone the usage example:

```sh
git clone git@github.com:syzer/example-js-spark-usage.git && cd $_
npm install
```

There is also a distributed Game of Life example:

```sh
git clone https://github.com/syzer/distributed-game-of-life.git && cd $_
npm install
```
This example shows how to use one of the natural language processing techniques, n-grams, in a distributed manner using JS-Spark. If you'd like to know more about n-grams, please read:
http://en.wikipedia.org/wiki/N-gram
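The real example lives in the repositories above; as a rough sketch of the idea (illustrative code only, not the bundled example), extracting word bigrams with the map/reduce API from Usage below could look like this:

```js
var core = require('jsSpark')({workers: 4});
var jsSpark = core.jsSpark;

jsSpark(['the quick brown fox', 'the quick red fox'])
    // executed on the clients: split each sentence into word bigrams
    .map(function toBigrams(sentence) {
        var words = sentence.split(/\s+/);
        var grams = [];
        for (var i = 0; i < words.length - 1; i++) {
            grams.push(words[i] + ' ' + words[i + 1]);
        }
        return grams;
    })
    // merge the per-sentence bigram lists into one list
    .reduce(function concatAll(all, grams) {
        return all.concat(grams);
    })
    .run()
    .then(function (data) {
        // back on the server
        console.log(data);
    });
```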
Prerequisites: install Node.js, then install grunt and bower:

```sh
sudo npm install -g bower
sudo npm install -g grunt
```

Then either:

```sh
npm i --save js-spark
```

or use:

```sh
git clone git@github.com:syzer/JS-Spark.git && cd $_
npm install
```
Then run:

```sh
node index &
node client
```

Or:

```sh
npm start
```
After that you may see how the clients do the heavy lifting.
```js
var core = require('jsSpark')({workers: 8});
var jsSpark = core.jsSpark;

jsSpark([20, 30, 40, 50])
    // this is executed on the client
    .map(function addOne(num) {
        return num + 1;
    })
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    .thru(function addString(num) {
        return "It was a number but I will convert it to " + num;
    })
    .run()
    .then(function (data) {
        // this is executed back on the server
        console.log(data);
    });
```
You can also keep a reference to a task and continue working with its result on the server later (see task3 below):

```js
var task = jsSpark([20, 30, 40, 50])
    // this is executed on the client side
    .map(function addOne(num) {
        return num + 1;
    })
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    .run();
```
You can also add lodash methods to the chain with .add():

```js
var _ = require('lodash');

jsSpark(_.range(10))
    // https://lodash.com/docs#sortBy
    .add('sortBy', function _sortBy(el) {
        return Math.sin(el);
    })
    .map(function multiplyBy2(el) {
        return el * 2;
    })
    .filter(function remove0and10(el) {
        return el % 5 !== 0;
    })
    // sum of [2, 4, 6, 8, 12, 14, 16, 18] => 80
    .reduce(function sumUp(sum, el) {
        return sum + el;
    })
    .run();
```
If you run calculations on unknown clients, it is better to recalculate the same tasks on different clients:
```js
jsSpark(_.range(10))
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    // how many times to repeat the calculation
    .run({times: 6})
    .then(function whenClientsFinished(data) {
        // you may also get the 2 most relevant answers
        console.log('Most clients believe that:');
        console.log('Total sum of the numbers from 0 to 9 is:', data);
    })
    .catch(function whenClientsArgue(reason) {
        console.log('Most clients could not agree: ' + reason.toString());
    });
```
The result of a stored task can be processed further on the server:

```js
var task3 = task
    .then(function serverSideComputingOfData(data) {
        var basesNumber = data + 21;
        console.log('All your ' + basesNumber + ' base are belong to us');
        return basesNumber;
    })
    .catch(function (reason) {
        console.log('Task could not compute: ' + reason.toString());
    });
```
This project reimplements some nice ideas from the world of big data, so there are of course some good resources you can use to dive into the topic:
- Map-Reduce revisited
- Awesome BigData - A curated list of awesome frameworks, resources and other things.
Normally you do not need to start the UI server, but if you want to build an application on top of the js-spark UI server, feel free to do so:
```sh
git clone git@github.com:syzer/JS-Spark.git && cd $_
npm install
grunt build
grunt serve
```
To spawn more lightweight (headless) clients:

```sh
node client
```
- MongoDB default connection parameters: mongodb://localhost/jssparkui-dev, user: 'js-spark', pass: 'js-spark1' (see the connection sketch after this list).
- Install MongoDB, make sure mongod (the Mongo service) is running, then create the user from the Mongo shell:
```sh
mongo
```

```js
use jssparkui-dev
db.createUser({
    user: "js-spark",
    pwd: "js-spark1",
    roles: [
        { role: "readWrite", db: "jssparkui-dev" }
    ]
})
```
- Older MongoDB engines can use db.addUser() with the same API.
- To run without the UI, the database code is not required!
- On first run you need to seed the database: change the option seedDB: false to seedDB: true in ./private/srv/server/config/environment/development.js.
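For reference, here is a minimal sketch of connecting with the default parameters above (this assumes mongoose, which is not part of js-spark itself):

```js
var mongoose = require('mongoose');

// default UI-server connection parameters from the list above
mongoose.connect('mongodb://localhost/jssparkui-dev', {
    user: 'js-spark',
    pass: 'js-spark1'
});

mongoose.connection.on('connected', function () {
    console.log('Connected to jssparkui-dev');
});
mongoose.connection.on('error', function (err) {
    console.error('MongoDB connection error:', err);
});
```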
To run the tests:

```sh
npm test
```
- service/file -> removed for another module
- di -> separate module
- [!] bower for the js-spark client
- config -> merge the different config files
- [!] server/auth -> split into the js-spark-ui module
- [!] server/api/jobs -> split into the js-spark-ui module
- split the UI
- more examples
- example with CLI usage (not a daemon)
- example using .thru()
- [?] .add() might be broken... maybe fix or remove