Running MapReduce Jobs
The video event detection example shows how we can use MapReduce (via mrjob to run Hadoop jobs) on a cluster of nodes. This work was done in collaboration with Penporn Koanantakool from UC Berkeley.
- Install Hadoop on your cluster.
- Install PyCASP.
- Install the mrjob package. For more detailed instructions see steps 15 and 16 here.
See the video event detection app example:
- First, we create a mapper() function; in our case it takes a video file name and calls the diarizer code on the file (using the cluster.py code for speaker diarization). See the code here.
- Then we create a main() function to set up the mrjob parameters and call the mapper function. See code example.
That's it! Once the environment is set up, you should be able to run the main function on the cluster. Each node will then execute the mapper() function, in our case the diarize() function in cluster.py.
In our video event detection system, we need to create a config file for each diarizer job; here is the code for that pre-processing step.