Running MapReduce Jobs

egonina edited this page Dec 20, 2013 · 6 revisions

The video event detection example shows how to run MapReduce jobs (using the mrjob package to drive Hadoop) on a cluster of nodes. This work was done in collaboration with Penporn Koanantakool from UC Berkeley.

Installation

  1. Install Hadoop on your cluster.
  2. Install PyCASP.
  3. Install the mrjob package. For more detailed instructions see steps 15 and 16 here.

Running MapReduce with PyCASP

See the video event detection app example:

  1. First, we create a mapper() function; in our case it takes a video file name and runs the diarizer on that file (using the speaker diarization code in cluster.py). See the code here.

  2. Then we create a main() function to set up the mrjob parameters and call the mapper function. See the code example.
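The two steps above can be sketched in pure Python, with no Hadoop required to see the shape. The diarize() stub and the file names below are hypothetical stand-ins for the real code in cluster.py; in the actual app, mapper() becomes the mapper of an mrjob job class, and main() configures mrjob and submits the job to Hadoop instead of looping locally.

```python
def diarize(video_file):
    # Hypothetical stand-in for the real diarize() in cluster.py,
    # which runs PyCASP speaker diarization on the file.
    return "diarized %s" % video_file

def mapper(video_file):
    # Takes a video file name, runs the diarizer on it, and emits a
    # (file name, result) pair -- the key/value shape mrjob expects.
    yield video_file, diarize(video_file)

def main(video_files):
    # In the real app this step sets up the mrjob parameters and
    # launches the job on the cluster; here we just apply the mapper
    # to each input locally to illustrate the data flow.
    results = {}
    for f in video_files:
        for key, value in mapper(f):
            results[key] = value
    return results

if __name__ == "__main__":
    print(main(["meeting1.avi", "meeting2.avi"]))
```

With Hadoop in place, each input line (a video file name) is fed to one mapper invocation, so the per-file loop above is what the cluster parallelizes across nodes.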

That's it! Once the environment is set up, you should be able to run the main function on the cluster. Each node will then execute the mapper() function, in our case the diarize() function in cluster.py.

In our video event detection system, we also need to create a config file for each diarizer job; here is the code for that pre-processing step.
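As a minimal sketch of that pre-processing step: write one small plain-text config file per input video, which the diarizer job can then read on its cluster node. The parameter names below (video_file, num_gaussians, output_dir) are hypothetical placeholders; the real diarizer's config keys are defined in the linked pre-processing code.

```python
import os

def write_diarizer_configs(video_files, out_dir):
    # For each video file, write a config file named after the video
    # into out_dir, and return the list of config file paths.
    paths = []
    for video in video_files:
        name = os.path.splitext(os.path.basename(video))[0]
        path = os.path.join(out_dir, name + ".cfg")
        with open(path, "w") as f:
            # Hypothetical diarizer parameters, one "key = value" per line.
            f.write("video_file = %s\n" % video)
            f.write("num_gaussians = 5\n")
            f.write("output_dir = %s\n" % out_dir)
        paths.append(path)
    return paths
```

A typical call would be write_diarizer_configs(video_list, "/path/to/configs") just before submitting the MapReduce job, so every mapper finds a ready-made config for its file.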