Running Apache Storm benchmark

Overview

Before you begin, make sure you have compiled the application and created the dataset: [Create dataset for Apache Storm benchmark](Create dataset for Apache Storm benchmark)

The Apache Storm benchmark contains the following topologies:

EnronTopology: Complete application benchmark
TrivialTopology1: Same as EnronTopology but filter, modify, and metrics are unity bolts
TrivialTopology2: Same as TrivialTopology1 but filter and modify bolts are removed
TrivialTopology3: Same as TrivialTopology2 but serialization and deserialization bolts are removed. This topology requires an unserialized but compressed dataset (create it using com.ibm.storm.email.benchmark.testing.CreateCompressedDatasetSequential).
TrivialTopology4: Same as TrivialTopology3 but without compression and decompression. This topology requires an uncompressed dataset (create it using com.ibm.storm.email.benchmark.testing.CreateSerializedDatasetSequential).

To Run the application benchmark

storm jar target/storm-email-benchmark-1.0-jar-with-dependencies.jar com.ibm.storm.email.benchmark.<topology_name> <local_or_remote> <job_id>

These topologies use vanilla shuffle grouping. If you want to use localOrShuffle grouping instead, use com.ibm.storm.email.benchmark.local.<topology_name>.

For some setups, especially single process ones, shuffle seems to perform better than localOrShuffle.
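The choice between the two grouping variants only changes the package of the topology class, so the command line can be assembled by a small helper. The following is a minimal sketch, not part of the benchmark itself: the function names `topology_class` and `storm_command` and the `shuffle`/`local` switch are hypothetical, and the command is only printed, not executed.

```shell
# Hypothetical helpers: they only print strings and do not call storm.

# Print the fully qualified topology class name.
# $1: topology name (for example EnronTopology)
# $2: "local" for the localOrShuffle variant, anything else (or empty) for shuffle
topology_class() {
  if [ "${2:-shuffle}" = "local" ]; then
    printf '%s\n' "com.ibm.storm.email.benchmark.local.$1"
  else
    printf '%s\n' "com.ibm.storm.email.benchmark.$1"
  fi
}

# Print the storm command line for one run.
# $1: topology name, $2: <local_or_remote> argument, $3: job id, $4: grouping switch
storm_command() {
  printf 'storm jar %s %s %s %s\n' \
    "target/storm-email-benchmark-1.0-jar-with-dependencies.jar" \
    "$(topology_class "$1" "${4:-shuffle}")" "$2" "$3"
}
```

For example, `storm_command TrivialTopology1 <local_or_remote> <job_id> local` prints the command for the localOrShuffle variant of TrivialTopology1, which makes it easy to compare both groupings on the same setup.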

Final Metrics Configuration

The final metrics are emitted by the Global Metrics Bolt.
To this end, it needs to know the total number of emails. It gets this number from the configuration file (totalemails).
This number needs to be updated each time the dataset changes. Just uncomment the totalemails entry for the corresponding dataset in the configuration file.

Results Collection

Final number of characters, words, and paragraphs, throughput, elapsed time, and number of processed emails can be retrieved from <logspath>/<job_id>/GlobalMetricsBolt_Final

See the Configuration section above for details on "logspath".

Interval metrics can be obtained from <logspath>/<job_id>/GlobalMetricsBolt and <logspath>/<job_id>/GlobalMetricsBolt_Throughput
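The three result locations share the same `<logspath>/<job_id>` prefix, so they can be listed with one helper. This is a minimal sketch under that assumption; the function name `metrics_paths` is hypothetical, and the paths are only printed, not checked for existence.

```shell
# Hypothetical helper: print the metrics paths for one job.
# $1: logs directory (the "logspath" setting), $2: job id
metrics_paths() {
  for name in GlobalMetricsBolt_Final GlobalMetricsBolt GlobalMetricsBolt_Throughput; do
    printf '%s/%s/%s\n' "$1" "$2" "$name"
  done
}
```

Calling `metrics_paths <logspath> <job_id>` prints the final metrics path first, followed by the two interval metrics paths.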
To collect CPU time after the job has completed:
a. Run jps and note down the PIDs of all Worker processes.
b. For each Worker PID, run ps -e -o pid,cputime | grep <pid>
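The CPU time collection can be scripted along these lines. This is a sketch, not part of the benchmark: the function names `worker_pids` and `cputime_to_seconds` are hypothetical, matching on both `Worker` and `worker` is an assumption about how jps labels the process, and the time format is assumed to be the `[DD-]HH:MM:SS` form that ps prints for `cputime`.

```shell
# Hypothetical helpers for collecting CPU time of Storm workers.

# Read `jps` output on stdin and print the PIDs of Worker processes.
# Assumption: the process name in the second column is "Worker" or "worker".
worker_pids() {
  awk '$2 == "Worker" || $2 == "worker" { print $1 }'
}

# Convert a cputime value ([DD-]HH:MM:SS, or MM:SS) to seconds.
cputime_to_seconds() {
  printf '%s\n' "$1" | awk -F'[-:]' '{
    if (NF == 4)      print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
    else if (NF == 3) print $1 * 3600 + $2 * 60 + $3
    else              print $1 * 60 + $2
  }'
}

# Usage on a worker host, after the job has completed:
#   for pid in $(jps | worker_pids); do
#     ps -e -o pid,cputime | awk -v pid="$pid" '$1 == pid'
#   done
```

The usage loop matches the PID column exactly with awk instead of grep, so a PID such as 123 does not also match 1234.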
