Running Apache Storm benchmark
Before you begin, make sure you have compiled the application and created the dataset: [Create dataset for Apache Storm benchmark](Create dataset for Apache Storm benchmark)
The Apache Storm benchmark contains the following topologies:
- EnronTopology: the complete application benchmark.
- TrivialTopology1: same as EnronTopology, but the filter, modify, and metrics bolts are unity bolts.
- TrivialTopology2: same as TrivialTopology1, but the filter and modify bolts are removed.
- TrivialTopology3: same as TrivialTopology2, but the serialization and deserialization bolts are removed. This topology requires an unserialized but compressed dataset (create it using com.ibm.storm.email.benchmark.testing.CreateCompressedDatasetSequential).
- TrivialTopology4: same as TrivialTopology3, but without compression and decompression. This topology requires an uncompressed dataset (create it using com.ibm.storm.email.benchmark.testing.CreateSerializedDatasetSequential).
Run a topology with:
storm jar target/storm-email-benchmark-1.0-jar-with-dependencies.jar com.ibm.storm.email.benchmark.<topology_name> <local_or_remote> <job_id>
These topologies use vanilla shuffle grouping. If you want to use localOrShuffle grouping instead, use com.ibm.storm.email.benchmark.local.<topology_name>.
In some setups, especially single-process ones, shuffle grouping appears to perform better than localOrShuffle grouping.
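The run command can be wrapped in a small POSIX shell function, sketched here under a few assumptions: it is run from the project root (so the relative jar path resolves), <local_or_remote> and <job_id> are passed through unchanged, and the function name run_topology, the optional localorshuffle argument, and the DRY_RUN variable are hypothetical conveniences, not part of the benchmark. Choosing localorshuffle only switches to the com.ibm.storm.email.benchmark.local package.

```sh
#!/bin/sh
# Sketch: wrapper around the documented `storm jar` command.
# Hypothetical helper; assumes it is run from the project root.

JAR=target/storm-email-benchmark-1.0-jar-with-dependencies.jar
PKG=com.ibm.storm.email.benchmark

# run_topology <topology_name> <local_or_remote> <job_id> [shuffle|localorshuffle]
# With DRY_RUN=1 the command is printed instead of executed.
run_topology() {
  topology=$1 mode=$2 job_id=$3 grouping=${4:-shuffle}

  # Reject names that are not one of the benchmark topologies.
  case $topology in
    EnronTopology|TrivialTopology1|TrivialTopology2|TrivialTopology3|TrivialTopology4) ;;
    *) echo "unknown topology: $topology" >&2; return 1 ;;
  esac

  # Vanilla shuffle variants live in $PKG, localOrShuffle variants in $PKG.local.
  case $grouping in
    shuffle)        class=$PKG.$topology ;;
    localorshuffle) class=$PKG.local.$topology ;;
    *) echo "grouping must be shuffle or localorshuffle" >&2; return 1 ;;
  esac

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo storm jar "$JAR" "$class" "$mode" "$job_id"
  else
    storm jar "$JAR" "$class" "$mode" "$job_id"
  fi
}
```

For example, DRY_RUN=1 run_topology TrivialTopology3 local job42 prints the full command without submitting anything; "local" here assumes that is an accepted value of <local_or_remote>.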
- The final metrics are emitted by the Global Metrics Bolt.
- To compute them, it needs to know the total number of emails, which it reads from the configuration file (totalemails).
- This number must be updated each time the dataset changes: uncomment the totalemails entry for the corresponding dataset in the configuration file.
The final number of characters, words, and paragraphs, as well as the throughput, elapsed time, and number of processed emails, can be retrieved from <logspath>/<job_id>/GlobalMetricsBolt_Final.
See the Configuration section above for details on "logspath".
Interval metrics can be obtained from <logspath>/<job_id>/GlobalMetricsBolt and <logspath>/<job_id>/GlobalMetricsBolt_Throughput.
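A small helper for locating and printing these metric files is sketched below. It assumes you pass the logspath value from your configuration file and the same job_id used when submitting the topology; the function names are hypothetical, and file contents are printed as-is because their format is not described here.

```sh
#!/bin/sh
# Sketch: locate and print the metric files GlobalMetricsBolt writes for one job.
# <logspath> is the "logspath" value from your configuration file.

# metrics_file <logspath> <job_id> <file_name>: print the path of one metrics file.
metrics_file() {
  printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# show_final_metrics <logspath> <job_id>: print the final totals after the job completes.
show_final_metrics() {
  cat "$(metrics_file "$1" "$2" GlobalMetricsBolt_Final)"
}

# follow_interval_metrics <logspath> <job_id>: stream both interval files
# while the job runs (stop with Ctrl-C).
follow_interval_metrics() {
  tail -f "$(metrics_file "$1" "$2" GlobalMetricsBolt)" \
          "$(metrics_file "$1" "$2" GlobalMetricsBolt_Throughput)"
}
```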
To collect CPU time after the job has completed:
a. Run:
jps
Note down the PIDs of all Worker processes.
b. For each Worker PID, run:
ps -e -o pid,cputime | grep <pid>
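The two manual steps can be scripted. The POSIX sh sketch below makes assumptions worth checking on your setup: jps lists the Storm workers under a name equal to "worker" in any case (the exact name depends on the Storm version), and ps accepts the cputime format specifier with [DD-]HH:MM:SS output, as procps on Linux does. It queries each PID with ps -p instead of grep, so a PID that happens to appear inside another number is not matched. The function names are hypothetical.

```sh
#!/bin/sh
# Sketch: report the CPU time of every Storm Worker JVM on this machine.
# Assumptions: jps names Storm workers "worker" or "Worker" (version dependent),
# and ps supports "-o cputime=" with [DD-]HH:MM:SS output (procps on Linux).

# cputime_to_seconds <[DD-]HH:MM:SS>: convert ps cputime output to seconds.
# awk parses fields such as "08" as decimal, avoiding shell octal pitfalls.
cputime_to_seconds() {
  echo "$1" | awk -F'[-:]' '{
    if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
    else         print $1 * 3600 + $2 * 60 + $3
  }'
}

# worker_pids: PIDs of JVMs whose jps main-class name is "worker" (any case).
worker_pids() {
  jps | awk 'tolower($2) == "worker" { print $1 }'
}

# total_worker_cputime: print "<pid> <cputime> <seconds>s" per worker, then the total.
total_worker_cputime() {
  total=0
  for pid in $(worker_pids); do
    # "cputime=" suppresses the header line; tr strips ps's padding spaces.
    t=$(ps -o cputime= -p "$pid" | tr -d ' ')
    [ -n "$t" ] || continue   # worker exited between jps and ps
    secs=$(cputime_to_seconds "$t")
    echo "$pid $t ${secs}s"
    total=$((total + secs))
  done
  echo "total ${total}s"
}
```

After sourcing the script on each worker machine, total_worker_cputime prints one line per Worker process followed by the summed CPU time in seconds.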