Getting Started
Note: this is for HiBench 5.0
- System setup.

  (1) Set up the JDK, Hadoop-YARN, and Spark runtime environments properly.

  (2) For HiBench V4.0 and later, Python 2.x (>= 2.6) is required.

  (3) Download or check out the HiBench benchmark suite.

  (4) Build HiBench with Maven, specifying the Spark version and the MapReduce version. For example, for Spark 1.5 and MR2, run:

      cd src
      mvn clean package -D spark1.5 -D MR2

  Optionally, you can run `<HiBench_Root>/bin/build-all.sh` to build HiBench for all known Spark and MR versions.
- HiBench Configurations.

  For the minimum requirements, create and edit `conf/99-user_defined_properties.conf`:

      cd conf
      cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf

  Make sure the following properties are set:

      hibench.hadoop.home     The Hadoop installation location
      hibench.spark.home      The Spark installation location
      hibench.hdfs.master     HDFS master
      hibench.spark.master    SPARK master
  Note: For YARN mode, set `hibench.spark.master` to `yarn-client`. (`yarn-cluster` is not supported yet.)

  If the Spark and Hadoop versions are not auto-probed correctly, set `hibench.hadoop.executable`, `hibench.hadoop.version`, and `hibench.spark.version` in `99-user_defined_properties.conf`.

  To run HiBench on HDP, set `hibench.hadoop.mapreduce.home` to the MapReduce home, which is normally "/usr/hdp/current/hadoop-mapreduce-client". Also set `hibench.hadoop.release` to "hdp".
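  A quick way to confirm that the four required properties are present is to scan the conf text before running anything. The sketch below is not part of HiBench. It assumes each setting is a property name followed by whitespace and a value, with `#` starting a comment; the helper names `parse_conf` and `missing_properties` and the sample values are hypothetical.

```python
# Sketch: report which required HiBench properties are missing from conf text.
# Assumption: one "name value" pair per line, whitespace separated, "#" comments.
# A literal "#" inside a value would be cut off by this simple parser.

REQUIRED = (
    "hibench.hadoop.home",
    "hibench.spark.home",
    "hibench.hdfs.master",
    "hibench.spark.master",
)


def parse_conf(text):
    """Return a dict mapping property name to value."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)  # name, then the rest of the line
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_properties(text, required=REQUIRED):
    """Return the required property names that are absent or empty."""
    props = parse_conf(text)
    return [name for name in required if not props.get(name)]


# Hypothetical sample values, for illustration only.
sample = """
hibench.hadoop.home   /opt/hadoop
hibench.spark.home    /opt/spark
hibench.hdfs.master   hdfs://localhost:8020
"""
print(missing_properties(sample))  # ['hibench.spark.master']
```

  To check a real file, read `conf/99-user_defined_properties.conf` and pass its contents to `missing_properties`.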
- Run.

  For example, to run a single workload, `wordcount`, on Spark:

      workloads/wordcount/prepare/prepare.sh
      workloads/wordcount/spark/scala/bin/run.sh

  You can also try `<HiBench_Root>/bin/run-all.sh` to run all workloads. Note: The same configuration may not work for all workloads.
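  The prepare-then-run sequence for a single workload can be wrapped in a small helper that stops if either script fails. This is a sketch, not a HiBench tool: `workload_scripts` and `run_workload` are hypothetical names, and the `api` argument assumes other workloads follow the same `workloads/<workload>/<api>/bin/run.sh` layout as the `wordcount` example.

```python
import os
import subprocess


def workload_scripts(workload, api="spark/scala"):
    """Return the prepare and run script paths, relative to the HiBench root."""
    base = "workloads/" + workload
    return [
        base + "/prepare/prepare.sh",
        base + "/" + api + "/bin/run.sh",
    ]


def run_workload(hibench_root, workload, api="spark/scala"):
    """Run prepare.sh, then run.sh; check_call raises if a script exits non-zero."""
    root = os.path.abspath(hibench_root)
    for script in workload_scripts(workload, api):
        subprocess.check_call([os.path.join(root, script)], cwd=root)


# Only the paths are built here; nothing is executed until run_workload is called.
print(workload_scripts("wordcount"))
```

  Calling `run_workload("<HiBench_Root>", "wordcount")` with the real installation path would then execute both scripts in order.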
- View the report:

  Go to `<HiBench_Root>/report` to check the final report:

  - `report/hibench.report`: Overall report about all workloads.
  - `report/<workload>/<language APIs>/bench.log`: Raw logs on the client side.
  - `report/<workload>/<language APIs>/monitor.html`: System utilization monitor results.
  - `report/<workload>/<language APIs>/conf/<workload>.conf`: Generated environment variable configurations for this workload.
  - `report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf`: Generated configuration for this workload, which is used for mapping to environment variables.
  - `report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf`: Generated configuration for Spark.

  [Optional] Execute `<HiBench_Root>/bin/report_gen_plot.py report/hibench.report` to generate report figures.

  Note: `report_gen_plot.py` requires `python2.x` and `python-matplotlib`.
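
  After a run, it can help to check which of the report files listed above were actually produced. The following sketch only builds those paths and tests for their existence. The function names are hypothetical, and passing `spark/scala` for the `<language APIs>` placeholder is an assumption based on the `wordcount` run path.

```python
import os


def expected_report_files(workload, api):
    """Return the report paths listed above, relative to <HiBench_Root>."""
    base = "/".join(["report", workload, api])
    conf = base + "/conf"
    return [
        "report/hibench.report",
        base + "/bench.log",
        base + "/monitor.html",
        conf + "/" + workload + ".conf",
        conf + "/sparkbench/" + workload + "/sparkbench.conf",
        conf + "/sparkbench/" + workload + "/spark.conf",
    ]


def missing_report_files(hibench_root, workload, api):
    """Return the expected report files that do not exist under hibench_root."""
    return [
        path
        for path in expected_report_files(workload, api)
        if not os.path.isfile(os.path.join(hibench_root, path))
    ]


for path in expected_report_files("wordcount", "spark/scala"):
    print(path)
```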