Advanced Configurations
Note: this is for HiBench 5.0
- Parallelism, memory and executor number tuning:
  - `hibench.default.map.parallelism`: mapper number in MR, partition number in Spark
  - `hibench.default.shuffle.parallelism`: reducer number in MR, shuffle partition number in Spark
  - `hibench.yarn.executors.num`: number of executors in YARN mode
  - `hibench.yarn.executors.cores`: number of executor cores in YARN mode
  - `spark.executors.memory`: executor memory, in standalone or YARN mode
  - `spark.driver.memory`: driver memory, in standalone or YARN mode
  Note: all `spark.*` properties will be passed to the Spark runtime configuration.
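
  For example, a sketch of these properties in a user override file. The values below are illustrative only and should be tuned to your cluster; HiBench 5.0 ships a `conf/99-user_defined_properties.conf.template` that can be copied as a starting point for such overrides (check your distribution):

      # Illustrative values only -- tune to your cluster size
      hibench.default.map.parallelism      96
      hibench.default.shuffle.parallelism  48
      hibench.yarn.executors.num           4
      hibench.yarn.executors.cores         4
      spark.executors.memory               4G
      spark.driver.memory                  2G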
- Compress options:
  - `hibench.compress.profile`: compression option, `enable` or `disable`
  - `hibench.compress.codec.profile`: compression codec, `snappy`, `lzo` or `default`
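
  For instance, a minimal sketch enabling snappy compression with the two properties above:

      hibench.compress.profile         enable
      hibench.compress.codec.profile   snappy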
- Data scale profile selection:
  - `hibench.scale.profile`: data scale profile, one of `tiny`, `small`, `large`, `huge`, `gigantic` or `bigdata`

  You can add more data scale profiles in `conf/10-data-scale-profile.conf`, as sketched below. Please don't change `conf/00-default-properties.conf` unless you know exactly what you are doing.
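
  A hypothetical custom profile named `mysize` might look like the following; the key names here follow the `hibench.<workload>.<profile>.*` pattern used in the shipped file, but check `conf/10-data-scale-profile.conf` in your copy for the exact keys of each workload:

      # Hypothetical "mysize" profile -- verify key names against the shipped file
      hibench.wordcount.mysize.datasize    64000000
      hibench.sort.mysize.datasize         64000000

  It can then be selected by setting `hibench.scale.profile mysize`.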
- Configure for each workload or each language API:

  All configurations are loaded from a nested folder structure:
  - `conf/*.conf`: global configuration
  - `workloads/<workload>/conf/*.conf`: configuration for each workload
  - `workloads/<workload>/<language APIs>/.../*.conf`: configuration for each language API
  For configuration files in the same folder, the loading sequence is sorted by file name, and values in later files override earlier ones.
  The final values of all properties are stored in a single config file located at `report/<workload>/<language APIs>/conf/<workload>.conf`, which contains all resolved values and pinpoints the source file of each one.
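
  For example, after a run you can grep the merged config to see the final value of a property and which file supplied it (the path follows the report pattern above, shown here for a hypothetical wordcount Spark/Scala run):

      grep map.parallelism report/wordcount/scala/conf/wordcount.conf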
Configure for future Spark release
By default, `bin/build-all.sh` will build HiBench for all running environments:

- MR1, Spark1.2
- MR1, Spark1.3
- MR2, Spark1.2
- MR2, Spark1.3
- MR2, Spark1.4

HiBench will probe the Hadoop and Spark release versions and choose the proper HiBench build automatically. However, for a future Spark release (for example, Spark1.4) that is API-compatible with Spark1.3, HiBench will fail because it lacks the corresponding profile. In that case, you can set the Hadoop/Spark release version explicitly to force HiBench to use the Spark1.3 profile:
    hibench.spark.version    spark1.3
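
This property can be placed in your user override file, for example the `conf/99-user_defined_properties.conf` created from the shipped template, assuming your distribution includes it.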
Configure for running workloads and language APIs

The `conf/benchmarks.lst` file under the package folder defines the workloads to run when you execute the `bin/run-all.sh` script under the package folder. Each line in the list file specifies one workload. You can put a `#` at the beginning of a line to skip the corresponding benchmark if necessary.
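
For example, a trimmed `conf/benchmarks.lst` might look like this (workload names are illustrative; keep the ones shipped in your copy of the file and comment out what you don't need):

    wordcount
    terasort
    #kmeans
    #join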
You can also run each workload separately. In general, there are several scripts under one workload folder (a usage example follows the list):

- `prepare/prepare.sh`: generate input data in HDFS for running the benchmark
- `mapreduce/bin/run.sh`: run the MapReduce language API
- `spark/java/bin/run.sh`: run the Spark/Java language API
- `spark/scala/bin/run.sh`: run the Spark/Scala language API
- `spark/python/bin/run.sh`: run the Spark/Python language API
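
For example, to prepare input data and run the wordcount workload with the Spark/Scala API, executed from the package folder (`wordcount` is one of the shipped workloads):

    workloads/wordcount/prepare/prepare.sh        # generate input data in HDFS
    workloads/wordcount/spark/scala/bin/run.sh    # run the Spark/Scala implementation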