
Commit

Merge pull request #207 from smacker/configure_spark_in_hash
configure spark for hash using env vars
smacker authored Feb 28, 2019
2 parents 4755486 + 4cbae63 commit 1d29f9c
Showing 2 changed files with 11 additions and 3 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -126,13 +126,20 @@ docker exec -it some-scylla cqlsh
```


### External Apache Spark cluster
### Configuration for Apache Spark

Just set url to the Spark Master though env var
Use environment variables to set the memory for the hash job:
```
export DRIVER_MEMORY=30g
export EXECUTOR_MEMORY=60g
```

To use an external cluster, just set the URL to the Spark master through an env var:
```
MASTER="spark://<spark-master-url>" ./hash <path>
```
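
The two settings can be combined in a single invocation; a minimal sketch, assuming `<spark-master-url>` and `<path>` are filled in for your setup:
```
DRIVER_MEMORY=30g EXECUTOR_MEMORY=60g MASTER="spark://<spark-master-url>" ./hash <path>
```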


### CLI arguments

All three commands accept parameters for database connection and logging:
3 changes: 2 additions & 1 deletion hash
@@ -56,7 +56,8 @@ sparkSubmit \
--master "${MASTER:=local[*]}" \
--name "${app_name}" \
--jars "${deps_jar}" \
--conf "spark.executor.memory=4g" \
--conf "spark.driver.memory=${DRIVER_MEMORY:=2g}" \
--conf "spark.executor.memory=${EXECUTOR_MEMORY:=4g}" \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.tech.sourced.engine.skip.read.errors=true" \
--conf "spark.files.maxPartitionBytes=12582912" \
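
The `${VAR:=default}` expansions above are standard shell parameter expansion: the value from the environment is used when it is set, otherwise the default is substituted (and assigned to the variable). A minimal sketch of that behavior, with illustrative values:
```
unset EXECUTOR_MEMORY
echo "${EXECUTOR_MEMORY:=4g}"   # prints 4g and assigns it to EXECUTOR_MEMORY
EXECUTOR_MEMORY=60g
echo "${EXECUTOR_MEMORY:=4g}"   # prints 60g; the set value wins over the default
```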
