
Commit

Merge pull request #207 from smacker/configure_spark_in_hash
configure spark for hash using env vars
smacker authored Feb 28, 2019
2 parents 4755486 + 4cbae63 commit 1d29f9c
Showing 2 changed files with 11 additions and 3 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -126,13 +126,20 @@ docker exec -it some-scylla cqlsh
```


### External Apache Spark cluster
### Configuration for Apache Spark

Just set url to the Spark Master though env var
Use environment variables to set the memory for the hash job:
```
export DRIVER_MEMORY=30g
export EXECUTOR_MEMORY=60g
```

To use an external cluster, just set the URL to the Spark master through an env var:
```
MASTER="spark://<spark-master-url>" ./hash <path>
```
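
The two settings can be combined in a single invocation; a minimal sketch, assuming `<spark-master-url>` and `<path>` are filled in for your setup:
```
DRIVER_MEMORY=30g EXECUTOR_MEMORY=60g MASTER="spark://<spark-master-url>" ./hash <path>
```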


### CLI arguments

All three commands accept parameters for database connection and logging:
3 changes: 2 additions & 1 deletion hash
@@ -56,7 +56,8 @@ sparkSubmit \
--master "${MASTER:=local[*]}" \
--name "${app_name}" \
--jars "${deps_jar}" \
--conf "spark.executor.memory=4g" \
--conf "spark.driver.memory=${DRIVER_MEMORY:=2g}" \
--conf "spark.executor.memory=${EXECUTOR_MEMORY:=4g}" \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.tech.sourced.engine.skip.read.errors=true" \
--conf "spark.files.maxPartitionBytes=12582912" \
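
The `${VAR:=default}` expansions above are standard shell parameter expansion: the value from the environment is used when it is set, otherwise the default is substituted (and assigned to the variable). A minimal sketch of that behavior, with illustrative values:
```
unset EXECUTOR_MEMORY
echo "${EXECUTOR_MEMORY:=4g}"   # prints 4g and assigns it to EXECUTOR_MEMORY
EXECUTOR_MEMORY=60g
echo "${EXECUTOR_MEMORY:=4g}"   # prints 60g; the set value wins over the default
```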
