This repository has been archived by the owner on Nov 16, 2019. It is now read-only.
GetStarted_EC2
Andy Feng edited this page Feb 19, 2016
- Set up your EC2 key pair, then use spark-ec2 from Apache Spark, as shown below, to launch a Spark cluster with 2 slaves on g2.2xlarge (1 GPU, 8 vCPUs) or g2.8xlarge (4 GPUs, 32 vCPUs) instances with a CaffeOnImage AMI. You can check the status of your request in the EC2 console, and the current spot price at https://aws.amazon.com/ec2/spot/#/.
Region: Europe | Region: Asia Pacific |
---|---|
`export AMI_IMAGE=ami-d8dc61ab` | `export AMI_IMAGE=ami-2eb1784d` |
`export EC2_REGION=eu-west-1` | `export EC2_REGION=ap-southeast-1` |
`export EC2_ZONE=eu-west-1c` | `export EC2_ZONE=ap-southeast-1b` |
export SPARK_WORKER_INSTANCES=2
export EC2_INSTANCE_TYPE=g2.2xlarge
#export EC2_INSTANCE_TYPE=g2.8xlarge
export EC2_MAX_PRICE=0.8
${SPARK_HOME}/ec2/spark-ec2 --key-pair=${EC2_KEY} --identity-file=${EC2_PEM_FILE} \
--region=${EC2_REGION} --zone=${EC2_ZONE} \
--ebs-vol-size=50 \
--instance-type=${EC2_INSTANCE_TYPE} \
--master-instance-type=m4.xlarge \
--ami=${AMI_IMAGE} -s ${SPARK_WORKER_INSTANCES} \
--spot-price ${EC2_MAX_PRICE} \
--copy-aws-credentials \
--hadoop-major-version=yarn --spark-version 1.6.0 \
--no-ganglia \
--user-data ${CAFFE_ON_SPARK}/scripts/ec2-cloud-config.txt \
launch CaffeOnSparkDemo
You should see the following line, which contains the host name of your Spark master.
Spark standalone cluster started at http://ec2-52-49-81-151.eu-west-1.compute.amazonaws.com:8080
Done!
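The launch command above relies on several environment variables, some of which (`SPARK_HOME`, `CAFFE_ON_SPARK`, `EC2_KEY`, `EC2_PEM_FILE`) are never set on this page. Before launching, it may help to confirm that they are all defined. The following is a minimal sketch: `missing_vars` is a hypothetical helper, not part of spark-ec2 or CaffeOnSpark.

```shell
# Hypothetical helper (not part of spark-ec2 or CaffeOnSpark): print the name
# of every listed variable that is unset or empty in the current shell.
missing_vars() {
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Variables referenced by the launch command on this page.
missing=$(missing_vars SPARK_HOME CAFFE_ON_SPARK EC2_KEY EC2_PEM_FILE \
  AMI_IMAGE EC2_REGION EC2_ZONE EC2_INSTANCE_TYPE EC2_MAX_PRICE \
  SPARK_WORKER_INSTANCES)

if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Set these variables before launching:" $missing >&2
fi
```

The check only reports missing names and does not stop the shell, so it can be pasted into an interactive session.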
- SSH into the Spark master
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${EC2_PEM_FILE} root@<SPARK_MASTER_HOST>
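The `<SPARK_MASTER_HOST>` placeholder is the host name from the "Spark standalone cluster started at" line printed by the launch step. As a sketch, assuming that line has the same shape as the sample output shown earlier, the host name could be extracted like this (the `launch_line` variable is introduced here for illustration only):

```shell
# Sample launch output line, copied from the example output on this page.
launch_line='Spark standalone cluster started at http://ec2-52-49-81-151.eu-west-1.compute.amazonaws.com:8080'

# Keep only the host name between "http://" and the ":8080" port suffix.
SPARK_MASTER_HOST=$(printf '%s\n' "$launch_line" |
  sed -n 's#.*started at http://\([^:/]*\):8080.*#\1#p')

echo "${SPARK_MASTER_HOST}"
# prints ec2-52-49-81-151.eu-west-1.compute.amazonaws.com
```

With the variable set, the ssh command can use `root@${SPARK_MASTER_HOST}` in place of the placeholder.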
- Train a DNN model and test it using the MNIST dataset located at ${CAFFE_ON_SPARK}/data
g2.2xlarge | g2.8xlarge |
---|---|
`export CORES_PER_WORKER=8` | `export CORES_PER_WORKER=32` |
`export DEVICES=1` | `export DEVICES=4` |
export SPARK_WORKER_INSTANCES=2
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
source ~/.bashrc
export PATH=${PATH}:${HADOOP_HOME}/bin:${SPARK_HOME}/bin
pushd ${CAFFE_ON_SPARK}/data
hadoop fs -rm -r -f /mnist_lenet.model
hadoop fs -rm -r -f /lenet_features_result
spark-submit --master spark://$(hostname):7077 \
--files lenet_memory_train_test.prototxt,lenet_memory_solver.prototxt \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_memory_solver.prototxt \
-clusterSize ${SPARK_WORKER_INSTANCES} \
-devices ${DEVICES} \
-connection ethernet \
-model /mnist_lenet.model \
-output /lenet_features_result
hadoop fs -ls /mnist_lenet*
hadoop fs -cat /lenet_features_result/*
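The per-worker settings used in the training step follow from the instance specifications given in the launch step (g2.2xlarge: 1 GPU, 8 vCPUs; g2.8xlarge: 4 GPUs, 32 vCPUs), and `TOTAL_CORES` is the per-worker core count multiplied by the number of workers. The sketch below works through that arithmetic; `worker_settings` is a hypothetical helper, not something CaffeOnSpark provides.

```shell
# Hypothetical helper: set CORES_PER_WORKER and DEVICES for an instance type,
# using the vCPU and GPU counts quoted in the launch step.
worker_settings() {
  case "$1" in
    g2.2xlarge) CORES_PER_WORKER=8;  DEVICES=1 ;;
    g2.8xlarge) CORES_PER_WORKER=32; DEVICES=4 ;;
    *) echo "unsupported instance type: $1" >&2; return 1 ;;
  esac
}

SPARK_WORKER_INSTANCES=2

worker_settings g2.2xlarge
TOTAL_CORES=$((CORES_PER_WORKER * SPARK_WORKER_INSTANCES))
echo "g2.2xlarge: cores=${TOTAL_CORES} devices=${DEVICES}"
# prints g2.2xlarge: cores=16 devices=1

worker_settings g2.8xlarge
TOTAL_CORES=$((CORES_PER_WORKER * SPARK_WORKER_INSTANCES))
echo "g2.8xlarge: cores=${TOTAL_CORES} devices=${DEVICES}"
# prints g2.8xlarge: cores=64 devices=4
```

`TOTAL_CORES` is what the training command passes to `spark.cores.max`, and `DEVICES` is the value given to `-devices`.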
- Destroy the EC2 cluster
${SPARK_HOME}/ec2/spark-ec2 --key-pair=${EC2_KEY} --identity-file=${EC2_PEM_FILE} \
--region=${EC2_REGION} --zone=${EC2_ZONE} \
destroy CaffeOnSparkDemo