Profiling Spark Using YourKit
This page has been moved to the Apache Spark confluence wiki: https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
Here are instructions on profiling Spark applications using YourKit Java Profiler.
- After logging into the master node, download the YourKit Java Profiler for Linux from the YourKit downloads page. At the time of writing, the latest version is yjp-12.0.5-linux.tar.bz2; you will need to substitute different paths if using a newer version. This file is fairly large (~100 MB) and the YourKit downloads site is somewhat slow, so you may want to mirror the file or include it on a custom AMI.
- Untar this file somewhere (in /root in our case):
    tar xvjf yjp-12.0.5-linux.tar.bz2
- Copy the expanded YourKit files to each node using copy-dir:
    ~/spark-ec2/copy-dir /root/yjp-12.0.5
- Configure the Spark JVMs to use the YourKit profiling agent by editing ~/spark/conf/spark-env.sh and adding the lines:
    SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
    export SPARK_DAEMON_JAVA_OPTS
    SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
    export SPARK_JAVA_OPTS
- Copy the updated configuration to each node:
    ~/spark-ec2/copy-dir ~/spark/conf/spark-env.sh
- Restart your Spark cluster:
    ~/spark/bin/stop-all.sh
    ~/spark/bin/start-all.sh
- By default, the YourKit profiler agents use ports 10001-10010. To connect the YourKit desktop application to the remote profiler agents, you'll have to open these ports in the cluster's EC2 security groups.
  To do this, sign in to the AWS Management Console. Go to the EC2 section and select Security Groups from the Network & Security section on the left side of the page. Find the security groups corresponding to your cluster; if you launched a cluster named test_cluster, then you will want to modify the settings for the test_cluster-slaves and test_cluster-master security groups. For each group, select it from the list, click the Inbound tab, and create a new Custom TCP Rule opening the port range 10001-10010. Finally, click Apply Rule Changes. Make sure to do this for both security groups.
  Note: by default, spark-ec2 re-uses security groups: if you stop this cluster and launch another cluster with the same name, your security group settings will be re-used.
- Launch the YourKit profiler on your desktop.
- Select "Connect to remote application..." from the welcome screen and enter the address of your Spark master or worker machine, e.g. ec2-*-*-*-*.compute-1.amazonaws.com
- YourKit should now be connected to the remote profiling agent. It may take a few moments for profiling information to appear.
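
If the desktop application cannot connect, the usual suspect is the security-group step. The sketch below shows one way to handle that step from a terminal instead of the console. It rests on assumptions that are not part of the original instructions: that the AWS CLI is installed and configured, that the groups can be addressed by name (in a non-default VPC you would likely need --group-id instead), and that your netcat build supports the -z and -w flags. The function names yjp_open_ports_cmds and yjp_check_ports, and the example CIDR, are hypothetical.

```shell
# Print (do not run) the AWS CLI calls that open the agent ports 10001-10010
# for both security groups of a spark-ec2 cluster.
# $1 = cluster name (e.g. test_cluster)
# $2 = CIDR block allowed to connect (e.g. your desktop's address with /32)
# The body runs in a subshell, ( ... ), so its variables do not leak out.
yjp_open_ports_cmds() (
  cluster="$1"
  cidr="$2"
  for role in master slaves; do
    printf 'aws ec2 authorize-security-group-ingress --group-name %s-%s --protocol tcp --port 10001-10010 --cidr %s\n' \
      "$cluster" "$role" "$cidr"
  done
)

# Report, port by port, whether a host accepts TCP connections on 10001-10010.
# Assumes a netcat that supports -z (connect without sending data)
# and -w (timeout in seconds).
yjp_check_ports() (
  host="$1"
  port=10001
  while [ "$port" -le 10010 ]; do
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "$port open"
    else
      echo "$port closed"
    fi
    port=$((port + 1))
  done
)
```

Running yjp_open_ports_cmds test_cluster 203.0.113.7/32 prints two commands that you can review before executing them. Afterwards, yjp_check_ports with the master's public hostname shows which agent ports are reachable from your desktop. Only ports with a running agent behind them will report as open.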
Please see the YourKit documentation for the full list of profiler agent startup options.
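
If you expect to change the install path or the startup options, generating the spark-env.sh lines can be easier than editing them by hand. The following is a minimal sketch rather than part of the official setup. The helper name yjp_env_lines is hypothetical, and it assumes the agent library sits at bin/linux-x86-64/libyjpagent.so under the install directory, as it does for the 12.0.5 release used above.

```shell
# Hypothetical helper: print the spark-env.sh lines that attach the YourKit agent.
# $1 = YourKit install directory (e.g. /root/yjp-12.0.5)
# $2 = agent startup options (defaults to "sampling", as in the steps above)
# The body runs in a subshell, ( ... ), so its variables do not leak out.
yjp_env_lines() (
  agent_lib="$1/bin/linux-x86-64/libyjpagent.so"
  agent_opts="${2:-sampling}"
  for var in SPARK_DAEMON_JAVA_OPTS SPARK_JAVA_OPTS; do
    # Emit the append line and the matching export for each variable.
    printf '%s+=" -agentpath:%s=%s"\n' "$var" "$agent_lib" "$agent_opts"
    printf 'export %s\n' "$var"
  done
)
```

For example, yjp_env_lines /root/yjp-12.0.5 >> ~/spark/conf/spark-env.sh appends the same four lines shown in the configuration step. A second argument replaces sampling with another option string taken from the YourKit documentation.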