This example demonstrates how you can debug a TensorFlow Training script running inside a prebuilt SageMaker Docker image for TensorFlow.
Debugging a running Docker container in PyCharm is possible using Remote debugging with the Python remote debug server configuration. For more details, please refer to the PyCharm documentation.
Steps required to debug a TensorFlow Training script running inside a prebuilt SageMaker Docker image for TensorFlow
Follow the steps described below.
From the main menu, choose Run | Edit Configurations.... The Run/debug configurations dialog opens. Click Add configuration on the toolbar and, from the list of available configurations, select Python Debug Server.
Enter a name for this run/debug configuration, for example tensorflow_script_mode_debug_local_training. Specify the port number (here 4200) and the IDE host name (here host.docker.internal), which resolves to the internal IP address used by the host. The remote TensorFlow training script will use these parameters to connect to the IDE.
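If the container later fails to attach, it can help to confirm that the host name and port are reachable at all. The following is a minimal sketch using only the Python standard library; the helper name debug_server_reachable is hypothetical, and the defaults simply mirror the values chosen above.

```python
import socket


def debug_server_reachable(host="host.docker.internal", port=4200, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper, not part of PyCharm or SageMaker. Name-resolution
    failures and refused connections both surface as OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling debug_server_reachable() from inside the container returns False when the name does not resolve or nothing is listening on the port. Note that a successful probe opens and immediately closes a real connection, so the IDE may briefly register it.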
The mnist_tf2.py script imports the pydevd-pycharm dependency, so the same folder contains a requirements.txt that lists it.
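A quick pre-flight check can catch a missing dependency before the container starts. This is only a sketch: the function names are hypothetical, and the parsing handles plain requirement lines (name plus optional version specifier), not every form that pip accepts.

```python
import re
from pathlib import Path


def requirement_names(requirements_path):
    """Return the normalized package names listed in a requirements file."""
    names = set()
    for raw in Path(requirements_path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is everything before a version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    return names


def has_pydevd_pycharm(source_dir):
    """True if source_dir/requirements.txt exists and lists pydevd-pycharm."""
    req = Path(source_dir) / "requirements.txt"
    return req.is_file() and "pydevd-pycharm" in requirement_names(req)
```

Running has_pydevd_pycharm on the folder that holds mnist_tf2.py should return True before you launch training.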
Because the TensorFlow Estimator is given the source_dir parameter, the SageMaker SDK detects the requirements.txt in that directory and installs pydevd-pycharm in the container before calling the mnist_tf2.py script.
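To make the role of source_dir concrete, here is a hedged sketch of how such an Estimator might be configured for local mode. The directory name "code", the framework and Python versions, and both function names are assumptions for illustration, not values taken from this example; the sagemaker import is deferred so the configuration can be inspected without the SDK installed.

```python
def local_estimator_kwargs(role, source_dir="code"):
    """Build keyword arguments for a local-mode TensorFlow Estimator (illustrative)."""
    return {
        "entry_point": "mnist_tf2.py",   # script named in this walkthrough
        "source_dir": source_dir,         # assumed folder holding the script and requirements.txt
        "role": role,                     # IAM role ARN supplied by the caller
        "instance_count": 1,
        "instance_type": "local",         # run the container on this machine
        "framework_version": "2.8",       # assumption: pick the version you actually use
        "py_version": "py39",             # assumption: must match the framework version
    }


def make_local_estimator(role, source_dir="code"):
    """Create the Estimator; requires the sagemaker package and Docker."""
    from sagemaker.tensorflow import TensorFlow  # deferred import

    return TensorFlow(**local_estimator_kwargs(role, source_dir))
```

Calling fit on the returned Estimator with your training inputs then starts the container, which installs the listed requirements before running the entry point.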
Add the following lines to the mnist_tf2.py script:
import pydevd_pycharm
pydevd_pycharm.settrace('host.docker.internal', port=4200, stdoutToServer=True, stderrToServer=True)
This will enable the TensorFlow script running within the Docker container to connect to the Debug Server.
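Since an unconditional settrace call makes the script depend on a running Debug Server, one option is to guard it. The sketch below assumes hypothetical environment variables REMOTE_DEBUG, DEBUG_HOST, and DEBUG_PORT; they are not defined by SageMaker or PyCharm, and how they reach the container is left to you.

```python
import os


def attach_debugger(environ=os.environ):
    """Connect to the PyCharm Debug Server only when REMOTE_DEBUG=1 (hypothetical flag).

    Returns True if settrace was called, False if debugging was not requested.
    """
    if environ.get("REMOTE_DEBUG") != "1":
        return False

    # Imported lazily so the script still runs where pydevd-pycharm is absent.
    import pydevd_pycharm

    host = environ.get("DEBUG_HOST", "host.docker.internal")
    port = int(environ.get("DEBUG_PORT", "4200"))
    pydevd_pycharm.settrace(host, port=port, stdoutToServer=True, stderrToServer=True)
    return True
```

With this guard, the same mnist_tf2.py can run unchanged in a normal training job, where the flag is simply not set.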
Choose the created run/debug configuration, and click the Debug button to start the debug server.
Next, you'll see a Waiting for process connection... message.
Browse to the tensorflow_script_mode_debug_local_training.py file and run it.
In the Run output, you'll see lines indicating that pydevd-pycharm is being installed.
You will then see a message indicating that the code in the container can't be mapped to the code in your project.
Click Auto-detect, choose the mnist_tf2.py file in your project path, and click OK.
Now you should see the mnist_tf2.py code.
Now you are almost ready to debug.
Set a breakpoint on the following line:
return parser.parse_known_args()
In this way, you'll be able to debug the arguments passed by SageMaker SDK.
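For context, here is a minimal sketch of the kind of argument parsing that ends in parse_known_args. The argument names and defaults are assumptions and may differ from those in mnist_tf2.py; SageMaker passes hyperparameters to the script as command-line arguments of the form --name value.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse known training arguments and collect unrecognized ones separately."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)  # assumed hyperparameter
    parser.add_argument(
        "--model_dir",
        type=str,
        # SM_MODEL_DIR is set inside SageMaker training containers.
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Returns (namespace, leftover_args) instead of failing on unknown flags.
    return parser.parse_known_args(argv)


args, unknown = parse_args(["--epochs", "3", "--learning_rate", "0.01"])
print(args.epochs)  # 3
print(unknown)      # ['--learning_rate', '0.01']
```

Stopping at the breakpoint lets you inspect both the parsed namespace and the list of arguments the parser did not recognize.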
To resume the program and stop at the breakpoint you set earlier, go to the Debug tool window and choose Resume Program.
Your code is actually executed on the Docker container, but debugged on the local machine.
Here you can see a breakpoint set just before the model.fit call runs.