A SageMaker XGBoost container used for the inference endpoint.
aws ecr get-login-password --profile dev --region us-east-1 | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com
docker build -t xgboost-container-base:0.90-2-cpu-py3 -f docker/0.90-2/base/Dockerfile.cpu .
python setup.py bdist_wheel --universal
docker build -q -t multi-model-xgboost -f docker/0.90-2/final/Dockerfile.cpu .
docker tag multi-model-xgboost "{ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/multi-model-xgboost:latest"
docker push {ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/multi-model-xgboost:latest
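The build-and-push commands above can also be assembled from a single account ID so that the registry host is not typed by hand six times. The following is a minimal sketch, not part of this repository: the helper names `registry`, `build_commands`, and `run_all` are hypothetical, and the `dev` profile and `us-east-1` region are simply the values used in the commands above.

```python
import subprocess
from typing import List

REGION = "us-east-1"           # region used in the commands above
IMAGE = "multi-model-xgboost"  # image/repository name used in the commands above


def registry(account_id: str, region: str = REGION) -> str:
    """Return the ECR registry host for the given account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def build_commands(account_id: str, profile: str = "dev") -> List[str]:
    """Return the build-and-push commands from this README, in order."""
    reg = registry(account_id)
    return [
        f"aws ecr get-login-password --profile {profile} --region {REGION}"
        f" | docker login --username AWS --password-stdin {reg}",
        "docker build -t xgboost-container-base:0.90-2-cpu-py3"
        " -f docker/0.90-2/base/Dockerfile.cpu .",
        "python setup.py bdist_wheel --universal",
        f"docker build -q -t {IMAGE} -f docker/0.90-2/final/Dockerfile.cpu .",
        f"docker tag {IMAGE} {reg}/{IMAGE}:latest",
        f"docker push {reg}/{IMAGE}:latest",
    ]


def run_all(account_id: str, profile: str = "dev") -> None:
    """Run each command through the shell, stopping at the first failure."""
    for cmd in build_commands(account_id, profile):
        print(f"+ {cmd}")
        subprocess.run(cmd, shell=True, check=True)
```

Calling `run_all("123456789012")` from the repository root (with a real account ID) would run the same six steps in order; `build_commands` alone only returns the strings, which is handy for a dry run.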
Currently, a SageMaker endpoint does not update automatically when a new image is pushed to ECR. As a short-term workaround, we create a new endpoint configuration that points to the latest pushed ECR image, then point the endpoint at that configuration to trigger the update.
- Go to AWS SageMaker Console -> Endpoint configurations
- 'Clone' the latest endpoint configuration whose name matches "multi-model-xgboost-config-*"
- Set the 'Endpoint configuration name' to "multi-model-xgboost-config-copy-{MM}-{DD}" (most of the name is pre-filled)
- Under 'Production variants', check that:
  - the model points to the correct ECR image
  - the other settings are correct
- 'Create endpoint configuration'
- Go to AWS SageMaker Console -> Endpoints
- Click on multi-model-xgboost, then 'Update endpoint' -> 'Change Endpoint configuration' -> 'Use an existing endpoint configuration' -> choose the configuration created in the 'Create endpoint configuration' step -> 'Select endpoint configuration' -> 'Update endpoint'
- Wait until the Status = InService
- Go to AWS CloudWatch Console -> Insights -> 'Select log group(s)' -> /aws/sagemaker/Endpoints/multi-model-xgboost
- Run the following query over the last 5 minutes:
fields @timestamp, @message
| filter
@message like "CPU" or
@message like "GPU" or
@message like "workers" or
@message like {WHATEVER METRICS OF INTEREST}
| sort @timestamp desc
| limit 20
- Confirm that the expected values appear. Example:
  2021-02-05T10:06:17.014-08:00  Number of GPUs: 0
  2021-02-05T10:06:17.014-08:00  Number of CPUs: 2
  2021-02-05T10:06:17.014-08:00  Default workers per model: 16
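The console steps for cloning the endpoint configuration and updating the endpoint can also be scripted. The sketch below uses boto3 and rests on a few assumptions: it clones the configuration the endpoint is currently using (rather than picking "the latest" by name), it copies only the production variants (not tags, data capture, or encryption settings), and the helper names `new_config_name` and `refresh_endpoint` are hypothetical. Treat it as a starting point, not a replacement for checking the configuration in the console.

```python
import datetime
from typing import Optional

ENDPOINT_NAME = "multi-model-xgboost"
CONFIG_PREFIX = "multi-model-xgboost-config-copy"


def new_config_name(today: Optional[datetime.date] = None) -> str:
    """Build a name of the form multi-model-xgboost-config-copy-{MM}-{DD}."""
    today = today or datetime.date.today()
    return f"{CONFIG_PREFIX}-{today:%m-%d}"


def refresh_endpoint(endpoint_name: str = ENDPOINT_NAME) -> str:
    """Clone the endpoint's current configuration and switch the endpoint to the clone."""
    import boto3  # imported lazily so new_config_name works without boto3 installed

    sm = boto3.client("sagemaker")

    # Assumption: the configuration currently attached to the endpoint is the one to clone.
    current = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm.describe_endpoint_config(EndpointConfigName=current)

    # Fails if a configuration with this name already exists (e.g. a second run on the same day).
    name = new_config_name()
    sm.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=config["ProductionVariants"],
    )

    # Pointing the endpoint at the new configuration triggers the update.
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=name)

    # Block until the endpoint status is InService again.
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return name
```

`refresh_endpoint()` returns the name of the configuration it created, which makes it easy to find in the console afterwards.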
Test script: xgen/x2mind/test/test_inference.py
- Set customer_id, model_id, and user_id to a valid Customer ID, Model ID, and User ID respectively.
- Run the script with python3 test_inference.py
- Go to AWS CloudWatch Console -> Insights -> 'Select log group(s)' -> /aws/sagemaker/Endpoints/multi-model-xgboost
- Run the following query:
fields @timestamp, @message
| filter
@message like "Error" or
@message like {WHATEVER METRICS OF INTEREST}
| sort @timestamp desc
| limit 20
- Confirm that the results are as expected.
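Both CloudWatch Insights queries in this document can be run from a script instead of the console. The following is a minimal sketch using boto3's CloudWatch Logs client; the helper names `build_query` and `run_query` are hypothetical, and the search terms passed in stand for whatever metrics are of interest.

```python
import time
from typing import Dict, List

LOG_GROUP = "/aws/sagemaker/Endpoints/multi-model-xgboost"


def build_query(terms: List[str], limit: int = 20) -> str:
    """Build an Insights query that matches any of the given substrings."""
    conditions = " or ".join(f'@message like "{t}"' for t in terms)
    return (
        "fields @timestamp, @message"
        f" | filter {conditions}"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def run_query(terms: List[str], minutes: int = 5) -> List[Dict[str, str]]:
    """Run the query over the last `minutes` minutes and return the rows as dicts."""
    import boto3  # imported lazily so build_query works without boto3 installed

    logs = boto3.client("logs")
    now = int(time.time())
    query_id = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=now - minutes * 60,  # epoch seconds
        endTime=now,
        queryString=build_query(terms),
    )["queryId"]

    # Poll until the query leaves the Scheduled/Running states.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(1)

    # Each result row is a list of {"field": ..., "value": ...} entries.
    return [{col["field"]: col["value"] for col in row} for row in response["results"]]
```

For example, `run_query(["CPU", "GPU", "workers"])` corresponds to the post-deployment check, and `run_query(["Error"])` to the check after running the test script.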