
MonitoringExecutionSummaries list empty after endpoint invocation #1896


Closed
tiru1930 opened this issue Sep 16, 2020 · 10 comments

@tiru1930

I deployed an XGBoost framework service and was able to hit the endpoint with a SageMaker runtime session. I then updated the XGBoost endpoint to capture the I/O data, created a baseline, and scheduled a monitoring job. After that I invoked the service, but the monitoring execution summaries list is empty, and I have not seen any capture data in the S3 bucket either.

To reproduce
1. Create an XGBoost framework deployment.
2. Update the endpoint with data capture enabled.
3. Create a baseline.
4. Schedule a monitoring job.
5. Invoke the endpoint.

Expected behavior
I should get monitoring results

Screenshots or logs

{'MonitoringScheduleSummaries': [{'CreationTime': datetime.datetime(2020, 9, 16, 11, 41, 56, 284000, tzinfo=tzlocal()),
                                  'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
                                  'LastModifiedTime': datetime.datetime(2020, 9, 16, 11, 41, 59, 295000, tzinfo=tzlocal()),
                                  'MonitoringScheduleArn': 'arn:aws:sagemaker:us-west-2:990360540682:monitoring-schedule/xgboost-dest-prediction-model-monitor-2020-09-16-11-41-56',
                                  'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-11-41-56',
                                  'MonitoringScheduleStatus': 'Scheduled'}],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '445',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 16 Sep 2020 12:18:44 GMT',
                                      'x-amzn-requestid': 'a2f5fcb4-a12c-481d-88a4-ff9dbeb5bbd2'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a2f5fcb4-a12c-481d-88a4-ff9dbeb5bbd2',
                      'RetryAttempts': 0}}

Results

{'MonitoringExecutionSummaries': [],
 'ResponseMetadata': {'RequestId': '43e86dab-c257-49ce-9f1b-09d2f6ea4463',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '43e86dab-c257-49ce-9f1b-09d2f6ea4463',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '35',
   'date': 'Wed, 16 Sep 2020 12:19:23 GMT'},
  'RetryAttempts': 0}}
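
A quick way to check whether any capture files actually landed in S3 is to list the capture prefix directly; a minimal sketch, assuming bucket and data_capture_prefix as defined in the deployment code further down:

import boto3

# List whatever the endpoint has captured so far under the configured prefix.
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket=bucket, Prefix=data_capture_prefix)
for obj in resp.get('Contents', []):
    print(obj['Key'])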

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.6
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): XGBoost framework
  • Framework version: 0.90-2
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N
@tiru1930
Author

Now I can see CloudWatch logs at the hourly interval, but I got the error below:

2020-09-17 08:12:36 ERROR Main:80 - Error: Encoding mismatch: Encoding is CSV for endpointInput, but Encoding is JSON for endpointOutput. We currently only support the same type of input and output encoding at the moment.
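
One possible cause (an assumption; it is not confirmed in this thread) is that the invoke calls sent ContentType='text/csv' but no Accept header, so the response was captured as JSON. Explicitly requesting a CSV response keeps both sides of the capture in one encoding; a minimal sketch:

import boto3

runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,  # as defined in the deploy step below
    ContentType='text/csv',
    Accept='text/csv',           # request a CSV response so input/output encodings match
    Body=sample,
)
print(response['Body'].read())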

@icywang86rui
Contributor

@tiru1930 could you provide the code you used to produce this error here?

@tiru1930
Author

@icywang86rui Here is the code:

1. Model Training 

import time

import boto3
import sagemaker
from sagemaker import image_uris
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

sess = boto3.Session()  # `sess` was not defined in the snippet; assumed to be a boto3 session
region = sess.region_name
container = image_uris.retrieve('xgboost', region, version='latest')
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
# Note: this points at the train prefix; it was presumably meant to be the validation set
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')

sm_sess = sagemaker.session.Session()

create_date = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
destination_prediction_experiment = Experiment.create(experiment_name="destination-lat-lon-prediction-xgboost-{}".format(create_date), 
                                              description="Using xgboost to predict end location lat and lon", 
                                              sagemaker_boto_client=boto3.client('sagemaker'))

from sagemaker.xgboost import XGBoost

hyperparams = {"max_depth":5,
               "subsample":0.8,
               "num_round":600,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'multi:softmax',
               "num_class":len(le.classes_),
#                "smdebug_path":f"s3://{bucket}/{prefix}/debug",
#                "smdebug_collections":"metrics,feature_importance"
              }

entry_point_script = "xgboost_dest_prediction.py"

trial = Trial.create(trial_name="framework-mode-trial-{}".format(time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())), 
                     experiment_name=destination_prediction_experiment.experiment_name,
                     sagemaker_boto_client=boto3.client('sagemaker'))

framework_xgb = XGBoost(
                      entry_point=entry_point_script,
                      role=sagemaker.get_execution_role(),
                      framework_version="0.90-2",
                      py_version="py3",
                      hyperparameters=hyperparams,
                      instance_count=1, 
                      instance_type='ml.m4.xlarge',
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      base_job_name="demo-xgboost-destination-prediction",
                      sagemaker_session=sm_sess,
                      rules=debug_rules,  # Debugger rules defined elsewhere in the notebook
                      use_spot_instances = True,
                      max_run = 3600,
                      max_wait = 3600,
                      input_mode = 'File',
                    )

framework_xgb.fit({'train': s3_input_train,
                   'validation': s3_input_validation}, 
                  experiment_config={
                      "ExperimentName": destination_prediction_experiment.experiment_name, 
                      "TrialName": trial.trial_name,
                      "TrialComponentDisplayName": "Training",
                  })


2. Deploy


from sagemaker.model_monitor import DataCaptureConfig  # needed for the deploy call below

data_capture_prefix = '{}/datacapture'.format(prefix)
endpoint_name = "demo-xgboost-destination-prediction-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print("EndpointName = {}".format(endpoint_name))

xgb_predictor = framework_xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=100,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )
3. Invoke 

xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()
xgb_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

# _df: the test dataframe, prepared elsewhere in the notebook
_df[['day_num', 'x_start', 'y_start', 'z_start']].to_csv("test.csv", index=False, header=False)
sm = boto3.Session().client(service_name="runtime.sagemaker")
test_sample = [line.strip("\n") for line in open("test.csv")]
test_sample = test_sample[:10]
print("xgb_predictor.endpoint_name, ",xgb_predictor.endpoint_name,)

for sample_ in test_sample:
    sample = bytes(sample_,'utf-8')
    
    response = sm.invoke_endpoint(
        EndpointName = xgb_predictor.endpoint_name,
        ContentType = "text/csv",
        Body=sample

    )
    print(response["Body"].read())
    response_ = xgb_predictor.predict(data=sample_)
#     time.sleep(0.5)
#     print(type(response_))
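
As a sanity check, capture files are JSON Lines, and each record carries the encoding SageMaker observed for the request and the response; a sketch, assuming a capture file has been copied locally (the file name here is hypothetical):

import json

with open('captured-record.jsonl') as f:  # hypothetical local copy of a capture file from S3
    for line in f:
        record = json.loads(line)
        print(record['captureData']['endpointInput']['encoding'],
              record['captureData']['endpointOutput']['encoding'])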


4. Update data capture config

endpoint_name = xgb_predictor.endpoint_name
s3_capture_upload_path = 's3://{}/{}'.format(bucket, data_capture_prefix)
print("endpoint_name ",endpoint_name)
print("s3_capture_upload_path ",s3_capture_upload_path)

from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.predictor import Predictor
from sagemaker import session
import boto3
sm_session = session.Session(boto3.Session())

# Change parameters as you would like - adjust sampling percentage,
#  choose to capture request or response or both.
#  Learn more from our documentation
data_capture_config = DataCaptureConfig(
                        enable_capture = True,
                        sampling_percentage=50,
                        destination_s3_uri=s3_capture_upload_path,
                        kms_key_id=None,
#                         capture_options=["REQUEST", "RESPONSE"],
                        csv_content_types=["text/csv"]
                      )

# Now it is time to apply the new configuration and wait for it to be applied
predictor = Predictor(endpoint_name)
predictor.serializer = sagemaker.serializers.CSVSerializer()
predictor.update_data_capture_config(data_capture_config=data_capture_config)
sm_session.wait_for_endpoint(endpoint_name)
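
To confirm the new capture config actually took effect, DescribeEndpoint reports it back; a minimal sketch:

import boto3

sm_client = boto3.client('sagemaker')
desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
print(desc['EndpointStatus'])
print(desc.get('DataCaptureConfig'))  # shows EnableCapture, sampling percentage, destination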


5. Create Baseline stats

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker import get_execution_role

role = get_execution_role()

my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# baseline_data_uri and baseline_results_uri: S3 URIs defined elsewhere in the notebook
my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri+'/train.csv',
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=baseline_results_uri,
    wait=True
)
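
Once the baseline job finishes, its outputs can be inspected before wiring them into a schedule; a sketch (key names follow the statistics.json / constraints.json layout):

stats = my_default_monitor.baseline_statistics().body_dict
constraints = my_default_monitor.suggested_constraints().body_dict
print(stats['dataset']['item_count'])  # number of rows the baseline job saw
print(len(constraints['features']))    # one constraint entry per feature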


6. Enable real-time monitoring

from sagemaker.model_monitor import CronExpressionGenerator
from time import gmtime, strftime

print("predictor.endpoint_name   ", predictor.endpoint_name)
print("xgb_predictor.endpoint_name   ", xgb_predictor.endpoint_name)

mon_schedule_name = f'xgboost-dest-prediction-model-monitor-{time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())}'
s3_report_path = f's3://{bucket}/{prefix}/model-monitor-output'
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=predictor.endpoint_name,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,

)
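
The monitor object can report the schedule state directly; a minimal sketch:

desc = my_default_monitor.describe_schedule()
print(desc['MonitoringScheduleStatus'])
# Present only after at least one execution has run:
print(desc.get('LastMonitoringExecutionSummary', {}).get('MonitoringExecutionStatus'))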


7. Test monitoring executions 

client = boto3.client('sagemaker')
client.list_monitoring_executions(MonitoringScheduleName='xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14', MaxResults=3)


Results

{'MonitoringExecutionSummaries': [{'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14',
   'ScheduledTime': datetime.datetime(2020, 9, 17, 15, 0, tzinfo=tzlocal()),
   'CreationTime': datetime.datetime(2020, 9, 17, 15, 7, 45, 454000, tzinfo=tzlocal()),
   'LastModifiedTime': datetime.datetime(2020, 9, 17, 15, 7, 48, 325000, tzinfo=tzlocal()),
   'MonitoringExecutionStatus': 'Failed',
   'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
   'FailureReason': 'No S3 objects found under S3 URL "s3://telenavsearch-sagemaker/xgboost-tirps/datacapture/demo-xgboost-destination-prediction-2020-09-16-11-17-58/AllTraffic/2020/09/17/14" given in input data source. Please ensure that the bucket exists in the selected region (us-west-2), that objects exist under that S3 prefix, and that the role "arn:aws:iam::990360540682:role/service-role/AmazonSageMaker-ExecutionRole-20200914T104925" has "s3:ListBucket" permissions on bucket "telenavsearch-sagemaker".'},
  {'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14',
   'ScheduledTime': datetime.datetime(2020, 9, 17, 14, 0, tzinfo=tzlocal()),
   'CreationTime': datetime.datetime(2020, 9, 17, 14, 7, 45, 760000, tzinfo=tzlocal()),
   'LastModifiedTime': datetime.datetime(2020, 9, 17, 14, 12, 29, 898000, tzinfo=tzlocal()),
   'MonitoringExecutionStatus': 'Failed',
   'ProcessingJobArn': 'arn:aws:sagemaker:us-west-2:990360540682:processing-job/model-monitoring-202009171400-25e5856d1af82a1b5bcecf8c',
   'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
   'FailureReason': 'AlgorithmError: See job logs for more information'}]}
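
For the run that failed with AlgorithmError, the underlying processing job carries the detailed failure reason; a sketch, assuming the list_monitoring_executions response above was stored in a variable named executions (index 1 is the entry with a ProcessingJobArn):

job_arn = executions['MonitoringExecutionSummaries'][1]['ProcessingJobArn']
job_name = job_arn.split('/')[-1]
job = client.describe_processing_job(ProcessingJobName=job_name)
print(job['FailureReason'])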

@icywang86rui
Contributor

I noticed that when the endpoint was deployed, csv_content_types was not specified in your DataCaptureConfig. Have you tried adding it?

xgb_predictor = framework_xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=100,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )
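
For reference, a sketch of the same deploy call with the flag added (the exact content types depend on what the client actually sends):

from sagemaker.model_monitor import DataCaptureConfig

xgb_predictor = framework_xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name=endpoint_name,
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix),
        csv_content_types=['text/csv'],  # declare the captured payloads as CSV
    ),
)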

@tiru1930
Author

tiru1930 commented Sep 19, 2020 via email

@icywang86rui
Contributor

I see that you have configured DataCaptureConfig but csv_content_types is missing here.

@lsabreu96

@tiru1930, did you make any progress here?
I'm having the same issue, but my error says the encoding is JSON for endpointInput and CSV for endpointOutput. I've tried setting csv_content_types to both text/csv and application/json, but neither worked.
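
For completeness, DataCaptureConfig accepts both csv_content_types and json_content_types, so a capture config can declare each side separately; a sketch (note the monitor itself still requires input and output to share one encoding, per the error above):

from sagemaker.model_monitor import DataCaptureConfig

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,  # as defined earlier in the thread
    csv_content_types=['text/csv'],
    json_content_types=['application/json'],
)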

@tiru1930
Author

tiru1930 commented Jan 21, 2021 via email

@lsabreu96

lsabreu96 commented Jan 21, 2021

@tiru1930, sorry for bothering you again, but I couldn't find it anywhere in your code: did you change somehow how XGBoost returns your data?
Reading the docs, I found that it outputs predictions as CSV, but your error seems to state the output is JSON-encoded.

@lsabreu96

lsabreu96 commented Jan 21, 2021

@icywang86rui, could you help us out here, please?
I've deployed my model again, this time setting the serializer and deserializer to be the same, expecting the error to go away, but that didn't work.

aws locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.

You can continue the conversation there.
