
MonitoringExecutionSummaries list empty after endpoint invocation #1896


Closed
tiru1930 opened this issue Sep 16, 2020 · 10 comments

@tiru1930

I deployed an XGBoost framework service and was able to hit the endpoint with a SageMaker runtime session. I then updated the XGBoost endpoint to capture the I/O data, created a baseline, and scheduled a monitoring job. After that I invoked the service, but the monitoring execution summaries list is empty, and I have not seen any capture data in the S3 bucket either.

To reproduce
1. Create an XGBoost framework deployment.
2. Update the endpoint with data capture enabled.
3. Create a baseline.
4. Schedule a monitoring job.
5. Invoke the endpoint.

Expected behavior
I should get monitoring results

Screenshots or logs

{'MonitoringScheduleSummaries': [{'CreationTime': datetime.datetime(2020, 9, 16, 11, 41, 56, 284000, tzinfo=tzlocal()),
                                  'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
                                  'LastModifiedTime': datetime.datetime(2020, 9, 16, 11, 41, 59, 295000, tzinfo=tzlocal()),
                                  'MonitoringScheduleArn': 'arn:aws:sagemaker:us-west-2:990360540682:monitoring-schedule/xgboost-dest-prediction-model-monitor-2020-09-16-11-41-56',
                                  'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-11-41-56',
                                  'MonitoringScheduleStatus': 'Scheduled'}],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '445',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 16 Sep 2020 12:18:44 GMT',
                                      'x-amzn-requestid': 'a2f5fcb4-a12c-481d-88a4-ff9dbeb5bbd2'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a2f5fcb4-a12c-481d-88a4-ff9dbeb5bbd2',
                      'RetryAttempts': 0}}

Results

{'MonitoringExecutionSummaries': [],
 'ResponseMetadata': {'RequestId': '43e86dab-c257-49ce-9f1b-09d2f6ea4463',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '43e86dab-c257-49ce-9f1b-09d2f6ea4463',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '35',
   'date': 'Wed, 16 Sep 2020 12:19:23 GMT'},
  'RetryAttempts': 0}}
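
A quick way to check whether any capture files actually landed in S3 is to list the capture prefix directly; a minimal sketch, assuming bucket and data_capture_prefix as defined in the deployment code further down:

import boto3

# List whatever the endpoint has captured so far under the configured prefix.
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket=bucket, Prefix=data_capture_prefix)
for obj in resp.get('Contents', []):
    print(obj['Key'])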

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.6
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): XGBoost framework
  • Framework version: 0.90-2
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N
@tiru1930
Author

Now I can see CloudWatch logs at the hourly interval, but I got the error below:

2020-09-17 08:12:36 ERROR Main:80 - Error: Encoding mismatch: Encoding is CSV for endpointInput, but Encoding is JSON for endpointOutput. We currently only support the same type of input and output encoding at the moment.
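
One possible cause (an assumption; it is not confirmed in this thread) is that the invoke calls sent ContentType='text/csv' but no Accept header, so the response was captured as JSON. Explicitly requesting a CSV response keeps both sides of the capture in one encoding; a minimal sketch:

import boto3

runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,  # as defined in the deploy step below
    ContentType='text/csv',
    Accept='text/csv',           # request a CSV response so input/output encodings match
    Body=sample,
)
print(response['Body'].read())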

@icywang86rui
Contributor

@tiru1930 could you provide the code you used to produce this error here?

@tiru1930
Author

@icywang86rui Here is the code:

1. Model Training 

import time

import boto3
import sagemaker
from sagemaker import image_uris
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

sess = boto3.Session()  # `sess` was not defined in the snippet; assumed to be a boto3 session
region = sess.region_name
container = image_uris.retrieve('xgboost', region, version='latest')
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
# Note: this points at the train prefix; it was presumably meant to be the validation set
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')

sm_sess = sagemaker.session.Session()

create_date = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
destination_prediction_experiment = Experiment.create(experiment_name="destination-lat-lon-prediction-xgboost-{}".format(create_date), 
                                              description="Using xgboost to predict end location lat and lon", 
                                              sagemaker_boto_client=boto3.client('sagemaker'))

from sagemaker.xgboost import XGBoost

hyperparams = {"max_depth":5,
               "subsample":0.8,
               "num_round":600,
               "eta":0.2,
               "gamma":4,
               "min_child_weight":6,
               "silent":0,
               "objective":'multi:softmax',
               "num_class":len(le.classes_),
#                "smdebug_path":f"s3://{bucket}/{prefix}/debug",
#                "smdebug_collections":"metrics,feature_importance"
              }

entry_point_script = "xgboost_dest_prediction.py"

trial = Trial.create(trial_name="framework-mode-trial-{}".format(time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())), 
                     experiment_name=destination_prediction_experiment.experiment_name,
                     sagemaker_boto_client=boto3.client('sagemaker'))

framework_xgb = XGBoost(
                      entry_point=entry_point_script,
                      role=sagemaker.get_execution_role(),
                      framework_version="0.90-2",
                      py_version="py3",
                      hyperparameters=hyperparams,
                      instance_count=1, 
                      instance_type='ml.m4.xlarge',
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      base_job_name="demo-xgboost-destination-prediction",
                      sagemaker_session=sm_sess,
                      rules=debug_rules,  # Debugger rules defined elsewhere in the notebook
                      use_spot_instances = True,
                      max_run = 3600,
                      max_wait = 3600,
                      input_mode = 'File',
                    )

framework_xgb.fit({'train': s3_input_train,
                   'validation': s3_input_validation}, 
                  experiment_config={
                      "ExperimentName": destination_prediction_experiment.experiment_name, 
                      "TrialName": trial.trial_name,
                      "TrialComponentDisplayName": "Training",
                  })


2. Deploy


from sagemaker.model_monitor import DataCaptureConfig  # needed for the deploy call below

data_capture_prefix = '{}/datacapture'.format(prefix)
endpoint_name = "demo-xgboost-destination-prediction-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print("EndpointName = {}".format(endpoint_name))

xgb_predictor = framework_xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=100,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )
3. Invoke 

xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()
xgb_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

# _df: the test dataframe, prepared elsewhere in the notebook
_df[['day_num', 'x_start', 'y_start', 'z_start']].to_csv("test.csv", index=False, header=False)
sm = boto3.Session().client(service_name="runtime.sagemaker")
test_sample = [line.strip("\n") for line in open("test.csv")]
test_sample = test_sample[:10]
print("xgb_predictor.endpoint_name, ",xgb_predictor.endpoint_name,)

for sample_ in test_sample:
    sample = bytes(sample_,'utf-8')
    
    response = sm.invoke_endpoint(
        EndpointName = xgb_predictor.endpoint_name,
        ContentType = "text/csv",
        Body=sample

    )
    print(response["Body"].read())
    response_ = xgb_predictor.predict(data=sample_)
#     time.sleep(0.5)
#     print(type(response_))
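
As a sanity check, capture files are JSON Lines, and each record carries the encoding SageMaker observed for the request and the response; a sketch, assuming a capture file has been copied locally (the file name here is hypothetical):

import json

with open('captured-record.jsonl') as f:  # hypothetical local copy of a capture file from S3
    for line in f:
        record = json.loads(line)
        print(record['captureData']['endpointInput']['encoding'],
              record['captureData']['endpointOutput']['encoding'])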


4. Update data capture config

endpoint_name = xgb_predictor.endpoint_name
s3_capture_upload_path = 's3://{}/{}'.format(bucket, data_capture_prefix)
print("endpoint_name ",endpoint_name)
print("s3_capture_upload_path ",s3_capture_upload_path)

from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.predictor import Predictor
from sagemaker import session
import boto3
sm_session = session.Session(boto3.Session())

# Change parameters as you would like - adjust sampling percentage,
#  choose to capture request or response or both.
#  Learn more from our documentation
data_capture_config = DataCaptureConfig(
                        enable_capture = True,
                        sampling_percentage=50,
                        destination_s3_uri=s3_capture_upload_path,
                        kms_key_id=None,
#                         capture_options=["REQUEST", "RESPONSE"],
                        csv_content_types=["text/csv"]
                      )

# Now it is time to apply the new configuration and wait for it to be applied
predictor = Predictor(endpoint_name)
predictor.serializer = sagemaker.serializers.CSVSerializer()
predictor.update_data_capture_config(data_capture_config=data_capture_config)
sm_session.wait_for_endpoint(endpoint_name)
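
To confirm the new capture config actually took effect, DescribeEndpoint reports it back; a minimal sketch:

import boto3

sm_client = boto3.client('sagemaker')
desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
print(desc['EndpointStatus'])
print(desc.get('DataCaptureConfig'))  # shows EnableCapture, sampling percentage, destination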


5. Create Baseline stats

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker import get_execution_role

role = get_execution_role()

my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# baseline_data_uri and baseline_results_uri: S3 URIs defined elsewhere in the notebook
my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri+'/train.csv',
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=baseline_results_uri,
    wait=True
)
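
Once the baseline job finishes, its outputs can be inspected before wiring them into a schedule; a sketch (key names follow the statistics.json / constraints.json layout):

stats = my_default_monitor.baseline_statistics().body_dict
constraints = my_default_monitor.suggested_constraints().body_dict
print(stats['dataset']['item_count'])  # number of rows the baseline job saw
print(len(constraints['features']))    # one constraint entry per feature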


6. Enable real-time monitoring

from sagemaker.model_monitor import CronExpressionGenerator
from time import gmtime, strftime

print("predictor.endpoint_name   ", predictor.endpoint_name)
print("xgb_predictor.endpoint_name   ", xgb_predictor.endpoint_name)

mon_schedule_name = f'xgboost-dest-prediction-model-monitor-{time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())}'
s3_report_path = f's3://{bucket}/{prefix}/model-monitor-output'
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=predictor.endpoint_name,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,

)
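
The monitor object can report the schedule state directly; a minimal sketch:

desc = my_default_monitor.describe_schedule()
print(desc['MonitoringScheduleStatus'])
# Present only after at least one execution has run:
print(desc.get('LastMonitoringExecutionSummary', {}).get('MonitoringExecutionStatus'))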


7. Test monitoring executions 

client = boto3.client('sagemaker')
client.list_monitoring_executions(MonitoringScheduleName='xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14', MaxResults=3)


Results

{'MonitoringExecutionSummaries': [{'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14',
   'ScheduledTime': datetime.datetime(2020, 9, 17, 15, 0, tzinfo=tzlocal()),
   'CreationTime': datetime.datetime(2020, 9, 17, 15, 7, 45, 454000, tzinfo=tzlocal()),
   'LastModifiedTime': datetime.datetime(2020, 9, 17, 15, 7, 48, 325000, tzinfo=tzlocal()),
   'MonitoringExecutionStatus': 'Failed',
   'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
   'FailureReason': 'No S3 objects found under S3 URL "s3://telenavsearch-sagemaker/xgboost-tirps/datacapture/demo-xgboost-destination-prediction-2020-09-16-11-17-58/AllTraffic/2020/09/17/14" given in input data source. Please ensure that the bucket exists in the selected region (us-west-2), that objects exist under that S3 prefix, and that the role "arn:aws:iam::990360540682:role/service-role/AmazonSageMaker-ExecutionRole-20200914T104925" has "s3:ListBucket" permissions on bucket "telenavsearch-sagemaker".'},
  {'MonitoringScheduleName': 'xgboost-dest-prediction-model-monitor-2020-09-16-14-18-14',
   'ScheduledTime': datetime.datetime(2020, 9, 17, 14, 0, tzinfo=tzlocal()),
   'CreationTime': datetime.datetime(2020, 9, 17, 14, 7, 45, 760000, tzinfo=tzlocal()),
   'LastModifiedTime': datetime.datetime(2020, 9, 17, 14, 12, 29, 898000, tzinfo=tzlocal()),
   'MonitoringExecutionStatus': 'Failed',
   'ProcessingJobArn': 'arn:aws:sagemaker:us-west-2:990360540682:processing-job/model-monitoring-202009171400-25e5856d1af82a1b5bcecf8c',
   'EndpointName': 'demo-xgboost-destination-prediction-2020-09-16-11-17-58',
   'FailureReason': 'AlgorithmError: See job logs for more information'}]}
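
For the run that failed with AlgorithmError, the underlying processing job carries the detailed failure reason; a sketch, assuming the list_monitoring_executions response above was stored in a variable named executions (index 1 is the entry with a ProcessingJobArn):

job_arn = executions['MonitoringExecutionSummaries'][1]['ProcessingJobArn']
job_name = job_arn.split('/')[-1]
job = client.describe_processing_job(ProcessingJobName=job_name)
print(job['FailureReason'])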

@icywang86rui
Contributor

I noticed that when the endpoint was deployed, csv_content_types was not specified in your DataCaptureConfig. Have you tried adding it?

xgb_predictor = framework_xgb.deploy(initial_instance_count=1, 
                           instance_type='ml.m4.xlarge',
                           endpoint_name=endpoint_name,
                           data_capture_config=DataCaptureConfig(enable_capture=True,
                                                                 sampling_percentage=100,
                                                                 destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix)
                                                                )
                           )
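
For reference, a sketch of the same deploy call with the flag added (the exact content types depend on what the client actually sends):

from sagemaker.model_monitor import DataCaptureConfig

xgb_predictor = framework_xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name=endpoint_name,
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri='s3://{}/{}'.format(bucket, data_capture_prefix),
        csv_content_types=['text/csv'],  # declare the captured payloads as CSV
    ),
)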

@tiru1930
Author

tiru1930 commented Sep 19, 2020 via email

@icywang86rui
Contributor

I see that you have configured DataCaptureConfig but csv_content_types is missing here.

@lsabreu96

@tiru1930, did you make any progress here?
I'm having the same issue, but my error says the encoding is JSON for endpointInput and CSV for endpointOutput. I've tried setting csv_content_types to both text/csv and application/json, but neither worked.
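
For completeness, DataCaptureConfig accepts both csv_content_types and json_content_types, so a capture config can declare each side separately; a sketch (note the monitor itself still requires input and output to share one encoding, per the error above):

from sagemaker.model_monitor import DataCaptureConfig

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,  # as defined earlier in the thread
    csv_content_types=['text/csv'],
    json_content_types=['application/json'],
)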

@tiru1930
Author

tiru1930 commented Jan 21, 2021 via email

@lsabreu96

lsabreu96 commented Jan 21, 2021

@tiru1930, sorry for bothering you again, but I couldn't find it anywhere in your code: did you change somehow how XGBoost returns your data?
Reading the docs, I found that it outputs predictions as CSV, but your error seems to state the output is JSON-encoded.

@lsabreu96

lsabreu96 commented Jan 21, 2021

@icywang86rui, could you help us out here, please?
I've deployed my model again, this time setting the serializer and deserializer to be the same, expecting the error to go away, but that didn't work.

aws locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.

You can continue the conversation there.
