Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

generate error/fault metrics by aws sdk status code #924

Merged
merged 1 commit into from
Jul 3, 2023

Conversation

scaugrated
Copy link
Contributor

@scaugrated scaugrated commented Jun 15, 2023

Description:

A short term solution to generate error/fault metrics by aws sdk status code.

When the AWS SDK calls an API then returns non-200 status code, the AWS SDK simply throws an exception. The exception thrown by the AWS SDK is stored within the produced spans and is accessible in the AwsSpanMetricsProcessor, where we generate Fault/Error metrics.

We attempt to get the throwable object and call its getStatusCode method to obtain the status code, then generate metrics based on the corresponding status code.

Existing Issue(s):

http.status_code attribute in span was not being populated when AWS APIs were returning non-200 status codes

Testing:

unit tests

Documentation:
N/A

Outstanding items:
N/A

@scaugrated scaugrated requested a review from a team June 15, 2023 22:04
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 15, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@@ -56,6 +61,18 @@ private enum ExpectedStatusMetric {

private AwsSpanMetricsProcessor awsSpanMetricsProcessor;

static class ThrowableWithMethodGetStatusCode extends Throwable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a test covering a Throwable with a statusCode method and a Throwable with neither statusCode nor getStatusCode methods.

return;
httpStatusCode = getAwsStatusCode(spanData);

if (httpStatusCode == null || httpStatusCode <= 0L) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range of HTTP status codes is defined in https://datatracker.ietf.org/doc/html/rfc9110#name-status-codes and could be checked as well.

All valid status codes are within the range of 100 to 599, inclusive.

Copy link

@vastin vastin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@scaugrated scaugrated requested a review from thpierce June 23, 2023 15:21
Copy link
Contributor

@thpierce thpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@scaugrated
Copy link
Contributor Author

Hi @trask , all other approval requirements met, please take a look when you have a chance.

@scaugrated
Copy link
Contributor Author

Hi @jack-berg , all other approval requirements met, please take a look when you have a chance.

@Nullable
private static Long getAwsStatusCode(SpanData spanData) {
InstrumentationScopeInfo scopeInfo = spanData.getInstrumentationScopeInfo();
if (scopeInfo == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary as spanData.getInstrumentationScopeInfo() is not annotated with @Nullable and does not return null.

}

String scopeName = scopeInfo.getName();
if (scopeName == null || !scopeName.contains("aws-sdk")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no null check required for scopeInfo.getName().

Throwable throwable = exceptionEvent.getException();

try {
Method method = throwable.getClass().getMethod("getStatusCode", new Class<?>[] {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code owners for this are @wangzlei and @srprash. Seems like a bad idea to do reflection on the hot path.

The javadoc for the method above mentions that this is a short term fix until this issue is resolved. I won't block this if the code owners want to proceed with this solution, but it would be nice to see a solution proposed to address the root of the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this short term solution is not prefect but acceptable from our side, @wangzlei and @srprash have approved this change. We will deprecate this solution while delivering long term solution.

@scaugrated scaugrated force-pushed the error_fault_metrics branch from 527e00c to 352dbc5 Compare June 27, 2023 23:45
@scaugrated scaugrated requested a review from jack-berg June 29, 2023 17:03
@scaugrated
Copy link
Contributor Author

Hi @trask , @jack-berg , could you please take a look when you have a chance.

@trask trask merged commit d9aaf2d into open-telemetry:main Jul 3, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants