Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

boto3 S3 upload_file method does not have the correct trace context #298

Open
wangzlei opened this issue Jan 27, 2021 · 4 comments
Open
Labels
bug Something isn't working help wanted Extra attention is needed triaged

Comments

@wangzlei
Copy link

opentelemetry-instrumentation-botocore bug:
X-Ray spans generated by AWS boto3 S3 upload_file method requests do not have the same trace id as their parent and sibling spans and the parent_id is set to null which causes broken traces.

Steps to reproduce

BotocoreInstrumentor().instrument(tracer_provider=trace.get_tracer_provider())

with tracer.start_as_current_span("foo"):
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('fromFile', 'myS3Bucket', 'toFile')    # upload a file to s3
    s3.bucket.all()    # list bucket

What is the expected behavior?
upload_file span should be the child of foo, get one trace: (foo -> (s3.upload_file, s3.list))

What is the actual behavior?
Get 2 traces, upload_file is not under foo:
trace1 (foo -> s3.list)
trace2 (s3.upload_file)

Additional context
boto3 s3 upload_file uses multi-part to increase throughput, botocore instrumentation does not cover this special case. Refer an analysis.

@github-actions
Copy link

github-actions bot commented Apr 6, 2021

This issue was marked stale due to lack of activity. It will be closed in 30 days.

@srikanthccv srikanthccv added help wanted Extra attention is needed triaged and removed backlog labels May 30, 2021
@s4ke
Copy link

s4ke commented Jan 5, 2022

Is there any information why this happens? We are currently facing this issue.

@ericksoen
Copy link

Had to debug a similar issue today with the download_file method and wanted to share some of my analysis. I've previously encountered the type of broken trace issue that @wangzlei describes in this issue when working with apps that use multi-threading.

In that scenario, the auto-instrumentation libraries correctly instrument outgoing requests but can't correctly associate the trace context to the main thread. We were able to solve those issues using code similar to the following:

import time
import threading

from opentelemetry import context 

def thread_worker(thread_id, context):
  with tracer.start_as_current_span('thread-worker', context) as span:
    span.set_attribute('thread_id', thread_id)
    time.sleep(0.5)

threads = list()
trace_context = context.get_context()

for i in range(4):
  thread = threading.Thread(target=thread_worker, args = (i, trace_context))
  threads.append(thread)
  thread.start()

for t in threads:
  t.join()

Based on the similar broken trace behavior and the observation that the status code for s3:GetObject executed via download_file returns a 206/Partial Content status code, it seems plausible that the download_file/upload_file makes similar use of background threads to reduce request latency.

Since the spans in question are from auto-instrumented method, there's no way to explicitly pass the trace context the way I did in my sample code.

What we did to try to work around this issue:

  • Add a request/response hook when configuring BotocoreInstrumentor to drop specific spans, e.g., s3:GetObject (issue: outside of the sampler, the opentelemetry API/SDK doesn't provide a method to drop spans)
  • Add a request/response hook to mark the span as not recording since these are not transmitted (issue: the span class provides a read-only is_recording() property but no way to change the value
  • Add a request/response hook to associate the span with the correct parent thread (issue: trace context is immutable)

What we did that actually worked:

  • BotocoreInstrumentor().uninstrument() before invoking s3.download_file followed by BotocoreInstrumentor().instrument() after download completes.
    ** Note: for our use case, removing extraneous, loud, and difficult to understand data was more desirable than seeing the broken traces for 900+ s3:getObject calls
    ** Note: we are still able to observe performance and error codes by wrapping the s3.download_file in its own trace as shown in the next code sample
with tracer.start_as_current_span('download-file-from-s3') as download_span:
    # Note: see GitHub issue https://github.com/open-telemetry/opentelemetry-python-contrib/issues/298
    BotocoreInstrumentor().uninstrument()
    s3.download_file(bucket_name, key_name, output_file_name)
    BotocoreInstrumentor().instrument()
    download_span.set_attributes({
      "s3.bucket_name": bucket_name,
      "s3.key_name": key_name,
    })

@s4ke
Copy link

s4ke commented Oct 14, 2022

Thanks @ericksoen this looks like a practicable work around to put into a library function. Tyvm for this investigation!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working help wanted Extra attention is needed triaged
Projects
None yet
Development

No branches or pull requests

4 participants