Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Issue with passing trace_entity when using ThreadPoolExecutor in lambda #342

Open
RossHammer opened this issue Jun 20, 2022 · 4 comments
Open

Comments

@RossHammer
Copy link

I followed the instructions in the readme for passing the trace entity in a ThreadPoolExecutor but this does not appear to work properly in a lambda. The first time a thread does some work the trace_entity does not stick and the subsegment gets put under the root node not the one it should be assiciated with. The next time a thread picks up a job everything seems to operate fine.

I dug into the code a bit and it seems like there is a custome context class for lambdas but it does not override set_trace_entity to work properly with the other changes in the class. To workaround the issue I have found calling get_trace_entity before setting sets the internal state up properly.

example workaround from readme

def load_url(url, trace_entity):
    xray_recorder.get_trace_entity() # Workaround to setup internal state
    xray_recorder.set_trace_entity(trace_entity)

    resp = requests.get(url)

    xray_recorder.clear_trace_entities()
    return resp
@NathanielRN
Copy link
Contributor

Hey there! I had some follow up questions:

the trace_entity does not stick

Can you explain this more please? Do you mean that when you follow our documentation on using ThreadPoolExecutor the call to current_entity = xray_recorder.get_trace_entity() returns None? Or is invalid?

The next time a thread picks up a job everything seems to operate fine.

Hm this is curious. Can you please add screenshots of what you're seeing? Especially before and after screenshots and an explanation of what you expect to see versus what you actually see?

When our documentation says # Get the current active segment or subsegment from the main thread. we expect one to be already open. If you share your code you can show us how you already set the segment that the subsequent line expects to find. Otherwise maybe your segments are missing a trace on the first run?

a custome context class for lambdas but it does not override set_trace_entity to work properly with the other changes in the class.

Which changes are missing to make it compatible with the other changes? Just curious in case you already know. Also please feel free to open a PR to add a fix if you have one and we'll be happy to answer!

To workaround the issue I have found calling get_trace_entity before setting sets the internal state up properly.

That's awesome that you have a workaround! Is there a chance you missed the current_entity = xray_recorder.get_trace_entity() line in our documentation example that I linked above?

Perhaps your segment isn't already available when your load_url function executes, or maybe it always shows up late and that's why the first one doesn't work but the subsequent ones do?

The best way to help you would be to have you share your code so we can see when the trace that get_trace_entity() fetches is created and if we can confirm that it will be available by they this function is called.

Please let me know if you have any questions!

@RossHammer
Copy link
Author

I can try and get a sample together in the next week or so but it should be doable with what is posted in the readme . In the console the first segments recorded by a thread in the pool show up under the invoke of the lambda not the segment from the trace entity if it should be redording under another subsegment.

It looks like the issue coming from here. It seems like the segment is not set in the local context and the set_trace_entity function does not set it. In turn when recording the next segment it does not have the proper parent element. get_trace_entity before setting it fixes everything becasue the segment will get set.

@NathanielRN
Copy link
Contributor

Thanks for your deep dive!

In the console the first segments recorded by a thread in the pool show up under the invoke of the lambda not the segment from the trace entity if it should be redording under another subsegment.

Why does it work on subsequent calls then? Is it because set_trace_entity calls setattr(self._local, 'entities', [trace_entity]) and so the subsegments fall under each other synchronously?

This is where a picture of what you see versus what you expect would be useful to clarify, but I think this is what you mean and would explain why it works on the 2nd time onward.

It seems like the segment is not set in the local context and the set_trace_entity function does not set it.

So are you saying we could fix the issue by overriding the set_trace_entity method to also _refresh_context like get_trace_entity in the lambda_launcher.py should solve this?

def set_trace_entity(self, trace_entity):
        """
        Refresh the context before setting the trace entity so
        that the correct parent is set.
        """
        self._refresh_context()
        super.set_trace_entity(trace_entity)

That case would make sense to me, and we could move to a PR for this?

By the way, we are also recommending users to use the OpenTelemetry Python SDK. It has tons of features and I found an example with ThreadPoolExecutor to ensure spans are parented correctly. You'll find lots of support there and AWS announced GA support for its traces 1 year ago.

@RossHammer
Copy link
Author

We are using this library becasue it is what AWS lambda powertools is using under the covers. The only spot we are directly using this library is to make things work properly in the threadpool.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants