How to pause a span when its Fiber yields? #3026

rmosolgo · 2025-01-21T15:53:27Z

Is your feature request related to a problem? Please describe.

Hi! In GraphQL-Ruby, field execution is instrumented. However, a field can call Fiber.yield to make itself wait for some other work to be done (that other work is GraphQL::Dataloader). When that work is done, the field's Fiber is resumed and GraphQL execution continues.

In practice, we end up with several Fibers paused while their fields wait for data. Once the data is available, those Fibers are resumed (one at a time) and field execution completes.

The problem is that the field's span in NewRelic includes the time during which the Fiber was actually paused (because of Fiber.yield). So although clock time was passing, Ruby wasn't doing anything with that Fiber.

As a result, in the UI it looks like these GraphQL fields are taking a loooong time (longer than the request duration, actually). That's because the waiting time is counted once for every span that's waiting.
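Here is a minimal plain-Ruby sketch that shows the double-counting. No New Relic code is involved, and it assumes a block-based span behaves like a wall clock read at start and at finish. Each Fiber measures its own time from start to finish. Both Fibers are paused during the same simulated data load, so the sum of their "spans" is larger than the total request time.

```ruby
# Plain Ruby, no agent involved: each fiber times itself from start to finish
# with a wall clock, which is roughly what a block-based span does.
def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

durations = {}

fibers = [1, 2].map do |id|
  Fiber.new do
    started = now
    Fiber.yield                   # paused while the batch load runs
    durations[id] = now - started # includes the paused interval
  end
end

request_started = now
fibers.each(&:resume)  # both fibers start, then pause at Fiber.yield
sleep 0.05             # stand-in for the batched data load
fibers.each(&:resume)  # both fibers finish
request_total = now - request_started

puts format("request:      %.3fs", request_total)
durations.each { |id, d| puts format("fiber %d span: %.3fs", id, d) }
puts format("sum of spans: %.3fs", durations.values.sum)
```

The two Fiber "spans" each cover the shared 0.05s wait, so their sum comes to roughly twice the request duration.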

Feature Description

I'd like a way to improve the tracing so that I can "pause" the span when a Fiber yields, or some other way to eliminate this "double-counting" of wait time.
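One way to picture the requested "pause" is a timer that only accumulates time while its Fiber is running. The sketch below uses a hypothetical ActiveTimer class, which is not part of newrelic_rpm or GraphQL-Ruby. The caller pauses it immediately before Fiber.yield and resumes it right after, so the idle interval is never counted.

```ruby
# Hypothetical helper (not part of newrelic_rpm): it accumulates only the time
# between start/resume and pause/stop, so yielded time is never counted.
class ActiveTimer
  attr_reader :active

  def initialize
    @active = 0.0
    @running_since = nil
  end

  # Begin (or continue) counting.
  def start
    @running_since = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    self
  end
  alias resume start

  # Stop counting; call this right before Fiber.yield.
  def pause
    return self unless @running_since

    @active += Process.clock_gettime(Process::CLOCK_MONOTONIC) - @running_since
    @running_since = nil
    self
  end

  # Finish and return the total active seconds.
  def stop
    pause
    @active
  end
end

timer = ActiveTimer.new
job = Fiber.new do
  timer.start
  sleep 0.01      # active work before the wait
  timer.pause
  Fiber.yield     # idle: not counted
  timer.resume
  sleep 0.01      # active work after the wait
  timer.stop      # becomes the fiber's final value
end

job.resume
sleep 0.2         # the fiber is paused during this interval
active = job.resume
puts format("active: %.3fs", active)
```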

Describe Alternatives

We could do nothing. In that case, whenever we adopt Dataloader, we get a big (nonsensical) spike in segment time.

We'd have to learn to ignore that input 😿

We could stop using Fibers in the code. That's also a possibility: the GraphQL-Batch library doesn't use Fibers. But honestly, it has the same problem with tracing; it just manifests differently because GraphQL-Batch uses Promises instead of Fiber.yield.

Are there existing options in the NewRelic agent that I could use to improve tracing in this case?

Additional context

Here's a simplified example of what GraphQL-Ruby does:

Tracing work with Fiber.yield

```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "newrelic_rpm"
end

def do_something(record)
  puts "Doing: #{record[:name]}"
end

manager = Fiber.new do
  ids_to_fetch = []
  records = {}

  # Each subtask opens a span, registers the ID it needs, then pauses.
  # The span stays open the whole time the Fiber is paused.
  subtask_1 = Fiber.new do
    NewRelic::Agent::MethodTracerHelpers.trace_execution_scoped("Jobs/Job") do
      puts "Requesting 1"
      ids_to_fetch << 1
      Fiber.yield
      do_something(records[1])
    end
  end
  subtask_1.resume

  subtask_2 = Fiber.new do
    NewRelic::Agent::MethodTracerHelpers.trace_execution_scoped("Jobs/Job") do
      puts "Requesting 2"
      ids_to_fetch << 2
      Fiber.yield
      do_something(records[2])
    end
  end
  subtask_2.resume

  # Load every requested record in one batch:
  puts "Loading data"
  NewRelic::Agent::MethodTracerHelpers.trace_execution_scoped("Database/Fetch") do
    ids_to_fetch.each do |id|
      records[id] = { name: "Job ##{id}" }
    end
  end
  puts "Resuming jobs"

  # Data is loaded, now resume work:
  subtask_1.resume
  subtask_2.resume
end

puts "Starting..."
manager.resume
puts "...Finished"

# Output:
# Starting...
# Requesting 1
# Requesting 2
# Loading data
# Resuming jobs
# Doing: Job #1
# Doing: Job #2
# ...Finished
```

Priority

Really Want 😊


workato-integration · 2025-01-21T15:53:31Z

https://new-relic.atlassian.net/browse/NR-360990

rmosolgo · 2025-01-23T18:55:14Z

I also noticed that on the Summary view, the "Ruby" portion of our request time is larger than the total request time. It's also growing, perhaps as we adopt this Fiber-based flow.

hannahramadan · 2025-01-24T23:34:36Z

Hi @rmosolgo! Thank you for opening this issue and providing a great overview of your thoughts.

I can see where you’re coming from with how time is represented here, but spans weren’t designed to be "paused." A span is meant to account for total elapsed time, which includes both the time spent actively working and any time spent waiting on external resources or dependencies. Pausing a span would break that continuity, and with it the span’s ability to represent the full lifecycle of an operation: spans weren’t designed to model "active" versus "idle" time.

As far as existing options in the agent go, you could use custom instrumentation to split each span in two, one recording the time before Fiber.yield and another recording the time after, but I’m not sure that would give you a view that’s easier to work with. Another thought is to use the NewRelic::Agent.add_custom_attributes API to add a custom attribute to spans that records the actual execution time.
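Here is a rough sketch of the custom-attribute idea. Several parts are assumptions. with_active_time, record_active_time, and the graphql.active_ms attribute key are hypothetical names. The block calls a wait lambda instead of a bare Fiber.yield, which lets the helper exclude paused time. The sketch reports through NewRelic::Agent.add_custom_span_attributes because, as we understand it, that call targets the current span, while add_custom_attributes (named in the comment) records on the transaction. If the agent isn't loaded, the sketch just prints the attribute.

```ruby
# Sketch: measure a fiber's active (non-yielded) time and attach it as a
# custom attribute. Helper names and the "graphql.active_ms" key are
# hypothetical; the agent call only runs when newrelic_rpm is loaded.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Runs the block, counting time only while the fiber is running. The block
# receives a `wait` lambda to call instead of a bare Fiber.yield.
def with_active_time
  active = 0.0
  running_since = monotonic_now
  wait = lambda do
    active += monotonic_now - running_since # close the current active slice
    value = Fiber.yield                     # paused time is not added
    running_since = monotonic_now           # open a new active slice
    value
  end
  result = yield(wait)
  active += monotonic_now - running_since
  [result, active]
end

def record_active_time(seconds)
  attrs = { "graphql.active_ms" => (seconds * 1000).round(2) }
  if defined?(NewRelic::Agent)
    NewRelic::Agent.add_custom_span_attributes(attrs)
  else
    puts "would record #{attrs.inspect}"
  end
end

active_seconds = nil
job = Fiber.new do
  _result, active_seconds = with_active_time do |wait|
    sleep 0.01   # active work
    wait.call    # paused here; this interval is excluded
    sleep 0.01   # active work after resuming
  end
  record_active_time(active_seconds)
end

job.resume
sleep 0.2        # time spent while the fiber is paused
job.resume
```

This keeps a single span per field, so the waterfall still shows wall-clock duration. The attribute carries the "active" figure alongside it.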

We’re going to keep this issue open as a reminder to discuss internally. In the meantime, we’d love to hear if other community members have thoughts about this.

rmosolgo · 2025-01-25T01:51:23Z

Thanks for getting back to me, @hannahramadan. That makes sense about spans being start-and-finish; I can't say I've seen any other instrumentation tool with a concept of "pausing." I think it was on the tip of my tongue because Ruby's Fibers work that way.

Another possibility I've been considering is to adopt start_transaction_or_segment + .finish instead of block-based instrumentation. I think that would let me split the operation into multiple spans, as you suggest.
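The start/finish approach can be sketched as follows. The span is finished just before Fiber.yield and a new one is started once the Fiber resumes, so the paused interval falls inside no span. FakeTracer, Span, and the Jobs/Job/resumed name are hypothetical stand-ins that let the sketch run without the agent. In real code, start_transaction_or_segment and the returned object's .finish would fill those roles. Their exact signatures aren't shown here.

```ruby
# Sketch of splitting one logical operation into two spans around Fiber.yield.
# FakeTracer/Span are hypothetical stand-ins so this runs without the agent.
Span = Struct.new(:name, :started_at, :finished_at) do
  def finish
    self.finished_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def duration
    finished_at - started_at
  end
end

class FakeTracer
  attr_reader :spans

  def initialize
    @spans = []
  end

  # Opens a span and records it; the caller is responsible for #finish.
  def start(name)
    span = Span.new(name, Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @spans << span
    span
  end
end

tracer = FakeTracer.new
records = {}

job = Fiber.new do
  span = tracer.start("Jobs/Job")          # work before the wait
  requested_id = 1
  span.finish                              # close the span before yielding
  Fiber.yield                              # paused: no span is open
  span = tracer.start("Jobs/Job/resumed")  # reopen after resuming
  puts "Doing: #{records[requested_id][:name]}"
  span.finish
end

job.resume
sleep 0.2                                  # stand-in for the batched load
records[1] = { name: "Job #1" }
job.resume

tracer.spans.each { |s| puts format("%-17s %.4fs", s.name, s.duration) }
```

The trade-off, as hannahramadan points out, is that one field now appears as two spans, which may be harder to read in the UI.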

kaylareopelle · 2025-02-04T22:18:43Z

Hi @rmosolgo, thanks for your understanding. If you (or any other community members) find solutions to this problem with existing custom instrumentation APIs, we'd love to hear about them!

rmosolgo added the feature request label Jan 21, 2025

github-actions bot added the community label Jan 21, 2025

chynesNR added the needs review label Jan 27, 2025

rmosolgo closed this as completed Feb 4, 2025

kaylareopelle removed the needs review label Feb 4, 2025

rmosolgo mentioned this issue Feb 17, 2025 in rmosolgo/graphql-ruby#5240, "Rewrite NewRelicTrace to support fiber stops/starts" (merged)
