Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Panic with follows from span in multithreaded runtime. #183

Open
mTsBucy1 opened this issue Nov 7, 2024 · 0 comments
Open

Panic with follows from span in multithreaded runtime. #183

mTsBucy1 opened this issue Nov 7, 2024 · 0 comments

Comments

@mTsBucy1
Copy link
Contributor

mTsBucy1 commented Nov 7, 2024

Bug Report

Version

tracing-opentelemetry: 0.25.0 / 0.27.0 (elaborated below),
tracing: 0.1.40

Platform

Linux mtstestclient03 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Description

I have this scenario in my code:

  1. I spawn (multiple) task, which loops and call async instrumented function.
  2. this function gets its tracing::Span::current().id() and sends it through a tokio mpsc channel.
  3. Another task reads this, spawns a instrumented async function, which follows from the received id.

Suddenly my code paniced on startup with

thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-opentelemetry-0.25.0/src/layer.rs:953:18:
Missing otel data span extensions
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/registry/sharded.rs:437:58:
Mutex poisoned: PoisonError { .. }
[stack trace...]
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.

This happened here in the code https://github.com/tokio-rs/tracing-opentelemetry/blob/v0.25.0/src/layer.rs#L953.
My hourly job crashed 4 times this way in the past week. I.e. approximately every 40th time.

I wrote a test, now on the current branch (0.27.0), to try to replicate this behaviour, but it also didn't show the same panic. However, I noticed another odd behaviour: The asserted count fails for a tokio new_multi_thread runtime, but succeeds with the tokio new_current_thread one. I am not sure whether these are indeed related, but it seems to be a bug in its own right.

// in tests/follows_from.rs

#[test]
fn trace_follows_from_diff_thread() {
    use tokio::sync::mpsc;

    #[tracing::instrument]
    async fn span_follows_this(chan: &mpsc::Sender<Option<tracing::Id>>) {
        chan.send(tracing::Span::current().id())
            .await
            .expect("Channel should be open");
    }

    #[tracing::instrument(follows_from = [cause])]
    async fn span(cause: Option<tracing::Id>) {}

    async fn receiver(mut chan: mpsc::Receiver<Option<tracing::Id>>) {
        while let Some(cause) = chan.recv().await {
            span(cause).await;
        }
    }

    let (_tracer, provider, exporter, subscriber) = test_tracer();

    tracing::subscriber::with_default(subscriber, || {
        tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap()
            .block_on(async {
                let (tx, rx) = mpsc::channel(4);
                tokio::spawn(receiver(rx));
                span_follows_this(&tx).await;
                tokio::time::sleep(std::time::Duration::from_millis(10)).await;
                drop(tx);
            })
    });

    drop(provider); // flush all spans
    let spans = exporter.0.lock().unwrap();
    // The follower and followee
    assert_eq!(spans.len(), 2);
}
@mTsBucy1 mTsBucy1 changed the title Follows from span not tracked in multithreaded runtime. Panic with follows from span in multithreaded runtime. Nov 14, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant