Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Possible lifetime issue with parallel_scan over non trivially destructible range #1390

Open
coo1berg opened this issue May 29, 2024 · 4 comments
Labels

Comments

@coo1berg
Copy link

Hi,

I've faced odd crash using parallel_scan over range object that has some logic inside its destructor. It seems that destructor of range sometimes called on not yet constructed range object.

It happens not 100% times, but there is a minimal reproducible example I managed to get (ensure asserts are on):

#include <cassert>
#include <vector>

#include <tbb/parallel_reduce.h>
#include <tbb/parallel_scan.h>

int main()
{
	std::vector <int> vec;
	const int len = 34479;

	struct non_trivial_range : tbb::blocked_range <decltype(vec)::iterator>
	{
		using non_trivial_range::blocked_range::blocked_range;
		
		~non_trivial_range()
		{
			assert( cookie == 0xc001c001 );
			cookie = 0xdeadbeef;
		}
		unsigned cookie = 0xc001c001;
	};

	vec.assign( len, 1 );
	
	// this works fine
	int s = tbb::parallel_reduce( non_trivial_range{ vec.begin(), vec.end() }, 0, [&]( const auto & rng, int acc )
	{
		for (int p : rng) acc += p; // work is irrelevant

		return acc;
	}, std::plus<>() );
	assert( s == len );

	// this fails eventually
	for (int i = 0; i < 1000; ++i)
	{
		s = tbb::parallel_scan( non_trivial_range{ vec.begin(), vec.end() }, 0, [&]( const auto & rng, int acc, auto is_final )
		{
			for (int p : rng) acc += p; // work is irrelevant

			return acc;
		}, std::plus<>() );
		assert( s == len );
	}
	return s;
}

tbb.cpp.txt

Tested on:

  • latest msvc x64, tbb-2021.11
  • gcc-13 x64, tbb-2021.12

Is there a problem inside my code?

Regards,
Slava

@dnmokhov dnmokhov added the bug label May 29, 2024
@dnmokhov
Copy link
Contributor

I can reproduce the issue. It does appear that sometimes the right zombie child self destructs without being fully constructed.

@coo1berg
Copy link
Author

If that helps, here is a little patch that fixes lifetime problem for me

diff --git a/include/oneapi/tbb/parallel_scan.h b/include/oneapi/tbb/parallel_scan.h
index d624f7eb..b9169b97 100644
--- a/include/oneapi/tbb/parallel_scan.h
+++ b/include/oneapi/tbb/parallel_scan.h
@@ -94,6 +94,7 @@ private:
 
     wait_context& m_wait_context;
     sum_node_type* m_parent = nullptr;
+    bool m_alive = false;
 public:
     small_object_allocator m_allocator;
     final_sum( Body& body, wait_context& w_o, small_object_allocator& alloc ) :
@@ -107,13 +108,15 @@ public:
     }
 
     ~final_sum() {
-        m_range.begin()->~Range();
+        if (m_alive)
+            m_range.begin()->~Range();
     }
     void finish_construction( sum_node_type* parent, const Range& range, Body* stuff_last ) {
         __TBB_ASSERT( m_parent == nullptr, nullptr );
         m_parent = parent;
         new( m_range.begin() ) Range(range);
         m_stuff_last = stuff_last;
+        m_alive = true;
     }
 private:
     sum_node_type* release_parent() {

Of course, it doesn't fix the cause of why it wants to self destruct at wrong time.

@dnmokhov
Copy link
Contributor

We are still looking into why the object is left uninitialized. Meanwhile, I hope that you are able to use the workaround (even a simple if (m_parent) check would work).

@coo1berg
Copy link
Author

Hello,

Update on issue status so far:

  1. A hint for simpler patch from @dnmokhov is indeed works fine.
  2. There is another problem discovered after thorough testing.

Looks like tbb::parallel_scan can sometimes return while some spawned Body instances are still in cleanup phase. This causes race condition if destructor of Body interacts with some shared state, and that state is altered in code after parallel_scan. In particular, sporadic crashes or hangs were observed when Body::~Body frees memory from memory_pool-like resource, that has been recycled right after call to parallel_scan.

I've traced the cause down to final_sum<>::finalize method. If I force destruction of final_sum<>::m_body before calling final_sum<>::release_parent - the issue goes away. But I'm not 100% convinced in that fix.

Here is slightly reworked example, that can now detect these issues:

BOOST_AUTO_TEST_CASE( tbb_scan )
{
    std::vector <int> vec;
    const int len = 34479;

    static std::atomic_int rng_error = 0;
    static std::atomic_int obj_error = 0;
    static std::atomic_int obj_count = 0;
    struct non_trivial_range : tbb::blocked_range <decltype(vec)::iterator>
    {
        using non_trivial_range::blocked_range::blocked_range;

        ~non_trivial_range()
        {
            if (cookie != 0xFEE1C001)
                ++rng_error;
            std::this_thread::yield(); // imitate some cleanup procedure
            cookie = 0xDEADBEEF;
        }
        unsigned cookie = 0xFEE1C001;
    };
    struct non_trivial_scanner
    {
        non_trivial_scanner() { ++obj_count; }
        non_trivial_scanner( non_trivial_scanner &, tbb::split ) noexcept { ++obj_count; }
        non_trivial_scanner( non_trivial_scanner const & ) = delete;

        ~non_trivial_scanner()
        {
            if (cookie != 0xFEE1600D)
                ++obj_error;
            std::this_thread::yield(); // imitate some cleanup procedure
            cookie = 0xDEADF00D;
            --obj_count;
        }
        unsigned cookie = 0xFEE1600D;
        int sum = 0;

        void operator()( const non_trivial_range & r, tbb::final_scan_tag ) { for (int p : r) sum += p; }
        void operator()( const non_trivial_range & r, tbb::pre_scan_tag ) { for (int p : r) sum += p; }

        void reverse_join( non_trivial_scanner & lhs ) { sum += lhs.sum; }
        void assign( non_trivial_scanner & other ) { sum = other.sum; }
    };

    vec.assign( len, 1 );

    for (int i = 0; i < 100'000; ++i) BOOST_TEST_CONTEXT( "run " << i )
    {
        non_trivial_scanner nts;
        BOOST_REQUIRE_EQUAL( obj_count, 1 );
        tbb::parallel_scan( non_trivial_range{ vec.begin(), vec.end() }, nts );
        const int oc = obj_count.load();

        BOOST_REQUIRE_EQUAL( oc, 1 );
        BOOST_REQUIRE_EQUAL( obj_error, 0 );
        BOOST_REQUIRE_EQUAL( rng_error, 0 );
        BOOST_REQUIRE_EQUAL( nts.sum, len );

        if ((i % 1000) == 0)
            std::cout << i << std::endl;
    }
}

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants