Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

erlingrj · 2023-11-25T15:42:09Z

I noticed large values for number of unfreed messages being reported at termination. I found out that the these global variables are accessed without locking from e.g. reaction bodies. Another valid question is if there are more global state which should be protected in the token handling code.

edwardalee

This looks reasonable, though it's unfortunate to grab a lock just to facilitate a warning that shouldn't be necessary if the code is working properly. Also, I can't find anywhere in the code that _lf_count_token_allocations is incremented. Is it only decremented? These revisions seem to only protect alterations of _lf_count_payload_allocations.

erlingrj · 2023-11-25T19:03:41Z

You are probably right about the token_allocations, I didn't check that properly, this was addressing the payload allocations yes. I guess it could be hidden behind a NDEBUG flag so it is only compiled if the user specifies a debug build

erlingrj · 2023-11-26T11:55:36Z

@edwardalee I did the following:

Hide the counting behind #if !defined NDEBUG. This means that it is only compiled when we pick build-type Debug or Test.
I have added the missing increment of _lf_count_token_allocations. I think I added it at the correct place. The count is decremented every time a token is freed (not caring about recycling or not)

erlingrj · 2023-11-26T15:52:57Z

Hmmm, I believe that there still are issues. I am experiencing non-deterministic failures in a program that I have made that uses tokens with custom allocators. It runs out of buffers to allocate. My guess is that there are some other data races in the token logic. The token API is invoked from different contexts, sometimes the environment mutex is held, sometimes it is not.

erlingrj · 2023-11-26T18:04:49Z

Update: I found the culprit. When marking a token input as mutable, lf_writable_copy is code-generated into the reaction bodies, i.e. outside any critical section. There are data races within this function. I solved it by grabbing the environment mutex of the port this is invoked on. There are other ways also, e.g. we could code-generate it. But I feel that the code-generation is more brittle.

I think we should take a good look at the token system and identify where there are possible race conditions. There might be more

lhstrh · 2024-02-04T06:30:08Z

@edwardalee and @erlingrj, what should happen with this PR?

erlingrj · 2024-02-04T16:48:02Z

This PR does two things:

Avoid some data races for decrementing and incrementing the token_allocation_count. It also hides it behind the NDEBUG macro so it would not be part of a program compiled for RELEASE

Should we use atomic fetch_and_add instead?

Avoid some data race when lf_writable_copy is called. This only appears with mutable inputs with token type,
I don't remember exactly what happened inside lf_writable_copy that was not thread-safe.

edwardalee

This looks good to me. I've pushed some changes that make use of macros to check return values more uniform and complete and to remove duplication of checking code.

One this that is not clear to me is how a user should set the NDEBUG flag. How to do this @erlingrj ?

erlingrj · 2024-02-04T19:22:51Z

It is a convention/standard that NDEBUG (not debug) will be defined by the compilators in release builds. This is how normal asserts are disabled also. When the user specifies build-type: Release CMake will generate this compile definition. This should work across different compilers (clang/gnu/msvc)

Fix data races for _lf_count_payload_alloactions

9f05b53

erlingrj requested a review from edwardalee November 25, 2023 15:42

edwardalee approved these changes Nov 25, 2023

View reviewed changes

Only count payload and token allocations when !defined NDEBUG

1cced7b

erlingrj requested a review from edwardalee November 26, 2023 11:55

Do lf_writable_copy in a critical section to avoid races

ff7355b

edwardalee changed the title ~~Fix data races for _lf_count_payload_alloactions and _lf_count_token_allocations~~ Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations Feb 4, 2024

Resolve conflicts

f5f03f8

More complete use of macros to check return values

32d3d91

edwardalee approved these changes Feb 4, 2024

View reviewed changes

edwardalee merged commit d96cbce into main Feb 4, 2024
28 checks passed

edwardalee deleted the count-token-allocations-race branch February 4, 2024 22:22

lhstrh added the bugfix label Apr 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

erlingrj commented Nov 25, 2023

edwardalee left a comment

erlingrj commented Nov 25, 2023 •

edited

Loading

erlingrj commented Nov 26, 2023

erlingrj commented Nov 26, 2023

erlingrj commented Nov 26, 2023 •

edited

Loading

lhstrh commented Feb 4, 2024

erlingrj commented Feb 4, 2024

edwardalee left a comment

erlingrj commented Feb 4, 2024 •

edited

Loading

Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

Conversation

erlingrj commented Nov 25, 2023

edwardalee left a comment

Choose a reason for hiding this comment

erlingrj commented Nov 25, 2023 • edited Loading

erlingrj commented Nov 26, 2023

erlingrj commented Nov 26, 2023

erlingrj commented Nov 26, 2023 • edited Loading

lhstrh commented Feb 4, 2024

erlingrj commented Feb 4, 2024

edwardalee left a comment

Choose a reason for hiding this comment

erlingrj commented Feb 4, 2024 • edited Loading

erlingrj commented Nov 25, 2023 •

edited

Loading

erlingrj commented Nov 26, 2023 •

edited

Loading

erlingrj commented Feb 4, 2024 •

edited

Loading