Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations #313

Merged
merged 5 commits into from
Feb 4, 2024

Conversation

erlingrj
Copy link
Collaborator

I noticed large values for number of unfreed messages being reported at termination. I found out that the these global variables are accessed without locking from e.g. reaction bodies. Another valid question is if there are more global state which should be protected in the token handling code.

@erlingrj erlingrj requested a review from edwardalee November 25, 2023 15:42
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, though it's unfortunate to grab a lock just to facilitate a warning that shouldn't be necessary if the code is working properly. Also, I can't find anywhere in the code that _lf_count_token_allocations is incremented. Is it only decremented? These revisions seem to only protect alterations of _lf_count_payload_allocations.

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 25, 2023

You are probably right about the token_allocations, I didn't check that properly, this was addressing the payload allocations yes. I guess it could be hidden behind a NDEBUG flag so it is only compiled if the user specifies a debug build

@erlingrj
Copy link
Collaborator Author

@edwardalee I did the following:

  1. Hide the counting behind #if !defined NDEBUG. This means that it is only compiled when we pick build-type Debug or Test.
  2. I have added the missing increment of _lf_count_token_allocations. I think I added it at the correct place. The count is decremented every time a token is freed (not caring about recycling or not)

@erlingrj erlingrj requested a review from edwardalee November 26, 2023 11:55
@erlingrj
Copy link
Collaborator Author

Hmmm, I believe that there still are issues. I am experiencing non-deterministic failures in a program that I have made that uses tokens with custom allocators. It runs out of buffers to allocate. My guess is that there are some other data races in the token logic. The token API is invoked from different contexts, sometimes the environment mutex is held, sometimes it is not.

@erlingrj
Copy link
Collaborator Author

erlingrj commented Nov 26, 2023

Update: I found the culprit. When marking a token input as mutable, lf_writable_copy is code-generated into the reaction bodies, i.e. outside any critical section. There are data races within this function. I solved it by grabbing the environment mutex of the port this is invoked on. There are other ways also, e.g. we could code-generate it. But I feel that the code-generation is more brittle.

I think we should take a good look at the token system and identify where there are possible race conditions. There might be more

@lhstrh
Copy link
Member

lhstrh commented Feb 4, 2024

@edwardalee and @erlingrj, what should happen with this PR?

@edwardalee edwardalee changed the title Fix data races for _lf_count_payload_alloactions and _lf_count_token_allocations Fix data races for _lf_count_payload_allocations and _lf_count_token_allocations Feb 4, 2024
@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 4, 2024

This PR does two things:

  1. Avoid some data races for decrementing and incrementing the token_allocation_count. It also hides it behind the NDEBUG macro so it would not be part of a program compiled for RELEASE
  • Should we use atomic fetch_and_add instead?
  1. Avoid some data race when lf_writable_copy is called. This only appears with mutable inputs with token type,
    I don't remember exactly what happened inside lf_writable_copy that was not thread-safe.

Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I've pushed some changes that make use of macros to check return values more uniform and complete and to remove duplication of checking code.

One this that is not clear to me is how a user should set the NDEBUG flag. How to do this @erlingrj ?

@erlingrj
Copy link
Collaborator Author

erlingrj commented Feb 4, 2024

It is a convention/standard that NDEBUG (not debug) will be defined by the compilators in release builds. This is how normal asserts are disabled also. When the user specifies build-type: Release CMake will generate this compile definition. This should work across different compilers (clang/gnu/msvc)

@edwardalee edwardalee merged commit d96cbce into main Feb 4, 2024
28 checks passed
@edwardalee edwardalee deleted the count-token-allocations-race branch February 4, 2024 22:22
@lhstrh lhstrh added the bugfix label Apr 27, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants