Prevent unbounded lock holds in BufferManager of EventPipe #48435
Conversation
In your description of issues resolved, I think you mean dotnet/diagnostics#1412 instead of #1412?
The remaining test failures appear unrelated.
@davmason - do you have a bit of time to look more closely at this? @josalem - in my cursory look this seemed fine, but I'm definitely hitting a learning curve now that this code has been converted to C, and I can't easily find which macros are generating some of the code elements. I'm going to need to come back and run this through the preprocessor to better understand how the code works post-C-conversion. None of that is specific to your change; it just slows me down from being able to evaluate it.
```c
ep_rt_thread_session_state_array_iterator_next(&thread_session_state_array_iterator)) {

	EventPipeThreadSessionState * session_state = ep_rt_thread_session_state_array_iterator_value(&thread_session_state_array_iterator);
	ep_rt_thread_session_state_list_remove(&buffer_manager->thread_session_state_list, session_state);
```
Maybe it would be worth adding support for removing an item from the list based on an iterator and returning a new iterator as the result (similar to std::list::erase); then there wouldn't be a need for this second loop while still holding the lock.
I'd need to go take a look at the implementation of both Mono's list and CoreCLR's SList. I agree though. I think in the long run we should implement a set of container classes that are shared between the two runtimes, or at least used in shared native code like this.
For the list I believe it could be done, since the iterator just needs to move to a position in the list that continues to be valid after the item has been removed. I believe we should be able to implement it on both list implementations and be able to drop this second loop.
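A minimal sketch of what such an erase operation could look like for a singly linked list (hypothetical types and names, not the actual ep_rt_* API):

```c
#include <stdlib.h>

typedef struct node { struct node *next; void *value; } node_t;
typedef struct list { node_t *head; } list_t;

// The iterator keeps a pointer to the link that points at the current node,
// so removal can relink in O(1) and the iterator stays valid afterwards.
typedef struct iterator { node_t **link; } iterator_t;

// Remove the element at 'it' and return an iterator to the element after it,
// in the spirit of std::list::erase.
static iterator_t
list_erase (iterator_t it)
{
	node_t *removed = *it.link;
	*it.link = removed->next; // it.link now points at the successor
	free (removed);
	return it;
}
```

With an operation like this, the cull could remove entries in the same pass that discovers them, so the second loop (and the extra time spent under the lock) would go away.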
@noahfalk I am starting to take a look, but so far I have the same issue as you. It will be slower than normal.
@josalem @lateralusX @davmason - A sidebar, but given the general learning curve to understand the structure/conventions of the code post-C-conversion, it may make sense for us to start a little README.md with useful notes? If anything like that already exists let me know, but if not, sometimes it can be useful to have newcomers (like myself) write it as they learn.
There is some smaller description of the high-level structure of the library in the initial PRs, e.g. #34600, and there is more info in the talk/ppt on the EventPipe C port from the .NET platform talks in November. Writing up a file to collect information sounds like a good idea; feel free to ping me as well with any questions you might have.
I had the same issue as Noah and David, so it took a while. It's coming together nicely!
Force-pushed from ccce44b to 26e34ec:

* make the sequence point maps removable
* clear session state from sequence points before deletion
* also do some code formatting
The remaining test failure appears to be a known issue.
I think the change is good 👍
The learning curve for conventions coming into this code felt pretty steep but I captured a few notes that helped me and might help others: #49542
Given that I don't spend as much time in this code as others, I don't want to put undue pressure on adjusting conventions, but even after I knew what the patterns were, it still felt as though I was spending a lot of mental energy to grok it. I didn't get the impression I was alone, so it may be worth thinking about potential adjustments for easier readability?
Any suggestions are highly welcome (since I know this code and architecture inside out, I'm sort of adjusted to it). It would be great to get a list of the top "pain points", and I'm certain we can improve going forward.

Parts of the library have some complexity due to taking C++ concepts into the C world while still trying to keep the type safety of the underlying runtime implementation. For example, the definitions of the container implementations make it possible to hook up C++ artifacts in the runtime implementations, so when building the library with a C++ compiler (like CoreCLR does), the compiler provides type-safety checks in the container classes used through the C library. The library also implements getters/setters and virtual inheritance, which can complicate things, produce long statements, and sometimes make function calls hard to track (due to the virtual inheritance, primarily in the serializer parts of the library).

I tried really hard to set up a convention and follow it thoroughly in the implementation to simplify readability as much as possible. But since another ambition was to stay close to the C++ naming (making it easier to transition and maintain both code bases during the transition period), names tended to be long, and I believe that, together with the getters and the compactness of C code, can produce long statements that are harder to eyeball.
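For illustration, here is a minimal sketch of the macro-generated typed container pattern described above (my own generic example, not the actual ep_rt_* definitions):

```c
#include <stdlib.h>

// Generate a typed, growable array plus append/get helpers for TYPE.
// A C++ build could map the same names onto a template-backed implementation
// and get full compile-time type checks; plain C still rejects appending the
// wrong element type instead of falling back to void* casts.
#define EP_DEFINE_TYPED_ARRAY(NAME, TYPE) \
	typedef struct NAME { TYPE *data; size_t size; size_t capacity; } NAME; \
	static inline void NAME##_append (NAME *array, TYPE item) { \
		if (array->size == array->capacity) { \
			array->capacity = array->capacity ? array->capacity * 2 : 8; \
			array->data = realloc (array->data, array->capacity * sizeof (TYPE)); \
		} \
		array->data[array->size++] = item; \
	} \
	static inline TYPE NAME##_get (const NAME *array, size_t index) { \
		return array->data[index]; \
	}

// Usage: declare a typed array of ints; initialize with int_array a = {0};
EP_DEFINE_TYPED_ARRAY (int_array, int)
```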
I added a few thoughts over at #49542 (comment). I think a lot of it aligns with things you were already considering. I agree with your goal of making the original translation very mechanical, and now that we've completed the port and know everything is working well, we probably have more safety margin to make refactoring adjustments. Thanks!
If there are no opinions otherwise, I will merge this later today.
Fixes #43985
This should help, but not solve dotnet/diagnostics#1412
These changes will limit the time the BufferManager lock is held to O(N), where N is the number of currently active threads. This does not alleviate the lock convoy behavior we see when N becomes sufficiently large.
Belongs to #45518
Every physical thread in a process has an EventPipeThread created the first time an event is written on that thread. These EventPipeThread objects are thread-local, ref-counted instances that live longer than the physical thread. Each of these objects has a fixed-size array (64 entries, the max number of EventPipe sessions) of ThreadSessionStates that hold references to the buffers for a given thread. In addition, for every EventPipe session there is one BufferManager that manages the ThreadSessionStates on the EventPipeThreads. It holds a lock that protects access to the buffers and sequence point data for a given session. Each of the ThreadSessionStates has a strong reference to the EventPipeThread it belongs to, keeping it alive even if the physical thread dies.
This results in a relation that looks like this:
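(The original diagram is not reproduced here; the following is a rough C sketch of the ownership relation, with field names that are approximations rather than the exact runtime definitions.)

```c
#define EP_MAX_NUMBER_OF_SESSIONS 64

typedef struct EventPipeThreadSessionState EventPipeThreadSessionState;

typedef struct EventPipeThread {
	int ref_count; // outlives the physical thread while referenced
	// one slot per possible session
	EventPipeThreadSessionState *session_states[EP_MAX_NUMBER_OF_SESSIONS];
} EventPipeThread;

struct EventPipeThreadSessionState {
	EventPipeThread *thread;      // strong ref: keeps the EventPipeThread alive
	void *buffer_list;            // buffers holding this thread's events for one session
	unsigned int sequence_number; // last sequence value for events on this thread
};

typedef struct EventPipeBufferManager { // one per session
	void *lock;                         // protects buffers and sequence point data
	// grows with every thread that ever wrote an event in this session
	EventPipeThreadSessionState *thread_session_state_list_head;
} EventPipeBufferManager;
```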
This means that the list of ThreadSessionStates grows without bound over the length of a session. For a sufficiently long session in a multithreaded process (or a pathological case such as my stress tests), the list can become problematically long. For example, a process that creates a thread and then joins it in a loop will create problems for a single, long-lived session.
The problem arises from the spin lock on the BufferManager. This lock can be taken in a few different situations (non-exhaustive list):
The scenarios above each have a work time of O(N), where N is the number of ThreadSessionStates. Since the list only ever grows, every O(N) walk gets more expensive as the session ages, so total lock hold time scales with the integral of all threads ever created during a session.
To remedy this, I've added logic that culls ThreadSessionStates belonging to EventPipeThreads that have been orphaned and have had all their events flushed. In short, these ThreadSessionStates will never receive new events. The logic guarantees that at least one SequencePoint will have the last known sequence value for events on that thread. A sketch of the idea follows.
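Here is a minimal, self-contained model of the culling step, assuming simplified types; the actual patch operates on the ep_rt_* list under the BufferManager lock, so treat this as an illustration rather than the real code:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct SessionState {
	struct SessionState *next;
	bool thread_dead;      // the physical thread has exited (state is orphaned)
	bool events_flushed;   // all buffered events have been written out
	unsigned int sequence; // last sequence value for events on this thread
} SessionState;

// Unlink and free states that can never produce new events, after folding
// their final sequence value into the sequence point (modeled here as a total).
static void
cull_orphaned_states (SessionState **head, unsigned int *sequence_point_total)
{
	SessionState **link = head;
	while (*link != NULL) {
		SessionState *state = *link;
		if (state->thread_dead && state->events_flushed) {
			*sequence_point_total += state->sequence; // preserve last known count
			*link = state->next;                      // unlink in a single pass
			free (state);
		} else {
			link = &state->next;
		}
	}
}
```

Because only dead-and-flushed states are removed, the list length under the lock tracks the number of currently active threads rather than every thread that ever lived.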
I tested this by writing a simple stress application. In pseudo-code it did the following:
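(The original pseudo-code is not reproduced here; below is my reconstruction of the loop based on the description, with write_event standing in for firing an EventPipe event.)

```c
#include <pthread.h>

static void
write_event (void)
{
	// stand-in: fire an EventPipe/EventSource event here
}

static void *
worker (void *arg)
{
	(void)arg;
	// a few events per thread (~3, matching the expected-event math below)
	write_event ();
	write_event ();
	write_event ();
	return NULL;
}

int
main (void)
{
	for (;;) { // run for the duration of the session
		pthread_t thread;
		pthread_create (&thread, NULL, worker, NULL);
		pthread_join (&thread, NULL); // only one worker writes at a time
	}
}
```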
This will create thousands of threads per second, but only one thread should ever be writing events at a time. This minimizes the impact of many threads contending for the lock at the same time (a separate issue) while still exemplifying the underlying problem.
To measure the impact of these changes, I added some simple code during development that calculated histograms of how long the lock was held and how long the lock was waited on (a sketch of this kind of instrumentation appears after the datapoints below). With that I collected the following basic datapoints.
Baseline (master + histogram code):
Patch:
One observation about the above data: the 12.4MM number may be erroneous. It is possible this issue also affects the lost-events count in the SequencePoints or the one that EventPipeEventSource calculates. The actual collected event count for the baseline is ~240K events, which does match expectations (~81K threads * 3 events per thread). I have not investigated this further while working on this patch, but it is worth a second investigation to make sure those numbers are accurate.
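As an illustration only (hypothetical code, not the development-time instrumentation from this PR), a histogram of lock hold/wait times can be kept with power-of-two buckets:

```c
#include <stdatomic.h>
#include <stdint.h>

#define HIST_BUCKETS 32

// One counter per power-of-two duration bucket
// (bucket i covers roughly [2^i, 2^(i+1)) nanoseconds).
static _Atomic uint64_t lock_hold_hist[HIST_BUCKETS];
static _Atomic uint64_t lock_wait_hist[HIST_BUCKETS];

static void
hist_record (_Atomic uint64_t *hist, uint64_t elapsed_ns)
{
	unsigned int bucket = 0;
	while (elapsed_ns > 1 && bucket < HIST_BUCKETS - 1) {
		elapsed_ns >>= 1; // compute floor(log2(elapsed_ns))
		bucket++;
	}
	atomic_fetch_add (&hist[bucket], 1);
}
```

Timestamps taken before acquire, after acquire, and at release would feed hist_record for the wait and hold histograms respectively.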
CC @tommcdon @sywhang @noahfalk @lateralusX @shirhatti