Use of std::atomic can slow down multithreaded tests #452
Comments
Here is a reproducer:
#include <doctest.h>
#include <thread>
#include <vector>

// 4.364 seconds: std::atomic<int> numAssertsFailedCurrentTest_atomic
// 0.755 seconds: MultiLaneAtomic<int> numAssertsCurrentTest_atomic
TEST_CASE("MultiLaneAtomic") {
    static constexpr auto numIters = size_t(10000000);
    auto threads = std::vector<std::thread>();
    for (size_t i = 0; i < std::thread::hardware_concurrency(); ++i) {
        threads.emplace_back([] {
            for (size_t it = 0; it < numIters; ++it) {
                REQUIRE(it < numIters);
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
}
It is my understanding that this solution would work even if there are more threads/cores than the 32 lanes (by default) - in every

Regarding the timing - the example code ran on my machine for 4.5 seconds vs 3.2 seconds with this atomic class, so there was some gain indeed. Maybe the 3.5-to-1 difference you are observing is more exaggerated because there are more cores/threads on your machine? I tested with 12 cores - maybe you have a lot more? If that's the case (bigger payoff for higher multi-threadedness) maybe it's worth going forward with a PR - even if everyone has to pay the 10% increase when not using concurrency. It would also be trivial to surround the entire atomic class with an

I think the lanes should be a compile-time define so that it's configurable (something like

You can put this new atomic class right above
Exactly, that's the idea. The goal is to spread the threads across the lanes. If some threads end up on the same lane it's not a problem, it just causes a bit of a slowdown between those threads.

What compiler are you using? I saw on godbolt that Visual Studio seems to produce much more code than g++ or clang++ does, so it might not be as good on Windows. I have an Intel i7-8700 with 6 cores/12 threads, and used clang++ with
Whoops - rookie mistake - wasn't specifying

Just tested both with clang 9 and gcc 9 on ubuntu 19.10 and now I see the same 3x increase - this does indeed seem worthwhile!
Adds the configuration option `DOCTEST_CONFIG_NO_MULTI_LANE_ATOMICS` to disable multi lane atomics. This can speed up assertions in highly parallel tests by a factor of 3 and more, with a slight slowdown for the single threaded case. Closes #452
Merged the PR - closing this issue as well. Will release a new version probably sometime in January - use the
Description
Having lots of `REQUIRE` in a multithreaded test is relatively slow due to the use of `std::atomic`.

I have a test that performs a lot of asserts, and is called in parallel. It's 240016891 `REQUIRE` statements. Without the `REQUIRE`, the test takes 0.37 seconds on my machine. When I add the `REQUIRE`, the test takes 8.05 seconds.

It seems that most of the slowdown comes from the use of `std::atomic` for the variable `numAssertsCurrentTest_atomic`. The problem is that in my case 12 threads basically block each other by continuously increasing the atomic, which leads to lots of cache invalidation for the other threads.

I've played a bit with the code, and have come up with a multi-lane implementation of atomic. This splits up the atomic into multiple atomics, each sitting on a different cache line, and each thread operates on a different atomic so they can't block each other. This speeds up the test from 8.05 seconds to 2.28 seconds on my machine.
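For readers who want to see the shape of the idea, here is a minimal sketch of a multi-lane counter. It is not the code proposed in this issue or the version that ended up in doctest. The class name `MultiLaneAtomicSketch`, the 64-byte cache line size, the round-robin lane assignment and the helper `countWithThreads` are all assumptions made for illustration. Only the default of 32 lanes comes from the discussion.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch of a multi-lane atomic counter (illustration only).
// NumLanes = 32 mirrors the default mentioned in the thread; the 64-byte
// cache line size is an assumption that holds on most x86-64 CPUs.
template <typename T, std::size_t NumLanes = 32, std::size_t CacheLineSize = 64>
class MultiLaneAtomicSketch {
    // Each lane is aligned (and therefore padded) to its own cache line, so
    // writers on different lanes do not invalidate each other's cache lines.
    struct alignas(CacheLineSize) Lane {
        std::atomic<T> value{0};
    };
    Lane lanes_[NumLanes];

    // Assumed lane selection: the first call on each thread takes the next
    // index round-robin and keeps it for the lifetime of that thread.
    static std::size_t laneIndex() {
        static std::atomic<std::size_t> nextLane{0};
        thread_local const std::size_t idx =
            nextLane.fetch_add(1, std::memory_order_relaxed) % NumLanes;
        return idx;
    }

public:
    // Adds to the calling thread's lane. The return value is the previous
    // value of that lane only, not of the overall total.
    T fetch_add(T arg, std::memory_order order = std::memory_order_seq_cst) {
        return lanes_[laneIndex()].value.fetch_add(arg, order);
    }

    // Sums all lanes. This is not an atomic snapshot across lanes, so it is
    // only exact once concurrent writers have finished.
    T load(std::memory_order order = std::memory_order_seq_cst) const {
        T sum = T();
        for (std::size_t i = 0; i < NumLanes; ++i) {
            sum += lanes_[i].value.load(order);
        }
        return sum;
    }
};

// Hypothetical helper: increments one shared counter from several threads
// and returns the total after all threads have joined.
inline int countWithThreads(unsigned numThreads, int itersPerThread) {
    MultiLaneAtomicSketch<int> counter;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        threads.emplace_back([&counter, itersPerThread] {
            for (int i = 0; i < itersPerThread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter.load();
}
```

If more threads exist than lanes, several threads share a lane. The count stays correct in that case, and those threads simply contend with each other as they would on a single `std::atomic`.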
Steps to reproduce
Create a test that spawns many threads, each calling `REQUIRE` in a tight loop.

Put this into `doctest.h`:

Then, replace the line
std::atomic<int> numAssertsCurrentTest_atomic;
with
MultiLaneAtomic<int> numAssertsCurrentTest_atomic;

The test will run significantly faster with the `MultiLaneAtomic<int>`.

If you think this is worthwhile, I could create a pull request for this.