-
Notifications
You must be signed in to change notification settings - Fork 1k
ThreadSanitizerAboutRaces
Most programmers know that races are harmful.
For some races it could be quite easy to predict what may go wrong.
For example, the variable var
in the following code may not be equal to 2
at the end of the program execution (for more examples refer to ThreadSanitizerPopularDataRaces)
int var;
void Thread1() { // Runs in one thread.
var++;
}
void Thread2() { // Runs in another thread.
var++;
}
However, there are other, much more subtle races which are much less obviously harmful. This article is about such races. Most of the content is specific to C/C++.
TODO(timurrrr) Reference the C++11 standard mentioning data races lead to UB?
TODO(kcc) This article is not finished. Expect more content later.
struct Foo {
int a;
Foo() { a = 42; }
};
static Foo *foo = NULL;
void Thread1() { // Create foo.
foo = new Foo();
}
void Thread2() { // Consume foo.
if (foo) {
assert(foo->a == 42);
}
}
foo
.But hey, why this race is harmful? Follow my fingers:
- Thread1 calls
new Foo()
, which allocates memory and callsFoo::Foo()
- Thread1 assigns the new value to
foo
andfoo
becomes non-NULL.
- Thread2 checks if
foo != NULL
and only then readsfoo->a
.
The code is safe!
foo != NULL
does not guarantee that the contents of *foo
are ready for reading in Thread2.For example, the following code does not work on machines with weak memory model.
int A = 0, B = 0;
void Thread1() {
A = 1;
B = 1;
}
void Thread2() {
if (B == 1) {
assert(A == 1);
}
}
If you are writing in assembly, yes, this is safe.
But with C/C++ you can not rely on the machine's strong memory model because the compiler may (and will!) rearrange the code for you.
For example, it may easily replace
A=1;B=1;
with B=1;A=1;
.Ok, I believe that compilers may do simple reordering, but that doesn't apply to my code!
For example,
Foo::Foo()
is defined in one .cc
file and foo = new Foo()
resides in another.And even if these were in a single file, what kind of reordering could harm me?
First, it's a bad idea to rely on the compiler's inability to do some legal transformation.
Second, the modern compilers are much more sophisticated than most programmers think.
For example, many compilers can do cross-file inlining or even inline virtual functions (!).
So, in the example above a compiler may (and often will) change the code to
foo = (Foo*)malloc(sizeof(Foo));
new (foo) Foo();
foo
.But obviously, the consumer thread's code is safe. What can possibly go wrong with this code on x86?
void Thread2() { // Consume foo.
if (foo) {
// foo is properly published by Thread1().
assert(foo->a == 42);
}
}
void Thread2() { // Consume foo.
Foo *t1 = foo; // reads NULL
Foo *t2 = foo; // reads the new value
if (t2) {
assert(t1->a == 42);
}
}
One more frequent bug:
// lazy init for Pi.
float *GetPi() {
static float pi;
static bool have_pi = false;
if (!have_pi) {
pi = ComputePi(); // atomic assignment, no cache coherency issues.
have_pi = true;
}
return pi;
}
Even experienced programmers may assume that this code is correct on a CPU with strong memory model.
But remember that a compiler may rearrange have_pi = true;
and pi = ComputePi();
and one of the threads will return uninitialized value of pi
.
volatile
for synchronization in C/C++.The C and C++ standards don't say anything about
volatile
and threads -- so don't assume anything.Also, the C++11, chapter intro.multithread, paragraph 21 says:
"The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior."
You don't believe us? Ok, here is a simplest proof that C++ compilers don't treat volatile as synchronization.
% cat volatile.cc
typedef long long T;
volatile T vlt64;
void volatile_store_64(T x) {
vlt64 = x;
}
% g++ --version
g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3
...
% g++ -O2 -S volatile.cc -m32 && cat volatile.s
...
_Z17volatile_store_64x:
.LFB0:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl 12(%ebp), %edx
popl %ebp
movl %eax, vlt64
movl %edx, vlt64+4
ret
...
Do you still use C++
volatile
as synchronization?You may object that we are cheating (with 64-bit ints on a 32 arch).
Ok, we indeed cheated, but just a bit. How about this case?
% cat volatile2.cc
volatile bool volatile_bool_done;
double regular_double;
void volatile_example_2() {
regular_double = 1.23;
volatile_bool_done = true;
}
% g++ -O2 volatile2.cc -S
% cat volatile2.s
...
_Z18volatile_example_2v:
.LFB0:
..
movabsq $4608218246714312622, %rax
movb $1, volatile_bool_done(%rip)
movq %rax, regular_double(%rip)
ret
...
Here you can see that the compiler reordered a non-volatile store and a volatile store.
Are you convinced now? If no, please let us know why!
If you got a data race report:
- analyze the code locations and the stack traces of the racing accesses
- understand the object/field that is subject to the race (using the top frame)
- understand the context in which the object is accessed (using the whole stack trace)
Note: Location description (heap allocation stack, global variable name, or thread stack) may help to understand the object that is subject to the race.
Note: If the race happens on a standard container (e.g. std::string
), then you may
see a number of stack frames inside of the standard library. In such cases find the first
frame that calls into the standard library.
Once you understood that, there are 3 main scenarios:
-
The object is supposed to be accessed concurrently, but lacks proper internal synchronization. In this case, add internal synchronization to the object. For example, add
std::mutex
field to the class and lock it inside of the methods. -
The object is not supposed to be accessed concurrently (for example, this is true for all standard containers like
std::list
,std::vector
,std::map
, etc) and the calling code (upper up the stack) incorrectly accesses the object concurrently. In this case, change the calling code to not access the object concurrently. This may involve addingstd::mutex
in the calling code, waiting for completion of something before accessing the objects, queuing some callbacks sequentially, etc. Generally, this is highly-dependent on the concrete system and cannot be answered without understanding how the system is supposed to work and how it's actually working. -
If you believe the report is a false positive and the accesses do not happen concurrently, find the exact synchronization that is supposed to make them non-concurrent and ensure that it's indeed in place and is correct, e.g. placed on the right side (before/after) of the exact racing accesses.
Note: try to avoid using raw atomic primitives and memory fences when fixing data races. These primitives are notoriously hard to use correctly.
- What Every Programmer Should Know About Memory
- What Every Dev Must Know About Multithreaded Apps
- The "Double-Checked Locking is Broken" Declaration
- http://en.wikipedia.org/wiki/Memory_barrier
- http://en.wikipedia.org/wiki/Volatile_variable
- Benign data races: what could possibly go wrong?
- How to miscompile programs with “benign” data races
- Chrome C++ Lock and ConditionVariable