[llvm-dev] call_once and TSan
Kuba Brecka via llvm-dev
llvm-dev at lists.llvm.org
Fri Sep 2 05:21:51 PDT 2016
Same problem exists, thread A can still be within REAL(call_once), but after it ran user code and set the flag to ~0. Roughly, call_once does:
__call_once(flag, arg, func) {
mutex_lock(mut);
if (flag == BEING_INITIALIZED) { wait }
else if (flag == NOT_INITIALIZED_AT_ALL) {
flag = BEING_INITIALIZED;
mutex_unlock(mut);
func(arg); // <=== user code callback
mutex_lock(mut);
atomic_store(&flag, FULLY_INITIALIZED, mo_release); // "release" store, but within a compiled dylib, thus invisible to TSan
}
mutex_unlock(mut);
}
If thread A is just after the release store, which is invisible to TSan, __tsan_acquire in thread B will have no effect, and the stores from the callback to func(arg) will not be synchronized to thread B.
Anyway, I just realized that we can wrap "func" into our own callback, which will perform the (extra) __tsan_release... Do you think that would work? E.g.:
void call_once_callback_wrapper(...) {
orig_func(orig_arg);
__tsan_release(flag);
}
INTERCEPTOR(call_once, o, func, arg) {
REAL(call_once)(flag, ..., call_once_callback_wrapper);
}
Kuba
> On 2 Sep 2016, at 13:42, Dmitry Vyukov <dvyukov at google.com> wrote:
>
> INTERCEPTOR(call_once, o) {
> __tsan_release_merge(o);
> REAL(call_once)(o);
> __tsan_acquire(o);
> }
>
> ?
>
>
> On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>>
>>> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote:
>>>
>>> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>>>>
>>>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to write a TSan interceptor for the C++11 call_once function. There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx.
>>>>>>
>>>>>> TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic. However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling.
>>>>>>
>>>>>> Any ideas how to handle call_once in TSan?
>>>>>
>>>>> Does anybody remember exact reasons we disable exceptions in sanitizer
>>>>> runtimes? One is that it won't link with C programs. Are there any
>>>>> other?
>>>>> If C is the only reason: there is already a part of tsan runtime is
>>>>> linked only to C++ programs (it contains operator new/delete
>>>>> interceptors). We could add additional files to the cxx part of
>>>>> runtime and build them with exceptions.
>>>>>
>>>>> Alternatively, the interceptor can handle only synchronization but
>>>>> forward actual logic to the real function. Along the lines of:
>>>>>
>>>>> INTERCEPTOR(call_once, o) {
>>>>> __tsan_acquire_release(o);
>>>>> REAL(call_once)(o);
>>>>> }
>>>>>
>>>>> That will have some performance impact. If we hardcode the "fully
>>>>> initialized" value, then we can eliminate the additional overhead:
>>>>>
>>>>> INTERCEPTOR(call_once, o) {
>>>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
>>>>> __tsan_acquire(o);
>>>>> return;
>>>>> }
>>>>> __tsan_acquire_release(o);
>>>>> REAL(call_once)(o);
>>>>> }
>>>>
>>>> Unfortunately, the first fast-path check is inlined and cannot be intercepted. We can only intercept the inner call to __call_once. But how would __tsan_acquire_release help here? The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed. Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with.
>>>
>>>
>>>
>>> Will then the following work?
>>>
>>> INTERCEPTOR(call_once, o) {
>>> REAL(call_once)(o);
>>> __tsan_release_merge(o);
>>> __tsan_acquire(o);
>>> }
>>
>> Still racy. Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0. Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published.
>>
>> Kuba
>>
More information about the llvm-dev
mailing list