[llvm-dev] call_once and TSan

Fri Sep 2 04:42:01 PDT 2016

INTERCEPTOR(call_once, o) {
    __tsan_release_merge(o);
   REAL(call_once)(o);
  __tsan_acquire(o);
}

?


On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>
>> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>>>
>>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm trying to write a TSan interceptor for the C++11 call_once function.  There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx.
>>>>>
>>>>> TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic.  However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling.
>>>>>
>>>>> Any ideas how to handle call_once in TSan?
>>>>
>>>> Does anybody remember exact reasons we disable exceptions in sanitizer
>>>> runtimes? One is that it won't link with C programs. Are there any
>>>> other?
>>>> If C is the only reason: there is already a part of tsan runtime is
>>>> linked only to C++ programs (it contains operator new/delete
>>>> interceptors). We could add additional files to the cxx part of
>>>> runtime and build them with exceptions.
>>>>
>>>> Alternatively, the interceptor can handle only synchronization but
>>>> forward actual logic to the real function. Along the lines of:
>>>>
>>>> INTERCEPTOR(call_once, o) {
>>>> __tsan_acquire_release(o);
>>>> REAL(call_once)(o);
>>>> }
>>>>
>>>> That will have some performance impact. If we hardcode the "fully
>>>> initialized" value, then we can eliminate the additional overhead:
>>>>
>>>> INTERCEPTOR(call_once, o) {
>>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
>>>>    __tsan_acquire(o);
>>>>    return;
>>>> }
>>>> __tsan_acquire_release(o);
>>>> REAL(call_once)(o);
>>>> }
>>>
>>> Unfortunately, the first fast-path check is inlined and cannot be intercepted.  We can only intercept the inner call to __call_once.  But how would __tsan_acquire_release help here?  The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed.  Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with.
>>
>>
>>
>> Will then the following work?
>>
>> INTERCEPTOR(call_once, o) {
>>  REAL(call_once)(o);
>>  __tsan_release_merge(o);
>>  __tsan_acquire(o);
>> }
>
> Still racy.  Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0.  Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published.
>
> Kuba
>