[llvm-dev] TSAN hack on AArch64 for Android

Tue Aug 18 14:32:43 PDT 2015

On Tue, Aug 18, 2015 at 10:53 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 18 August 2015 at 19:57, Dmitry Vyukov <dvyukov at google.com> wrote:
>> Hi Renato,
>>
>> What exactly hack/hacks do you mean?
>
> Hi Dmitry,
>
> The sanitizer code seems to be more accepting to adding complicated
> pre-processor logic, as well as a number of alternative functions to
> work around various inefficiencies in other systems, but in other LLVM
> projects, such as Clang and the back-ends, we tend to be less
> accepting of work-arounds being implemented just for the same of a
> test.
>
> This case adds an immense complexity (see the size of the patch), and
> it has an immense number of odd assumptions (see the amount of
> negative feedback on the patch), that may never have been tested in
> other architectures (see some comments about generalising the wrong
> stuff). All of that, to be able to "experiment" with TSAN on Android
> in AArch64.
>
> See, if TSAN was completely broken in AArch64, or if other people
> weren't trying to push it upstream, I'd understand someone throwing a
> bunch of hacked files that "works on their boxes", if that's disabled
> by default and is only supposed to work when invoked. But none of that
> is true.
>
> TSAN works to an extent in AArch64, and we have been pushing some
> patches to prove that. With ASAN live on the buildbots, and the
> builtins working, we'll now focus on TSAN to get it being tested in
> the buildbots, too. That's Linux AArch64, of course, as we don't test
> Android at all.
>
> Now, the oddities with Android is that bionic and the rest of the
> run-time is crippled in ways that no other system I know gets close.
> Layers upon layers of libraries, aliases, asm hackery, just to make it
> build. Adding Clang to the list of compilers did not make it better.
>
> This comment baffles me: "Ideally, this should be supported with the
> __thread keyword, but that is not yet supported, and it is a much
> bigger chunk of work to get it integrated into the default android
> cross compiler."
>
> AFAIK, __thread is supported by GCC and Clang, and this seems to be a
> problem in Android's linker. So, to work around a problem in Android's
> linker, we're adding a huge pile of code to LLVM for a temporary fix
> because of a possible experimentation in Android?
>
> The amount of work being done, and still required, to get this patch
> upstream in any decent shape or form will take many engineer-months,
> probably the same as it would to actually fix the linker in the first
> place. Especially if you add up the work that will be required to
> *remove* all that after the linker is fixed.
>
> But I'm being over-optimistic here. I'm old enough to know that,
> whenever someone introduces a "work-around that works", the
> probability of anyone fixing the original problem is zero. And the
> Android code that I had the displeasure to see, so far, has shown me
> that this is ultimately true on their side. And that's why I'm
> baffled, because we're moving that mentality, that work arounds are
> ok, no matter how ugly they are, into LLVM, and everyone seems to be
> ok with it.
>
> If everyone is ok with it, I'll just shut up. But I'd like to make
> sure that everyone is ok to do so.
>
> My tuppence.
>
> --renato

Hi Renato,

So the problem is only with the tls emulation, right?

I don't know what it worth to add __thread support to android, and
whether somebody is going to address it soon.
But I think we can make the tls emulation _much_ cleaner. First, we
need to put all tsan per-thread data into ThreadState object, accesses
to ThreadState are already wrapped into cur_thread() function. So now
we have a single function to modify, no macros spread across the
codebase, no new files, etc. Then, I've looked at pthread_getspecific
code in bionic and it is actually quite fast and does not call any
other system function (except get_thread() which is basically an
access to real tls). So we can use it, and then the whole support
becomes merely:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(pthread_getspecific(thread_key));
#endif
}

Which looks quite acceptable to me.

Also we could try to ask android to dedicate a real tls slot for
sanitizers (I don't know what are the chances, but ART obviously get a
one). Then we can change the support to:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
#endif
}

Which is roughly the same code-wise, but faster and safer.

Am I missing something?