[llvm-dev] TSAN hack on AArch64 for Android

Tue Sep 1 11:09:02 PDT 2015

Hi Jason,

Let me reduce the defcon level of this thread... Whatever is marked
around <FUD></FUD>, is FUD, and should be taken lightly. Everything
else is a genuine question or opinion.

On 31 August 2015 at 23:23, Jason Kim <jasonk at codeaurora.org> wrote:
> My work is ONLY for TSAN on aarch64-android. Where are you seeing me "touching" other architectures? The new bits for the __sanitizer_XXX structs are directly from bionic .h headers, which *IS* platform-neutral.

There are ifdef ANDROID_SANITIZER blocks that are not guarded by
AArch64. As you say, it is target neutral, but you haven't tested on
other targets. Wouldn't that impact x86_64 Android? Or is that not
supported at all by TSAN?

Others where WORDSIZE was used, but I'm not sure it only applies on your case.

Adhemerval had some comments about MIPS64 in the original patch, but I
think it got lost in the number of other comments.

What I'm getting at is that you think the macro separation you have is
enough to keep the TSAN+AArch64+Android at bay, but unless you test it
on AArch64+Linux, x86_64+Android, you can't be sure of it. You may
just as well be right, and if everybody in the sanitizer arena agree
that you are, I'm fine with it.

> [ jasonk --> ] Moot point. I think. I am not suggesting anything substantively different than how ASAN/AARCH64 was integrated into compiler-rt. Please read below.

I don't follow.

We had all-minus-2 tests passing on three AArch64 environments, out of
4. You have 90% of the tests failing on your own AArch64+Android
environments. As Dan, said, if 90% of your own tests fail, the patch
is not ready yet.

> 0. My changes should have NO impact on other architectures supported by compiler-rt

*should* is the keyword here. But I have already agreed that if san
folks agree with you, I do too.

> 1. tsan on android-aarch64 is still disabled by default, activated only if a specific cmake flag is present when compiler-rt is being configured.

That's good.

> 2. The tests that do not behave as expected on android-aarch64 are turned off, ONLY for android-aarch64. They are run, as is, for other platforms.

As expected. That wasn't one of my concerns.

> 3. This will give some amount of breathing room for me to go in and triage the tests, one by one, while I have some amount of reassurance that some one else does not come along and horribly break the work that I've already done.

<FUD>
So, there is the possibility that the code you're adding, and the
future changes will make it more complicated for others to add/improve
their own code, for instance, AArch64 Linux. I don't know that for
sure, and just wanted to raise concerns that the san experts should be
aware of it.
</FUD>

> 4. Not yet mentioned, but is critical, is that my TSAN patch has to be tested in conjunction with an android testing environment, which means a new buildbot, and possibly a new VM, running on labs.llvm.org, specifically for this work.

That'd be amazing! I was FUDding about not being able to test our own
patches on Android, but having a bot solves that problem.

> 5. Not to belabor the point, but right now, even if my sumo patch (D11532) landed on compiler-rt RIGHT NOW, I would bet that it would not disturb the existing compiler-rt build at all. Things that were already broken, would likely still remain broken, and things that were passing, probably would not be affected at all.

I'm not a gambling man... :)

> If you were to have done as you are requesting me to do, then shouldn't the ASAN/AARCH64 work been delayed being commited to compiler-rt until ALL the tests passed 100%??

No. They were passing all-2, which is about 99%. That's ok. There was
a problem with the buildbot (which is known to be problematic anyway),
so it was disabled by default so not to disturb the community, but we
knew it was correct, as it passed everywhere else.

Now we have more bots, and it passes on all of them, but that old one
is still fiddly. All in all, a clean cut.

You have 180 tests failing. If it was just up to the buildbots, you
disable the tests on Android+AArch64+TSAN and jobs done. But this is,
as Dan said, about the code itself.

It's a decision that we have to make consciously. Adding raw code is
common, but normally that code is well separated from the rest. With
the concerns I had on the macros, I wasn't sure that it was that clean
of a cut.

<FUD>
It is entirely possible that there will be no problem at all, that all
your code is Android only, as you say. I was expecting you'd say that
you tested the code on AArch64+Linux and Android+x86_64, just to make
sure you were right...
</FUD>

> My tsan-on-android-aarch64 taskforce consists of exactly one person, me. Because the work touches both BIONIC AND compiler-rt, it is more complex task than X-sanitizer-on-linux type of patches.

Sigh... I know the drill only too well. :(

> I am simply trying to reduce the circular tail chasing to a manageable level.

Right. Let's get this thing on the road, then.

We don't have any Android environment to test anything from our side,
so any breakage on your side will have to be dealt with by you. I do a
lot of it here, and it's not easy. I hope you're aware of that.

We'll also be submitting patches to improve AArch64 support,
performance or fixing tests. These patches may break your stuff.

One thing I normally do is to download the patches from Phab reviews
and run on my local config, just to make sure that the check-all will
pass on our bot configuration before it actually goes. That helps a
lot the stabilization, and we should work with you to make sure you
can fix them with out help, if needed.

Let's get back to the patches in question, and make sure that we all
agree on the new structures and the TLS implementation. The biggest
part of my FUD, is that we didn't even discuss that beforehand, and
even now, there are at least 5 options to chose from.

Once the structures are accepted by the san experts, and the TLS
implementation is agreed and working, we can get this in, completely
disabled by default.

cheers,
--renato