[llvm-dev] TSAN hack on AArch64 for Android

Mon Aug 31 15:23:29 PDT 2015

 > -----Original Message-----
 > From: Renato Golin [mailto:renato.golin at linaro.org]
> 
> > (*) Its pretty darn near impossible to  have the perfect patch, which
 > > is [a] small [b] doesn’t break anything [c] adds major new features.
 > > (*) What I have currently is: [a] not very big [b] doesn’t break
 > > anything VISIBLY [c] adds major new features
 > 
> 
 > You're disabling the tests on Android/TSAN/AArch64, but you're touching
 > other arches.

 [ jasonk --> ]  There seems to be a bit of unwarranted FUD here. 
My work is ONLY for TSAN on aarch64-android. Where are you seeing me "touching" other architectures? The new bits for the __sanitizer_XXX structs are directly from bionic .h headers, which *IS* platform-neutral. 

Can you please poin to a specific point in the patch that you *know* will negatively impact other architectures? Please be specific. I am unware of any such bits in the patch, and if it really *does* impact other platforms, then it is merely mistake on my part that was (correctly!) caught at code review time, and thus will be corrected quickly.

 > 
 > Linaro has added *a lot* of sanitizers to AArch64. The way we did was to
 > have all changes local until *all* the tests pass. If *one or two* tests were
 > flaky, we marked them as "REQUIRES: stable-runtime" and investigated it.

[ jasonk --> ] Moot point. I think. I am not suggesting anything substantively different than how ASAN/AARCH64 was integrated into compiler-rt. Please read below.

> Have you tested your changes in x86_64 Android/Linux? AArch64 Linux?
 > What other arches would it touch? Mips64?

[ jasonk --> ] Uhh, my work is ONLY for android-aarch64. Others can jump in and improve for other android platforms (non-aarch64) as they see fit. Where are you getting all these other architectures from???? I am ONLY touching TSAN on android AND aarch64. Nothing else. Its not even turned on unless a specific CMAKE flag is passed during configure time.

 > If your changes are *that* unstable in Android, I'm very worried that it will
 > destabilise the whole AArch64 implementation, which we just pushed in
 > and it is working.

[ jasonk --> ]  In the sense, that the tests were already not 100% stable, as you say, all I've done is, for android-aarch64 (which, according to lab.llvm.org is not even a tested platform yet), made sure of the following:

0. My changes should have NO impact on other architectures supported by compiler-rt
1. tsan on android-aarch64 is still disabled by default, activated only if a specific cmake flag is present when compiler-rt is being configured.
2. The tests that do not behave as expected on android-aarch64 are turned off, ONLY for android-aarch64. They are run, as is, for other platforms.
3. This will give some amount of breathing room for me to go in and triage the tests, one by one, while I have some amount of reassurance that some one else does not come along and horribly break the work that I've already done.
4. Not yet mentioned, but is critical, is that my TSAN patch has to be tested in conjunction with an android testing environment, which means a new buildbot, and possibly a new VM, running on labs.llvm.org, specifically for this work.
5. Not to belabor the point, but right now, even if my sumo patch (D11532) landed on compiler-rt RIGHT NOW, I would bet that it would not disturb the existing compiler-rt build at all. Things that were already broken, would likely still remain broken, and things that were passing, probably would not be affected at all.

 > 
 > One good example is ASAN/AArch64. When Christophe implemented it last
 > year, it took us 2 months to commit it to the tree, but we left the build and
 > *all* tests disabled because it passed 100% on our boxes, but not on the
 > public buildbot. It was only recently, many months later, that we enabled
 > the build/tests. In your case, it doesn't even pass on your box.

[ jasonk --> ] Ahh yes.  Moot point again. Tests don't pass SOMEWHERE. Whether they are, as you say, *all* disabled on the public bot, or (as in my case, a portion is marked unsupported, ONLY on aarch64-android, but is run as-is for other existing configurations) is really a minor point.
If you were to have done as you are requesting me to do, then shouldn't the ASAN/AARCH64 work been delayed being commited to compiler-rt until ALL the tests passed 100%??

 > 
 > 
> A big feature is hard to implement. You have to carry along your branch, re-
 > base, adapt, and you have to add tests, and test on other architectures and
 > OSs to make sure everything is good. For that big a change, you cannot rely
 > on the buildbots to tell you that, especially not on the architecture you're
 > targeting, and the base architecture, which is x86_64.

[ jasonk --> ]  I am not disagreeing here. All I am asking is for a little wiggle room for me to triage the tests without having to juggle 3 pins hopping one legged during a hurricane purched on top of a chandelier, while doing the occasional ankle-to-hand beer toast (to coin a phrase). 

My tsan-on-android-aarch64 taskforce consists of exactly one person, me. Because the work touches both BIONIC AND compiler-rt, it is more complex task than X-sanitizer-on-linux type of patches. 
For example, I will need a new build bot, new VM running ANDROID to even test the work on lab.llvm.org. Without these, my D11532 patch, even if it landed RIGHT now, should be invisible to the rest of the tested configurations.

Rebasing my android patches is significantly more involved, both in time and effort than purely llvm-clang-compiler-rt rebase.
I am simply trying to reduce the circular tail chasing to a manageable level.

I will ensure that my work does not impact other supported architectures negatively, but you (collective YOU, as in the compiler-rt community) please have a bit of forbearance, and *do not believe (unwarranted!) FUD*. The intention and I think also the work --- even in its current rough form --- should not impact other tested compiler-rt configurations at all. 
As I stated earlier, and blips to the contrary is something I am commited to fixing as quickly as humanly possible.

 > 
 > If the tests are unstable even on x86_64 (I know they are, more unstable
 > than on ARM/AArch64), you have to be extra *extra* careful.
 > We cannot simply disable everything.

[ jasonk --> ]  Please please please! get this straight! I am not "disabling everything." My work requires a new CMAKE flag to even be active. It should not impact other platforms at all. 

Thanks for reading!
-Jason