[llvm-dev] TSAN hack on AArch64 for Android

Fri Aug 28 13:01:03 PDT 2015

On 28 August 2015 at 18:02, Jason Kim <jasonk at codeaurora.org> wrote:
> IMO having to disable 2/3 of the tests means the patch isn't ready yet.

Dan is absolutely right.

> (*) The tests weren’t all stable to begin with.

That's not a reason to break them more. If you can't prove they're
correct, you can't prove you broken them more, especially on other
arches.

> (*) I am perfectly willing to triage the failures, but please keep in mind
> that any “threshold” that we cross from NOT testing TSAN on aarch64+android
> to actually testing on it is going to be a large hurdle.

I'm glad you are, but please do it off trunk.

Your changes touch other OSs (ex. AArch64 Linux) and also other Arches
(see some comments to that effect on the patch). You can't just push
something that isn't ready like that.

> (*) BIONIC is NOT glibc!

I missed the point here...

> (*) Its pretty darn near impossible to  have the perfect patch, which is [a]
> small [b] doesn’t break anything [c] adds major new features.
> (*) What I have currently is: [a] not very big [b] doesn’t break anything
> VISIBLY [c] adds major new features

It's hard, I give you that. But that's why most of us chose compilers
to work with. It's also a lot of fun.

I've seen lots of people provide lots of ideas on how to do that
slowly, and re-use changes in Android, RT, etc.

> (*) The new feature (TSAN on aarch64-linux-androideabi) is currently turned
> OFF by default

You're disabling the tests on Android/TSAN/AArch64, but you're
touching other arches.

Linaro has added *a lot* of sanitizers to AArch64. The way we did was
to have all changes local until *all* the tests pass. If *one or two*
tests were flaky, we marked them as "REQUIRES: stable-runtime" and
investigated it.

Never 2/3 of the tests. Never when touching other arches / OSs.

Have you tested your changes in x86_64 Android/Linux? AArch64 Linux?
What other arches would it touch? Mips64?

If your changes are *that* unstable in Android, I'm very worried that
it will destabilise the whole AArch64 implementation, which we just
pushed in and it is working.

One good example is ASAN/AArch64. When Christophe implemented it last
year, it took us 2 months to commit it to the tree, but we left the
build and *all* tests disabled because it passed 100% on our boxes,
but not on the public buildbot. It was only recently, many months
later, that we enabled the build/tests. In your case, it doesn't even
pass on your box.

> (*) There are plenty of cases in prior history in llvm where a set of tests
> were turned off temporarily, especially when a major new platform is being
> introduced. I don’t think its unreasonable to have that as a starting point.

A major new platform, ie. a new back-end, doesn't normally break
anything elsewhere. And when it touches generic parts, tests are
carried out to prove that it didn't break anything.

If your code was 100% new files/functions/macros on a new architecture
that no one uses, it'd be ok. But that's not the case.

A big feature is hard to implement. You have to carry along your
branch, re-base, adapt, and you have to add tests, and test on other
architectures and OSs to make sure everything is good. For that big a
change, you cannot rely on the buildbots to tell you that, especially
not on the architecture you're targeting, and the base architecture,
which is x86_64.

If the tests are unstable even on x86_64 (I know they are, more
unstable than on ARM/AArch64), you have to be extra *extra* careful.
We cannot simply disable everything.

cheers,
--renato