[llvm-dev] TSAN hack on AArch64 for Android

Wed Aug 19 03:29:07 PDT 2015

Wait, this change is not submitted yet, right? Or you mean mailing of
this change in bad shape?

I consider this change as work-in-progress where author is looking for
feedback on his ongoing progress. I guess the change description
should have been spelled this very explicitly.
This change definitely needs more work to be submitted, splitting into
smaller patches with a plan for submission order, refactoring,
cleanup, etc. That is no question.
As for the design, there is not much to design upfront, as you
discover most of the issue only when you hit them (I would not foresee
them). Also, most of the changes in the patch are just spot fixes
(e.g. if a struct definition is different on the new platform, you
just need to provide a correct definition), there is nothing to design
there.

Regarding the tls, whether we call it "emulation" or not, I believe we
can do it a very neat and laconic form.

Refactoring of tsan per-thread structures in preparation for the
"emulation" should go in independently. Agree.
Aarch64 support is already submitted to tsan as far as I see.

On Wed, Aug 19, 2015 at 1:11 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 18 August 2015 at 22:32, Dmitry Vyukov <dvyukov at google.com> wrote:
>> So the problem is only with the tls emulation, right?
>
> No. It's with the whole patch, including its contents, how it was
> submitted, what is its purpose and how nobody cared about any of this.
>
> Adding TLS emulation is, in its own, a worthy feature. It allows for
> platforms that can't do full TLS to use TSAN, which is a great tool.
>
> But when you want to add a feature to a tool that runs on multiple
> architectures, you plan the feature, you design it, you send an RFC
> email, you discuss in the list.
>
> Then you go about the implementation, submit some preparation patches,
> clean up code, and have some more discussions. For a good example, see
> Daniel Sanders' "The trouble with triples" and the associated patches.
> It takes a lot of time to conjure a new feature in a complex piece of
> software, but more than that, needs consensus and people designing it
> beforehand.
>
> But when someone dumps a patch that is nowhere near production
> quality, or would even compile in most platforms, people should be
> backing up and saying: "Wait a minute. What are you trying to do, and
> how did you get here?". We had a similar issue a few months back with
> the new Stride Vectorizer. Fantastic feature, one that I had worked
> before, but it took us a whole month of "code review" to figure out
> that the whole patch was just wrong, and had to be completely
> re-written, from where we had stopped two years back.
>
> I'm doing the same now. Let's make sure to agree on what's being
> proposed, THEN we'll agree if that's worthy or not, THEN we'll agree
> what's the best implementation, THEN we'll review code. We can't start
> from the last iteration, as that'll invariably lead to poor code going
> in.
>
> Furthermore, you don't get TLS+TSAN+AArch64+Android support in a
> single patch! The likelihood that we'll end up reverting it a hundred
> times is really high, and that will completely unbalance our buildbot
> infrastructure and break everyone else's workflow. Slow is faster than
> ludicrous. One step at a time, please.
>
> First, let's get TSAN working on AArch64. Adhemerval is working on
> that. Then, let's see what's missing for Android. Ok, TLS emulation is
> required, at least for the time being. What other architecture would
> benefit from TLS emulation? None? Ok, then we set out to change the
> code, so that TLS emulation can go in. Meanwhile, a group of people
> discuss (in public) the design of such framework, and in time, a
> number of patches will start to go in, one by one, until the feature
> is on. Only then we'll get Android using it.
>
> The time will also give buildbots enough iterations to pick any bug
> that we don't find on code review. Bugs that would be obscured by a
> giant patch with many other more obvious bugs exploding our CI
> infrastructure and getting reverted by angry bot maintainers.
>
>
>
>> I don't know what it worth to add __thread support to android, and
>> whether somebody is going to address it soon.
>
> Wouldn't it be better to know that before submitting a large change to
> another code base? If there is evidence that this is never going to
> happen, or that, for some very strong engineering reasons, it's not
> possible to do it, then the reasons to do TLS emulation are thoroughly
> justified. Right now, it's just a hunch.
>
>
>> But I think we can make the tls emulation _much_ cleaner. First, we
>> need to put all tsan per-thread data into ThreadState object, accesses
>> to ThreadState are already wrapped into cur_thread() function. So now
>> we have a single function to modify, no macros spread across the
>> codebase, no new files, etc.
>
> I'm no expert in that field, but this sounds like a good plan. Doesn't
> seem to be part of this patch, though. Nor it should, as it actually
> should have been a preparatory patch, submitted after all interested
> parties agreed that TLS emulation is the way to go.
>
>
>> INLINE ThreadState *cur_thread() {
>> #ifdef REAL_TLS
>>   return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
>> #else
>>   return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
>> #endif
>> }
>>
>> Which is roughly the same code-wise, but faster and safer.
>
> Dan "Eastwood" Albert has pulled the trigger. :)
>
> This looks reasonable to me. Does this still mean you need the
> emulation, or is Android going to give you a "reasonably looking
> ThreadState"?
>
> cheers,
> --renato