Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC

Bill Schmidt wschmidt at
Thu Jan 29 13:43:15 PST 2015

On Wed, 2015-01-28 at 11:04 -0800, Chandler Carruth wrote:
> On Wed, Jan 28, 2015 at 10:06 AM, Bill Schmidt
> <wschmidt at> wrote:
>         On Wed, 2015-01-28 at 11:30 -0600, Bill Schmidt wrote:
>         *snip*
>         >
>         > The issue is that we are generating two calls to
>         __get_tls_addr in
>         > different basic blocks.  CSE at the MI level recognizes that
>         the address
>         > computation can be commoned.  So r29 is copied to r3 before
>         both of the
>         > __get_tls_addr calls.
>         >
>         > I think it would probably be somewhat difficult to avoid
>         this commoning
>         > (would have to be specific to these address ops and only
>         when accessing
>         > TLS vars for local/global-dynamic).  And in general it's a
>         good thing to
>         > do.  Alan, does this complicate matters beyond what the
>         linker can
>         > handle?
>         Part of the issue here is the late modeling of the call to
>         __get_tls_addr as a true call.  The MachineCSE pass does not
>         have any
>         logic to common idempotent calls (presumably that is handled
>         at the IR
>         level), so the addis/addi get commoned but the __get_tls_call
>         does not.
>         Originally we modeled __get_tls_addr as a special
>         target-specific opcode
>         that was expanded late.  We had problems with keeping the
>         register
>         copies lined up correctly in the presence of multiple TLS
>         variable
>         accesses, and we fixed this by using the regular call
>         machinery instead.
>         In retrospect, it appears that although this is a much cleaner
>         implementation, it limits the ability to CSE the
>         __get_tls_addr calls.
>         We can work on a fix to go back to the old model and fix the
>         register
>         copies with glue instead, but this would be a pretty
>         substantial change
>         to backport to 3.6.  However, if the linker can't handle the
>         code as is,
>         then this would seem to be the only way forward.
>         Thoughts?
> I think that this is the right direction, definitely. I don't feel
> strongly about backporting it to 3.6 though, we can just put a
> workaround such as the one suggested by Ulrich there (and I'm working
> on that now). The key is to make sure 3.7 handles this well.
Well, I thought I was going to have this done today, but I haven't
actually made anything better yet. :/

I've gone back to the old model and fixed things up with glue, but the
fact that we are exposing the call/return register 3 at this point is
still causing MachineCSE to reject the op representing the
__get_tls_addr call for commoning.

I think what I'll have to do is use only virtual registers in the
lowering, then wait until after MachineCSE to introduce register copies
from/to R3 around the call.  RA should then coalesce the copies away.
That's going to take more time than I have today.  With good luck and a
tailwind I might have something tomorrow.

On the bright side, this will likely be a cleaner approach, as we won't
have to mess around with glue in the selection DAG.


> While I think we should fix the linker, I fear the buggy linkers are
> already really widespread and so we should probably ensure in the PPC
> backend that r3 is always used. My impression is that much of what you
> are talking about in plumbing this through would also allow us to add
> a synthetic constraint of that form.

More information about the llvm-commits mailing list