Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC

Bill Schmidt wschmidt at linux.vnet.ibm.com
Wed Jan 28 11:16:53 PST 2015


On Wed, 2015-01-28 at 11:04 -0800, Chandler Carruth wrote:
> 
> On Wed, Jan 28, 2015 at 10:06 AM, Bill Schmidt
> <wschmidt at linux.vnet.ibm.com> wrote:
>         On Wed, 2015-01-28 at 11:30 -0600, Bill Schmidt wrote:
>         *snip*
>         >
>         > The issue is that we are generating two calls to
>         __get_tls_addr in
>         > different basic blocks.  CSE at the MI level recognizes that
>         the address
>         > computation can be commoned.  So r29 is copied to r3 before
>         both of the
>         > __get_tls_addr calls.
>         >
>         > I think it would probably be somewhat difficult to avoid
>         this commoning
>         > (would have to be specific to these address ops and only
>         when accessing
>         > TLS vars for local/global-dynamic).  And in general it's a
>         good thing to
>         > do.  Alan, does this complicate matters beyond what the
>         linker can
>         > handle?
>         
>         Part of the issue here is the late modeling of the call to
>         __get_tls_addr as a true call.  The MachineCSE pass does not
>         have any
>         logic to common idempotent calls (presumably that is handled
>         at the IR
>         level), so the addis/addi get commoned but the __get_tls_call
>         does not.
>         
>         Originally we modeled __get_tls_addr as a special
>         target-specific opcode
>         that was expanded late.  We had problems with keeping the
>         register
>         copies lined up correctly in the presence of multiple TLS
>         variable
>         accesses, and we fixed this by using the regular call
>         machinery instead.
>         In retrospect, it appears that although this is a much cleaner
>         implementation, it limits the ability to CSE the
>         __get_tls_addr calls.
>         
>         We can work on a fix to go back to the old model and fix the
>         register
>         copies with glue instead, but this would be a pretty
>         substantial change
>         to backport to 3.6.  However, if the linker can't handle the
>         code as is,
>         then this would seem to be the only way forward.
>         
>         Thoughts?
> 
> I think that this is the right direction, definitely. I don't feel
> strongly about backporting it to 3.6 though, we can just put a
> workaround such as the one suggested by Ulrich there (and I'm working
> on that now). The key is to make sure 3.7 handles this well.
> 
> 
> While I think we should fix the linker, I fear the buggy linkers are
> already really widespread and so we should probably ensure in the PPC
> backend that r3 is always used. My impression is that much of what you
> are talking about in plumbing this through would also allow us to add
> a synthetic constraint of that form.

OK.  I'll pull together a patch in this direction.  Thanks for the
feedback!

Without the CSE issue we ran into here, we were already getting things
effectively into r3 (we have existing test cases looking for that) at
-O1 and above.  I'm not so sure things are great at -O0, though.




More information about the llvm-commits mailing list