Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC
wschmidt at linux.vnet.ibm.com
Thu Jan 29 13:43:15 PST 2015
On Wed, 2015-01-28 at 11:04 -0800, Chandler Carruth wrote:
> On Wed, Jan 28, 2015 at 10:06 AM, Bill Schmidt
> <wschmidt at linux.vnet.ibm.com> wrote:
> On Wed, 2015-01-28 at 11:30 -0600, Bill Schmidt wrote:
> > The issue is that we are generating two calls to
> __get_tls_addr in
> > different basic blocks. CSE at the MI level recognizes that
> the address
> > computation can be commoned. So r29 is copied to r3 before
> both of the
> > __get_tls_addr calls.
> > I think it would probably be somewhat difficult to avoid
> this commoning
> > (would have to be specific to these address ops and only
> when accessing
> > TLS vars for local/global-dynamic). And in general it's a
> good thing to
> > do. Alan, does this complicate matters beyond what the
> linker can
> > handle?
> Part of the issue here is the late modeling of the call to
> __get_tls_addr as a true call. The MachineCSE pass does not
> have any
> logic to common idempotent calls (presumably that is handled
> at the IR
> level), so the addis/addi get commoned but the __get_tls_call
> does not.
> Originally we modeled __get_tls_addr as a special
> target-specific opcode
> that was expanded late. We had problems with keeping the
> copies lined up correctly in the presence of multiple TLS
> accesses, and we fixed this by using the regular call
> machinery instead.
> In retrospect, it appears that although this is a much cleaner
> implementation, it limits the ability to CSE the
> __get_tls_addr calls.
> We can work on a fix to go back to the old model and fix the
> copies with glue instead, but this would be a pretty
> substantial change
> to backport to 3.6. However, if the linker can't handle the
> code as is,
> then this would seem to be the only way forward.
> I think that this is the right direction, definitely. I don't feel
> strongly about backporting it to 3.6 though, we can just put a
> workaround such as the one suggested by Ulrich there (and I'm working
> on that now). The key is to make sure 3.7 handles this well.
Well, I thought I was going to have this done today, but I haven't
actually made anything better yet. :/
I've gone back to the old model and fixed things up with glue, but the
fact that we are exposing the call/return register 3 at this point is
still causing MachineCSE to reject the op representing the
__get_tls_addr call for commoning.
I think what I'll have to do is use only virtual registers in the
lowering, then wait until after MachineCSE to introduce register copies
from/to R3 around the call. RA should then coalesce the copies away.
That's going to take more time than I have today. With good luck and a
tailwind I might have something tomorrow.
On the bright side, this will likely be a cleaner approach, as we won't
have to mess around with glue in the selection DAG.
> While I think we should fix the linker, I fear the buggy linkers are
> already really widespread and so we should probably ensure in the PPC
> backend that r3 is always used. My impression is that much of what you
> are talking about in plumbing this through would also allow us to add
> a synthetic constraint of that form.
More information about the llvm-commits