Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC
wschmidt at linux.vnet.ibm.com
Fri Jan 30 14:05:36 PST 2015
On Thu, 2015-01-29 at 15:43 -0600, Bill Schmidt wrote:
> On Wed, 2015-01-28 at 11:04 -0800, Chandler Carruth wrote:
> > On Wed, Jan 28, 2015 at 10:06 AM, Bill Schmidt
> > <wschmidt at linux.vnet.ibm.com> wrote:
> > On Wed, 2015-01-28 at 11:30 -0600, Bill Schmidt wrote:
> > *snip*
> > >
> > > The issue is that we are generating two calls to
> > __get_tls_addr in
> > > different basic blocks. CSE at the MI level recognizes that
> > the address
> > > computation can be commoned. So r29 is copied to r3 before
> > both of the
> > > __get_tls_addr calls.
> > >
> > > I think it would probably be somewhat difficult to avoid
> > this commoning
> > > (would have to be specific to these address ops and only
> > when accessing
> > > TLS vars for local/global-dynamic). And in general it's a
> > good thing to
> > > do. Alan, does this complicate matters beyond what the
> > linker can
> > > handle?
> > Part of the issue here is the late modeling of the call to
> > __get_tls_addr as a true call. The MachineCSE pass does not
> > have any
> > logic to common idempotent calls (presumably that is handled
> > at the IR
> > level), so the addis/addi get commoned but the __get_tls_call
> > does not.
> > Originally we modeled __get_tls_addr as a special
> > target-specific opcode
> > that was expanded late. We had problems with keeping the
> > register
> > copies lined up correctly in the presence of multiple TLS
> > variable
> > accesses, and we fixed this by using the regular call
> > machinery instead.
> > In retrospect, it appears that although this is a much cleaner
> > implementation, it limits the ability to CSE the
> > __get_tls_addr calls.
> > We can work on a fix to go back to the old model and fix the
> > register
> > copies with glue instead, but this would be a pretty
> > substantial change
> > to backport to 3.6. However, if the linker can't handle the
> > code as is,
> > then this would seem to be the only way forward.
> > Thoughts?
> > I think that this is the right direction, definitely. I don't feel
> > strongly about backporting it to 3.6 though, we can just put a
> > workaround such as the one suggested by Ulrich there (and I'm working
> > on that now). The key is to make sure 3.7 handles this well.
> Well, I thought I was going to have this done today, but I haven't
> actually made anything better yet. :/
> I've gone back to the old model and fixed things up with glue, but the
> fact that we are exposing the call/return register 3 at this point is
> still causing MachineCSE to reject the op representing the
> __get_tls_addr call for commoning.
> I think what I'll have to do is use only virtual registers in the
> lowering, then wait until after MachineCSE to introduce register copies
> from/to R3 around the call. RA should then coalesce the copies away.
> That's going to take more time than I have today. With good luck and a
> tailwind I might have something tomorrow.
> On the bright side, this will likely be a cleaner approach, as we won't
> have to mess around with glue in the selection DAG.
Progress report: I have a patch that now bootstraps cleanly using the
original TLS variable added to PrettyStackTraceEntry. I need to do some
cleanup on it, add a test case, and put it out for review. Should be
upstream early next week if no major problems come out of the review.
After that we should be able to remove the linker option workaround.
> > While I think we should fix the linker, I fear the buggy linkers are
> > already really widespread and so we should probably ensure in the PPC
> > backend that r3 is always used. My impression is that much of what you
> > are talking about in plumbing this through would also allow us to add
> > a synthetic constraint of that form.
More information about the llvm-commits