Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC
Bill Schmidt
wschmidt at linux.vnet.ibm.com
Wed Jan 28 09:30:41 PST 2015
On Wed, 2015-01-28 at 17:02 +0100, Ulrich Weigand wrote:
> Hello,
>
> current clang does not bootstrap on ppc64le, due to incorrect code being
> generated as a result of TLS linker optimization. Clang generates the
> following code to access a local-dynamic TLS symbol in the object file (all
> unrelated instructions skipped):
>
> 20: 00 00 62 3c addis r3,r2,0
>     20: R_PPC64_GOT_TLSLD16_HA PrettyStackTraceHead
> 2c: 00 00 a3 3b addi r29,r3,0
>     2c: R_PPC64_GOT_TLSLD16_LO PrettyStackTraceHead
> 34: 78 eb a3 7f mr r3,r29
> 3c: 01 00 00 48 bl 3c <llvm::PrettyStackTraceEntry::~PrettyStackTraceEntry()+0x3c>
>     3c: R_PPC64_TLSLD PrettyStackTraceHead
>     3c: R_PPC64_REL24 __tls_get_addr
> 40: 00 00 00 60 nop
> 44: 00 00 63 3c addis r3,r3,0
>     44: R_PPC64_DTPREL16_HA PrettyStackTraceHead
> 48: 00 00 63 e8 ld r3,0(r3)
>     48: R_PPC64_DTPREL16_LO_DS PrettyStackTraceHead
>
> This is being translated by the linker into:
>
> 0x00000000101dc010 <+32>: nop
> 0x00000000101dc01c <+44>: addis r3,r13,0 <== wrong target register
> 0x00000000101dc024 <+52>: mr r3,r29
> 0x00000000101dc02c <+60>: nop
> 0x00000000101dc030 <+64>: addi r3,r3,4096
> 0x00000000101dc034 <+68>: addis r3,r3,0
> 0x00000000101dc038 <+72>: ld r3,-32768(r3)
>
> Note how the original instruction marked with R_PPC64_GOT_TLSLD16_LO sets
> register r29, while the linker-generated replacement sets r3. Looking at
> the linker code, the addis "r3, r13, ..." seems to be simply hard-coded,
> which works only if the original instruction sets register r3.
>
> Now, this will almost always be the case with GCC-generated code, since
> that value must be passed in r3 to the __tls_get_addr call, so it doesn't
> really make sense to load it into any other register; the clang-generated
> code is somewhat silly with the "mr r3, r29" register move.
>
> However, the linker should still be able to handle such code correctly
> (either by adapting the transform accordingly, or by rejecting the
> optimization completely). Note that even in GCC, it is in principle
> possible to generate a @got@tlsld@l reloc on an instruction targeting
> a register other than r3 (since the pattern allows other registers), even
> though in practice, due to the dataflow into a call parameter, it doesn't
> appear to ever happen ...
>
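The two options mentioned above (adapting the transform or rejecting it) can be sketched in terms of the instruction encoding. This is only an illustration, not the actual linker code: the helper names are hypothetical, and the only facts assumed are the PowerPC D-form field layout (6-bit opcode, 5-bit RT, 5-bit RA, 16-bit immediate) and the instruction words shown in the dumps above.

```cpp
#include <cassert>
#include <cstdint>

// PowerPC D-form layout (used by addi and addis): the opcode sits in the
// top 6 bits, followed by a 5-bit target register (RT), a 5-bit source
// register (RA), and a 16-bit immediate.
constexpr std::uint32_t opcode_field(std::uint32_t insn) { return insn >> 26; }
constexpr std::uint32_t rt_field(std::uint32_t insn) { return (insn >> 21) & 0x1f; }
constexpr std::uint32_t ra_field(std::uint32_t insn) { return (insn >> 16) & 0x1f; }

constexpr std::uint32_t kOpAddis = 15;
constexpr std::uint32_t kThreadPointerReg = 13;  // r13, as in the dump above

// Option 1 (reject): a hard-coded "addis r3,r13,0" replacement is only
// safe when the original instruction also writes r3.
constexpr bool safe_for_hardcoded_r3(std::uint32_t original) {
    return rt_field(original) == 3;
}

// Option 2 (adapt): build "addis rT,r13,0", where rT is whatever register
// the original instruction targeted.
constexpr std::uint32_t addis_from_tp(std::uint32_t original) {
    return (kOpAddis << 26) | (rt_field(original) << 21) |
           (kThreadPointerReg << 16);
}

// Self-check against the instruction words quoted in the object dump.
inline void check_quoted_encodings() {
    const std::uint32_t addi_r29_r3 = 0x3ba30000u;  // bytes 00 00 a3 3b
    assert(rt_field(addi_r29_r3) == 29);
    assert(ra_field(addi_r29_r3) == 3);
    assert(!safe_for_hardcoded_r3(addi_r29_r3));
    assert(opcode_field(addis_from_tp(addi_r29_r3)) == kOpAddis);
}
```

With option 2, the instruction at offset 0x2c would become "addis r29,r13,0", so the following "mr r3,r29" would copy the intended value into r3.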
> Alan, can you have a look at the linker optimization?
>
> Bill, can you have a look at why LLVM is generating the suboptimal register
> allocation here?

The issue is that we are generating two calls to __tls_get_addr in
different basic blocks. CSE at the MI level recognizes that the address
computation can be commoned. So r29 is copied to r3 before both of the
__tls_get_addr calls.

I think it would probably be somewhat difficult to avoid this commoning
(it would have to be specific to these address ops, and only when accessing
TLS vars for local/global-dynamic). And in general it's a good thing to
do. Alan, does this complicate matters beyond what the linker can
handle?

Bill
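
For illustration, here is a minimal sketch of the kind of source shape being described: a file-local TLS variable that is touched on two different control-flow paths. The names (Entry, head, update) are hypothetical stand-ins, not the LLVM code. Whether a given compiler emits two __tls_get_addr calls with a shared argument computation for it depends on the version, the target, and flags such as -fPIC and the optimization level.

```cpp
#include <cassert>

// Hypothetical stand-in for a linked stack entry.
struct Entry { const Entry *next; };

// File-local TLS variable. When compiled as position-independent code,
// a compiler may choose the local-dynamic TLS model for it.
static __thread const Entry *head = nullptr;

// The TLS variable is accessed on both branches, so each path may need
// the variable's address in its own basic block.
const Entry *update(Entry *e, bool push) {
    if (push) {
        assert(e != nullptr);
        e->next = head;          // TLS access on the push path
        head = e;
    } else if (head != nullptr) {
        head = head->next;       // TLS access on the pop path
    }
    return head;
}
```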
>
> Chandler, as a workaround it should be possible to use the initial-exec TLS
> model for this variable; this should work fine (at least on Linux). Using
> the following line:
>
> static __thread __attribute__((tls_model("initial-exec"))) const
> PrettyStackTraceEntry *PrettyStackTraceHead = nullptr;
>
> I was able to complete a bootstrap on powerpc64le-linux (Ubuntu 14.04).
>
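A self-contained sketch of the workaround, assuming a compiler that supports the GNU tls_model attribute (GCC and clang do). Entry, head, push and top are hypothetical stand-ins for the real declarations; only the attribute spelling is taken from the line quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for PrettyStackTraceEntry.
struct Entry { const Entry *next; };

// Request the initial-exec model explicitly, so that accesses to this
// variable do not go through a __tls_get_addr call sequence.
static __thread __attribute__((tls_model("initial-exec")))
const Entry *head = nullptr;

// Push an entry onto the per-thread list.
void push(Entry *e) {
    assert(e != nullptr);
    e->next = head;
    head = e;
}

// Return the current per-thread list head.
const Entry *top() { return head; }
```

As a general caveat of this model, initial-exec variables in a shared object that is loaded at run time rely on the dynamic loader having static TLS space available, so this is a workaround rather than a general fix.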
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
> Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> STSM, GNU/Linux compilers and toolchain
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
> Wittkopp
> Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294
>