Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC

Ulrich Weigand Ulrich.Weigand at
Wed Jan 28 08:02:06 PST 2015


Current clang does not bootstrap on ppc64le, due to incorrect code being
generated as a result of TLS linker optimization.  Clang generates the
following code to access a local-dynamic TLS symbol in the object file (all
unrelated instructions skipped):

  20:   00 00 62 3c     addis   r3,r2,0
                        20: R_PPC64_GOT_TLSLD16_HA
  2c:   00 00 a3 3b     addi    r29,r3,0
                        2c: R_PPC64_GOT_TLSLD16_LO
  34:   78 eb a3 7f     mr      r3,r29
  3c:   01 00 00 48     bl      3c
                        3c: R_PPC64_TLSLD       PrettyStackTraceHead
                        3c: R_PPC64_REL24       __tls_get_addr
  40:   00 00 00 60     nop
  44:   00 00 63 3c     addis   r3,r3,0
                        44: R_PPC64_DTPREL16_HA PrettyStackTraceHead
  48:   00 00 63 e8     ld      r3,0(r3)
                        48: R_PPC64_DTPREL16_LO_DS

This is being translated by the linker into:

   0x00000000101dc010 <+32>:    nop
   0x00000000101dc01c <+44>:    addis   r3,r13,0    <== wrong target
   0x00000000101dc024 <+52>:    mr      r3,r29
   0x00000000101dc02c <+60>:    nop
   0x00000000101dc030 <+64>:    addi    r3,r3,4096
   0x00000000101dc034 <+68>:    addis   r3,r3,0
   0x00000000101dc038 <+72>:    ld      r3,-32768(r3)

Note how the original instruction marked with R_PPC64_GOT_TLSLD16_LO sets
register r29, while the linker-generated replacement sets r3.  Looking at
the linker code, the addis "r3, r13, ..." seems to be simply hard-coded,
which works only if the original instruction sets register r3.

Now, this will usually be the case with GCC-generated code, since
that value must be passed in r3 to the __tls_get_addr call, so it doesn't
really make sense to load it into any other register; the clang-generated
code is somewhat silly with the "mr r3, r29" register move.

However, the linker should still be able to handle such code correctly
(either by adapting the transform accordingly, or by rejecting the
optimization completely).  Note that even in GCC, it is in principle
possible to generate a @got@tlsld@l reloc on an instruction targeting
a register other than r3 (since the pattern allows other registers), even
though in practice, due to the dataflow into a call parameter, it doesn't
appear to ever happen ...
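
To make the two options concrete, here is a rough sketch of what they could
look like at the instruction-encoding level.  This is not the actual linker
code; the helper names are hypothetical, and the only things assumed are the
standard PowerPC D-form layout (opcode in the top 6 bits, then a 5-bit RT
field, then a 5-bit RA field) and r13 as the thread pointer, as seen in the
linker output above.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from any linker.

// PowerPC D-form: opcode (6 bits) | RT (5 bits) | RA (5 bits) | imm (16 bits)
constexpr std::uint32_t kOpAddis = 15u << 26;     // primary opcode of addis
constexpr std::uint32_t kRtMask = 0x1fu << 21;    // target register field
constexpr std::uint32_t kThreadPointer = 13;      // r13, as in the output above

// Extract the register written by a D-form instruction such as addi.
constexpr unsigned targetReg(std::uint32_t insn) {
  return (insn >> 21) & 0x1f;
}

// Option 1: adapt the transform.  Build "addis RT,r13,0" where RT is copied
// from the original instruction instead of being hard-coded to r3.
constexpr std::uint32_t relaxKeepingTarget(std::uint32_t insn) {
  return kOpAddis | (insn & kRtMask) | (kThreadPointer << 16);
}

// Option 2: reject the optimization.  The hard-coded "addis r3,r13,0"
// replacement is only valid if the original instruction also wrote r3.
constexpr bool hardCodedRewriteIsSafe(std::uint32_t insn) {
  return targetReg(insn) == 3;
}

// 0x3ba30000 is the "addi r29,r3,0" from the object file dump above.
static_assert(targetReg(0x3ba30000u) == 29, "addi r29,r3,0 writes r29");
static_assert(relaxKeepingTarget(0x3ba30000u) == 0x3fad0000u,
              "becomes addis r29,r13,0");
static_assert(!hardCodedRewriteIsSafe(0x3ba30000u),
              "hard-coded r3 rewrite must not be used here");
```

With the first option, the following "mr r3, r29" still moves the thread
pointer into r3, so the later "addi r3,r3,4096" operates on the right value.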

Alan, can you have a look at the linker optimization?

Bill, can you have a look at why LLVM is generating the suboptimal register
allocation here?

Chandler, as a workaround it should be possible to use the initial-exec TLS
model for this variable; this should work fine (at least on Linux).  Using
the following line:

static __thread __attribute__((tls_model("initial-exec"))) const
  PrettyStackTraceEntry *PrettyStackTraceHead = nullptr;

I was able to complete a bootstrap on powerpc64le-linux (Ubuntu 14.04).

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU/Linux compilers and toolchain
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294

More information about the llvm-commits mailing list