Incorrect local-dynamic TLS linker optimization with clang-generated code on PowerPC
Ulrich Weigand
Ulrich.Weigand at de.ibm.com
Wed Jan 28 08:02:06 PST 2015
Hello,
current clang does not bootstrap on ppc64le, due to incorrect code being
generated as a result of TLS linker optimization. Clang generates the
following code to access a local-dynamic TLS symbol in the object file (all
unrelated instructions skipped):
20: 00 00 62 3c addis r3,r2,0
        20: R_PPC64_GOT_TLSLD16_HA PrettyStackTraceHead
2c: 00 00 a3 3b addi r29,r3,0
        2c: R_PPC64_GOT_TLSLD16_LO PrettyStackTraceHead
34: 78 eb a3 7f mr r3,r29
3c: 01 00 00 48 bl 3c <llvm::PrettyStackTraceEntry::~PrettyStackTraceEntry()+0x3c>
        3c: R_PPC64_TLSLD PrettyStackTraceHead
        3c: R_PPC64_REL24 __tls_get_addr
40: 00 00 00 60 nop
44: 00 00 63 3c addis r3,r3,0
        44: R_PPC64_DTPREL16_HA PrettyStackTraceHead
48: 00 00 63 e8 ld r3,0(r3)
        48: R_PPC64_DTPREL16_LO_DS PrettyStackTraceHead
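The register fields can be read directly from the bytes in this dump. Below is a small sketch of a decoder for PowerPC D-form instructions. In this format, addi is primary opcode 14 and addis is 15, followed by the RT and RA register fields and a 16-bit immediate. It assumes the bytes appear in little-endian order, as objdump prints them on ppc64le. The function and struct names are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a PowerPC D-form instruction such as addi or addis.
struct DForm {
    std::uint32_t opcode;  // primary opcode (top 6 bits)
    std::uint32_t rt;      // target register
    std::uint32_t ra;      // source/base register
    std::int16_t imm;      // 16-bit immediate (the field the reloc patches)
};

// Assemble a 32-bit instruction word from the little-endian bytes
// shown by objdump on ppc64le.
std::uint32_t from_le_bytes(const std::uint8_t b[4]) {
    return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
           (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
}

// Split a word into D-form fields.
DForm decode_dform(std::uint32_t insn) {
    return DForm{insn >> 26, (insn >> 21) & 0x1f, (insn >> 16) & 0x1f,
                 static_cast<std::int16_t>(insn & 0xffff)};
}
```

Decoding the bytes at offset 2c (`00 00 a3 3b`) gives opcode 14, RT 29 and RA 3, which is `addi r29,r3,0`. The target register of the instruction carrying R_PPC64_GOT_TLSLD16_LO is therefore r29, not r3.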
This is being translated by the linker into:
0x00000000101dc010 <+32>: nop
0x00000000101dc01c <+44>: addis r3,r13,0    <== wrong target register
0x00000000101dc024 <+52>: mr r3,r29
0x00000000101dc02c <+60>: nop
0x00000000101dc030 <+64>: addi r3,r3,4096
0x00000000101dc034 <+68>: addis r3,r3,0
0x00000000101dc038 <+72>: ld r3,-32768(r3)
Note how the original instruction marked with R_PPC64_GOT_TLSLD16_LO sets
register r29, while the linker-generated replacement sets r3. The following
"mr r3,r29" then overwrites r3 with whatever r29 happened to hold, so the
final load uses a bogus address. Looking at the linker code, the replacement
"addis r3,r13,..." seems to be simply hard-coded, which works only if the
original instruction sets register r3.
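To make the failure concrete, here is a sketch that replays the linked sequence above on a toy register file. The only free parameter is the target register of the instruction substituted for the R_PPC64_GOT_TLSLD16_LO one. It uses the immediates from the disassembly (4096, 0, -32768) and models only the instructions that have an effect. The function name and register values are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file indexed by GPR number (r0..r31).
using Regs = std::array<std::uint64_t, 32>;

// Replays the linked code, except that the instruction replacing
// "addi r29,r3,0" writes register `lo_rt` (the linker currently always
// uses 3). Returns the effective address the final "ld" reads from.
std::uint64_t replay_ld_to_le(Regs r, unsigned lo_rt) {
    assert(lo_rt < r.size());
    r[lo_rt] = r[13];     // addis lo_rt,r13,0  (r13 = thread pointer)
    r[3] = r[29];         // mr r3,r29
    r[3] += 4096;         // addi r3,r3,4096    (replaced the nop after bl)
    r[3] += 0;            // addis r3,r3,0      (no-op here)
    return r[3] - 32768;  // ld r3,-32768(r3)
}
```

With `lo_rt == 3`, the result is derived from the stale contents of r29. With `lo_rt == 29`, it is derived from the thread pointer in r13, as intended.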
Now, this will usually be the case with GCC-generated code: the value must
be passed in r3 to the __tls_get_addr call, so there is little reason to
load it into any other register. The clang-generated code, with its
"mr r3,r29" register move, is somewhat silly.
However, the linker should still be able to handle such code correctly
(either by adapting the transform accordingly, or by not performing the
optimization at all). Note that even with GCC, it is in principle
possible to generate a @got@tlsld@l reloc on an instruction targeting
a register other than r3 (since the pattern allows other registers), even
though in practice, because the value flows into a call parameter, this
doesn't appear to ever happen ...
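The two possible fixes can be sketched as follows. This is not the actual linker code, and all names are hypothetical. The sketch assumes that the R_PPC64_GOT_TLSLD16_LO instruction is a D-form addi, as in the dump. It rewrites only that one instruction. The addi that replaces the nop after the __tls_get_addr call can keep using r3, because the ABI passes the call argument in r3.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint32_t kOpAddi = 14;
constexpr std::uint32_t kOpAddis = 15;
constexpr std::uint32_t kThreadPointer = 13;  // r13 holds the TP on ppc64

// Encode a D-form instruction: opcode | RT | RA | 16-bit immediate.
constexpr std::uint32_t encode_dform(std::uint32_t op, std::uint32_t rt,
                                     std::uint32_t ra, std::uint16_t imm) {
    return (op << 26) | (rt << 21) | (ra << 16) | imm;
}

// Option 1: adapt the transform. Keep the target register RT of the
// original "addi RT,RA,sym@got@tlsld@l" and emit "addis RT,r13,0".
std::optional<std::uint32_t> ld_to_le_adapt(std::uint32_t orig) {
    if ((orig >> 26) != kOpAddi) return std::nullopt;  // unexpected insn
    std::uint32_t rt = (orig >> 21) & 0x1f;
    return encode_dform(kOpAddis, rt, kThreadPointer, 0);
}

// Option 2: reject the optimization unless RT is r3. On nullopt the
// caller leaves the local-dynamic sequence unoptimized.
std::optional<std::uint32_t> ld_to_le_strict(std::uint32_t orig) {
    if ((orig >> 26) != kOpAddi || ((orig >> 21) & 0x1f) != 3)
        return std::nullopt;
    return encode_dform(kOpAddis, 3, kThreadPointer, 0);  // addis r3,r13,0
}
```

For the clang-generated `addi r29,r3,0` (0x3ba30000), option 1 yields `addis r29,r13,0` (0x3fad0000). Option 2 declines to optimize.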
Alan, can you have a look at the linker optimization?
Bill, can you have a look at why LLVM is generating the suboptimal register
allocation here?
Chandler, as a workaround it should be possible to use the initial-exec TLS
model for this variable; this should work fine (at least on Linux). Using
the following line:
static __thread __attribute__((tls_model("initial-exec"))) const
PrettyStackTraceEntry *PrettyStackTraceHead = nullptr;
I was able to complete a bootstrap on powerpc64le-linux (Ubuntu 14.04).
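For reference, here is a minimal self-contained sketch that uses the workaround declaration. PrettyStackTraceEntry here is a hypothetical stand-in loosely modeled on a per-thread LIFO list of entries, not LLVM's actual class. The attribute asks for the initial-exec TLS model, so the compiler should not emit the local-dynamic __tls_get_addr sequence for this variable.

```cpp
#include <cassert>

// Hypothetical stand-in: entries form a per-thread singly linked list
// whose head lives in thread-local storage.
struct PrettyStackTraceEntry {
    const PrettyStackTraceEntry *Next;
    PrettyStackTraceEntry();
    ~PrettyStackTraceEntry();
};

// The workaround declaration: force the initial-exec TLS model.
static __thread __attribute__((tls_model("initial-exec"))) const
    PrettyStackTraceEntry *PrettyStackTraceHead = nullptr;

PrettyStackTraceEntry::PrettyStackTraceEntry() : Next(PrettyStackTraceHead) {
    PrettyStackTraceHead = this;  // push onto this thread's list
}

PrettyStackTraceEntry::~PrettyStackTraceEntry() {
    assert(PrettyStackTraceHead == this && "entries must be destroyed LIFO");
    PrettyStackTraceHead = Next;  // pop
}

// Inspection helper: current head of this thread's list.
const PrettyStackTraceEntry *current_head() { return PrettyStackTraceHead; }
```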
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH
Chairwoman of the Supervisory Board: Martina Koederitz | Management: Dirk Wittkopp
Registered office: Böblingen | Registration court: Amtsgericht Stuttgart, HRB 243294