[PATCH][X86] Trying to fix PR20243 - leaf frame pointer bug on X86, ELF, PIC

Dario Domizioli dario.domizioli at gmail.com
Tue Nov 11 07:49:08 PST 2014


Good news! (maybe)

I was focusing too much on the instruction definition and not enough on
looking at how the instruction was getting created...
While I inspected the ISel lowering code creating the TLS access pseudo (in
X86ISelLowering.cpp) to find out how to do something similar to what
AArch64 does, I found a comment in the GetTLSADDR() function basically
saying that the code there was responsible for informing the
MachineFrameInfo that the function had calls... however it was only setting
the "adjustsStack" flag and not the "hasCalls" flag.

This means that I don't strictly need to change the definition of the TLS
access pseudo, and I can fix the PR with just a one-line change in
X86ISelLowering.cpp.

Do you think it's acceptable?
I attach the new patch. I also changed the comment in the test to explain
the situation.

Cheers,
    Dario Domizioli
    SN Systems - Sony Computer Entertainment Group









On 11 November 2014 12:16, Dario Domizioli <dario.domizioli at gmail.com>
wrote:

> Thanks Rafael!
>
> I'll have a look at the AArch64 version (and the PPC patch) and I'll try
> to implement something similar.
> I'll post a new patch soon.
>
> Cheers,
>     Dario Domizioli
>     SN Systems - Sony Computer Entertainment Group
>
>
>
>
>
> On 11 November 2014 05:45, Rafael EspĂ­ndola <rafael.espindola at gmail.com>
> wrote:
>
>> For a related issue, see http://reviews.llvm.org/D6209
>>
>> On 10 November 2014 16:02, Rafael EspĂ­ndola <rafael.espindola at gmail.com>
>> wrote:
>> > On 10 November 2014 10:48, Dario Domizioli <dario.domizioli at gmail.com>
>> wrote:
>> >> Hi Rafael.
>> >>
>> >> I have been looking at this for a while, but I cannot seem to find a
>> way of
>> >> preserving an LLVM intrinsic after ISel and expanding it later.
>> >> Basically, if I add an LLVM intrinsic for TLS access in
>> IR/IntrinsicsX86.td,
>> >> then the backend expects me to provide an instruction selection rule
>> for
>> >> lowering it to MachineInstructions; however as you said we cannot do
>> it at
>> >> ISel stage because we must ensure the structure is preserved so that
>> the
>> >> linker can pattern-match it.
>> >>
>> >> With "intrinsic", do you instead mean an actual function that we
>> manufacture
>> >> on the fly in Clang (or provide a declaration for in a header), and
>> then
>> >> eliminate / patch up in MC later on?
>> >> That might work but it seems a bit convoluted to me. Also... if we
>> >> manufacture the function, we end up with a declaration in the IR that
>> >> doesn't correspond to anything in the source; if we instead have the
>> >> function in a header, doesn't that require the user to include the
>> header?
>> >> Is that what you're suggesting, or am I getting confused?
>> >
>> > Sorry, you were right. This has to be an instruction, it is too late
>> > to have an intrinsic.
>> >
>> > Your change to add a isCall is probably also correct. The problem with
>> > TLS_addr64 is that it has not been upgraded to use the new call
>> > representation with register masks.
>> >
>> > Probably the best example of how this should work is what AArch64 does:
>> >
>> > -----------------------------------------------------------------------
>> > let isCall = 1, Defs = [LR] in
>> > def TLSDESC_BLR
>> >     : Pseudo<(outs), (ins GPR64:$dest, i64imm:$sym),
>> >              [(AArch64tlsdesc_call GPR64:$dest, tglobaltlsaddr:$sym)]>;
>> > ------------------------------------------------------------------------
>> >
>> > so it has isCall, but not the explicit Defs list. It is created with
>> >
>> > -------------------------------------
>> >   const uint32_t *Mask = ARI->getTLSCallPreservedMask();
>> > ....
>> >
>> >   SmallVector<SDValue, 6> Ops;
>> >   Ops.push_back(Chain);
>> >   Ops.push_back(Func);
>> >   Ops.push_back(SymAddr);
>> >   Ops.push_back(DAG.getRegister(AArch64::X0, PtrVT));
>> >   Ops.push_back(DAG.getRegisterMask(Mask));
>> >   Ops.push_back(Glue);
>> >
>> >   SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
>> >   Chain = DAG.getNode(AArch64ISD::TLSDESC_CALL, DL, NodeTys, Ops);
>> > --------------------------------------
>> >
>> > With this X86FloatingPoint.cpp should not get confused about
>> > __tls_get_addr returning a value in the fp stack.
>> >
>> > Cheers,
>> > Rafael
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141111/34eac211/attachment.html>
-------------- next part --------------
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp	(revision 221687)
+++ lib/Target/X86/X86ISelLowering.cpp	(working copy)
@@ -12792,6 +12792,7 @@
 
   // TLSADDR will be codegen'ed as call. Inform MFI that function has calls.
   MFI->setAdjustsStack(true);
+  MFI->setHasCalls(true);
 
   SDValue Flag = Chain.getValue(1);
   return DAG.getCopyFromReg(Chain, dl, ReturnReg, PtrVT, Flag);
Index: test/CodeGen/X86/tls-addr-non-leaf-function.ll
===================================================================
--- test/CodeGen/X86/tls-addr-non-leaf-function.ll	(revision 0)
+++ test/CodeGen/X86/tls-addr-non-leaf-function.ll	(working copy)
@@ -0,0 +1,43 @@
+; RUN: llc < %s -relocation-model=pic -O2 -disable-fp-elim -o - | FileCheck %s
+; RUN: llc < %s -relocation-model=pic -O2 -o - | FileCheck %s
+
+; This test runs twice with different options regarding the frame pointer:
+; first the elimination is disabled, then it is enabled. The disabled case is
+; the "control group".
+; The function 'foo' below is marked with the "no-frame-pointer-elim-non-leaf"
+; attribute which dictates that the frame pointer should not be eliminated
+; unless the function is a leaf (i.e. it doesn't call any other function).
+; Now, 'foo' is not a leaf function, because it performs a TLS access which on
+; X86 ELF in PIC mode is expanded as a library call.
+; This call is represented with a pseudo-instruction which doesn't appear to be
+; a call when inspected by the analysis passes (it doesn't have the "isCall"
+; flag), and the ISel lowering code creating the pseudo was not informing the 
+; MachineFrameInfo that the function contained calls. This affected the decision
+; whether to eliminate the frame pointer.
+; With the fix, the "hasCalls" flag is set in the MFI for the function whenever
+; a TLS access pseudo-instruction is created, so 'foo' appears to be a non-leaf
+; function, and the difference in the options does not affect codegen: both
+; versions will have a frame pointer.
+
+; Test that there's some frame pointer usage in 'foo'...
+; CHECK: _Z3fooP10SomeStruct
+; CHECK: pushq %rbp
+; CHECK: movq %rsp, %rbp
+; ... and the TLS library call is also present.
+; CHECK: leaq _ZL2ts at TLSLD(%rip), %rdi
+; CHECK: callq __tls_get_addr at PLT
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.SomeStruct = type { i32 }
+
+ at _ZL2ts = internal thread_local global %struct.SomeStruct zeroinitializer, align 4
+
+define %struct.SomeStruct* @_Z3fooP10SomeStruct(%struct.SomeStruct* readnone %t) "no-frame-pointer-elim-non-leaf" {
+entry:
+  %cmp = icmp eq %struct.SomeStruct* %t, null
+  %_ZL2ts.t = select i1 %cmp, %struct.SomeStruct* @_ZL2ts, %struct.SomeStruct* %t
+  ret %struct.SomeStruct* %_ZL2ts.t
+}
+


More information about the llvm-commits mailing list