[PATCH][X86] Trying to fix PR20243 - leaf frame pointer bug on X86, ELF, PIC

Tue Nov 11 09:52:55 PST 2014

Yes. It is a valid bug fix. It would still be nice to cleanup the x86
implementation, but that is independent.

My only request for the patch is to reduce the testcase a bit. For
example, you don't need the struct.SomeStruct. You should be able to
test the difference with only

@x = thread_local global i32 0
define i32 @f() "no-frame-pointer-elim-non-leaf" {
  %a = load i32* @x, align 4
  ret i32 %a
}

no?

LGTM with that.

On 11 November 2014 12:19, Reid Kleckner <rnk at google.com> wrote:
> Works for me, I think that definitely fixes a bug in existing code. Rafael,
> do you think this approach is OK?
>
> On Tue, Nov 11, 2014 at 7:49 AM, Dario Domizioli <dario.domizioli at gmail.com>
> wrote:
>>
>> Good news! (maybe)
>>
>> I was focusing too much on the instruction definition and not enough on
>> looking at how the instruction was getting created...
>> While I inspected the ISel lowering code creating the TLS access pseudo
>> (in X86ISelLowering.cpp) to find out how to do something similar to what
>> AArch64 does, I found a comment in the GetTLSADDR() function basically
>> saying that the code there was responsible for informing the
>> MachineFrameInfo that the function had calls... however it was only setting
>> the "adjustsStack" flag and not the "hasCalls" flag.
>>
>> This means that I don't strictly need to change the definition of the TLS
>> access pseudo, and I can fix the PR with just a one-line change in
>> X86ISelLowering.cpp.
>>
>> Do you think it's acceptable?
>> I attach the new patch. I also changed the comment in the test to explain
>> the situation.
>>
>> Cheers,
>>     Dario Domizioli
>>     SN Systems - Sony Computer Entertainment Group
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 11 November 2014 12:16, Dario Domizioli <dario.domizioli at gmail.com>
>> wrote:
>>>
>>> Thanks Rafael!
>>>
>>> I'll have a look at the AArch64 version (and the PPC patch) and I'll try
>>> to implement something similar.
>>> I'll post a new patch soon.
>>>
>>> Cheers,
>>>     Dario Domizioli
>>>     SN Systems - Sony Computer Entertainment Group
>>>
>>>
>>>
>>>
>>>
>>> On 11 November 2014 05:45, Rafael Espíndola <rafael.espindola at gmail.com>
>>> wrote:
>>>>
>>>> For a related issue, see http://reviews.llvm.org/D6209
>>>>
>>>> On 10 November 2014 16:02, Rafael Espíndola <rafael.espindola at gmail.com>
>>>> wrote:
>>>> > On 10 November 2014 10:48, Dario Domizioli <dario.domizioli at gmail.com>
>>>> > wrote:
>>>> >> Hi Rafael.
>>>> >>
>>>> >> I have been looking at this for a while, but I cannot seem to find a
>>>> >> way of
>>>> >> preserving an LLVM intrinsic after ISel and expanding it later.
>>>> >> Basically, if I add an LLVM intrinsic for TLS access in
>>>> >> IR/IntrinsicsX86.td,
>>>> >> then the backend expects me to provide an instruction selection rule
>>>> >> for
>>>> >> lowering it to MachineInstructions; however as you said we cannot do
>>>> >> it at
>>>> >> ISel stage because we must ensure the structure is preserved so that
>>>> >> the
>>>> >> linker can pattern-match it.
>>>> >>
>>>> >> With "intrinsic", do you instead mean an actual function that we
>>>> >> manufacture
>>>> >> on the fly in Clang (or provide a declaration for in a header), and
>>>> >> then
>>>> >> eliminate / patch up in MC later on?
>>>> >> That might work but it seems a bit convoluted to me. Also... if we
>>>> >> manufacture the function, we end up with a declaration in the IR that
>>>> >> doesn't correspond to anything in the source; if we instead have the
>>>> >> function in a header, doesn't that require the user to include the
>>>> >> header?
>>>> >> Is that what you're suggesting, or am I getting confused?
>>>> >
>>>> > Sorry, you were right. This has to be an instruction, it is too late
>>>> > to have an intrinsic.
>>>> >
>>>> > Your change to add a isCall is probably also correct. The problem with
>>>> > TLS_addr64 is that it has not been upgraded to use the new call
>>>> > representation with register masks.
>>>> >
>>>> > Probably the best example of how this should work is what AArch64
>>>> > does:
>>>> >
>>>> >
>>>> > -----------------------------------------------------------------------
>>>> > let isCall = 1, Defs = [LR] in
>>>> > def TLSDESC_BLR
>>>> >     : Pseudo<(outs), (ins GPR64:$dest, i64imm:$sym),
>>>> >              [(AArch64tlsdesc_call GPR64:$dest,
>>>> > tglobaltlsaddr:$sym)]>;
>>>> >
>>>> > ------------------------------------------------------------------------
>>>> >
>>>> > so it has isCall, but not the explicit Defs list. It is created with
>>>> >
>>>> > -------------------------------------
>>>> >   const uint32_t *Mask = ARI->getTLSCallPreservedMask();
>>>> > ....
>>>> >
>>>> >   SmallVector<SDValue, 6> Ops;
>>>> >   Ops.push_back(Chain);
>>>> >   Ops.push_back(Func);
>>>> >   Ops.push_back(SymAddr);
>>>> >   Ops.push_back(DAG.getRegister(AArch64::X0, PtrVT));
>>>> >   Ops.push_back(DAG.getRegisterMask(Mask));
>>>> >   Ops.push_back(Glue);
>>>> >
>>>> >   SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
>>>> >   Chain = DAG.getNode(AArch64ISD::TLSDESC_CALL, DL, NodeTys, Ops);
>>>> > --------------------------------------
>>>> >
>>>> > With this X86FloatingPoint.cpp should not get confused about
>>>> > __tls_get_addr returning a value in the fp stack.
>>>> >
>>>> > Cheers,
>>>> > Rafael
>>>
>>>
>>
>