[PATCH] Fix for PR15086

Alp Toker alp at nuanti.com
Wed Nov 13 00:15:00 PST 2013

This makes a lot of sense and matters to many users, I'd like to have
this in 3.4.

Guessing Nick held back from signing off on this at the last moment in
case it disables tail calls causing a performance regression somewhere,
but from a correctness point of view this seems necessary.

So +1 on the ping and thanks for updating the patch Dimitry.

PS. It _might_ be worth having a specific test for this with a comment
explaining what's being tested, just so it doesn't regress again given
that it was hard to detect in the wild.


On 13/11/2013 07:50, Dimitry Andric wrote:
> On 04 Nov 2013, at 22:27, Dimitry Andric <dimitry at andric.com> wrote:
>> On 30 Oct 2013, at 22:39, Dimitry Andric <dimitry at andric.com> wrote:
>>> On 21 Oct 2013, at 21:38, Dimitry Andric <dimitry at andric.com> wrote:
>>>> On Oct 21, 2013, at 02:49, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>>> Dimitry Andric wrote:
>>>>>> On Oct 20, 2013, at 23:48, Nick Lewycky<nicholas at mxc.ca>  wrote:
>>>> ...
>>>>>>> Also, what does this do to:
>>>>>>> extern int i;
>>>>>>> int foo(void) { return i; }
>>>>>>> int bar(void) { return foo(); }
>>>>>>> foo is externally visible but this can also be a tail call because we don't permit symbol interposition in cases where inlining would be legal.
>>>>>> There does not seem to be any change in the produced code.  At -O0, the call to foo() in bar() gets done via the PLT, with or without my patch.  With any optimization, clang inlines foo() into bar(), and directly uses i at GOT to access the variable.  Since variables cannot be lazily linked, this is no problem.
>>>>> Please try harder. :) Does __attribute__((noinline)) help? Alternatively, skip C and write it as straight .ll (be sure to include the 'tail' marker on the call).
>>>> Ah yes, with noinline, foo() is considered a GlobalAddressSDNode, but it is not hidden or protected, so it is treated in the same way as an external call.  So then the jump gets changed into a call.
>>>> I am not sure if there is a way to distinguish between a GlobalAddressSDNode that is "somewhere else" and a GlobalAddressSDNode that is local to the current compilation unit.  If so, we might turn the former into a call-via-PLT, and the latter into a jump-via-GOT.  Any idea?
>>>> However, if you look at the PIC code generated for a tail jump on i386, it becomes:
>>>> bar:                                    # @bar
>>>> # BB#0:
>>>>      pushl   %ebp
>>>>      movl    %esp, %ebp
>>>>      calll   .L1$pb
>>>> .L1$pb:
>>>>      popl    %eax
>>>> .Ltmp2:
>>>>      addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp2-.L1$pb), %eax
>>>>      movl    foo at GOT(%eax), %eax
>>>>      popl    %ebp
>>>>      jmpl    *%eax                   # TAILCALL
>>>> while the call-via-PLT version is:
>>>> bar:                                    # @bar
>>>> # BB#0:
>>>>      pushl   %ebp
>>>>      movl    %esp, %ebp
>>>>      calll   foo at PLT
>>>>      popl    %ebp
>>>>      ret
>>>> The latter just looks more efficient to me, or am I deluded? :-)
>>> So, any thoughts or remarks on this?  Would the original patch (minus the small unrelated testcase fix that was already committed) be OK for applying?
>>> I would really like to get this fix in before 3.4 release. :-)
>> Ping.  An updated patch, against trunk r194011 (with a fix for clang wanting more parentheses) is attached.  This passes all test cases, as far as I could check.
> Ping^2 ...
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

the browser experts

More information about the llvm-commits mailing list