[PATCH] Fix for PR15086

Nick Lewycky nlewycky at google.com
Wed Nov 13 14:39:16 PST 2013


Sorry, I forgot about this thread entirely. I generally don't approve
non-trivial changes under lib/Target and lib/CodeGen, I'd rather somebody
else with more assembly/object file/backend knowledge took a look.

On 13 November 2013 07:31, Rafael EspĂ­ndola <rafael.espindola at gmail.com>wrote:

> I *think* this is a fine improvement, but please add
>
> * A comment saying why jmp foo at PLT is not valid or when it would be valid.
> * A note in tailcallpic2.ll saying that it could be optimized to a
> direct call (no got, no plt). We don't implement ELF's interposition
> logic, but then pretend we do during codegen. This patch doesn't need
> to fix that, but should note that we could do better.
>
> Nick, do you agree?
>

I'm a little confused. Either we should implement ELF interposition logic
(and that means blocking inlining of interposable functions) or we
shouldn't -- and I think we've decided previously that we shouldn't. If the
issue is just that it's a missed optimization then that's fine.

Nick

On 13 November 2013 03:15, Alp Toker <alp at nuanti.com> wrote:
> > This makes a lot of sense and matters to many users, I'd like to have
> > this in 3.4.
> >
> > Guessing Nick held back from signing off on this at the last moment in
> > case it disables tail calls causing a performance regression somewhere,
> > but from a correctness point of view this seems necessary.
> >
> > So +1 on the ping and thanks for updating the patch Dimitry.
> >
> > PS. It _might_ be worth having a specific test for this with a comment
> > explaining what's being tested, just so it doesn't regress again given
> > that it was hard to detect in the wild.
> >
> > Alp.
> >
> >
> > On 13/11/2013 07:50, Dimitry Andric wrote:
> >> On 04 Nov 2013, at 22:27, Dimitry Andric <dimitry at andric.com> wrote:
> >>> On 30 Oct 2013, at 22:39, Dimitry Andric <dimitry at andric.com> wrote:
> >>>> On 21 Oct 2013, at 21:38, Dimitry Andric <dimitry at andric.com> wrote:
> >>>>> On Oct 21, 2013, at 02:49, Nick Lewycky <nicholas at mxc.ca> wrote:
> >>>>>> Dimitry Andric wrote:
> >>>>>>> On Oct 20, 2013, at 23:48, Nick Lewycky<nicholas at mxc.ca>  wrote:
> >>>>> ...
> >>>>>>>> Also, what does this do to:
> >>>>>>>>
> >>>>>>>> extern int i;
> >>>>>>>> int foo(void) { return i; }
> >>>>>>>> int bar(void) { return foo(); }
> >>>>>>>>
> >>>>>>>> foo is externally visible but this can also be a tail call
> because we don't permit symbol interposition in cases where inlining would
> be legal.
> >>>>>>> There does not seem to be any change in the produced code.  At
> -O0, the call to foo() in bar() gets done via the PLT, with or without my
> patch.  With any optimization, clang inlines foo() into bar(), and directly
> uses i at GOT to access the variable.  Since variables cannot be lazily
> linked, this is no problem.
> >>>>>> Please try harder. :) Does __attribute__((noinline)) help?
> Alternatively, skip C and write it as straight .ll (be sure to include the
> 'tail' marker on the call).
> >>>>> Ah yes, with noinline, foo() is considered a GlobalAddressSDNode,
> but it is not hidden or protected, so it is treated in the same way as an
> external call.  So then the jump gets changed into a call.
> >>>>>
> >>>>> I am not sure if there is a way to distinguish between a
> GlobalAddressSDNode that is "somewhere else" and a GlobalAddressSDNode that
> is local to the current compilation unit.  If so, we might turn the former
> into a call-via-PLT, and the latter into a jump-via-GOT.  Any idea?
> >>>>>
> >>>>> However, if you look at the PIC code generated for a tail jump on
> i386, it becomes:
> >>>>>
> >>>>> bar:                                    # @bar
> >>>>> # BB#0:
> >>>>>      pushl   %ebp
> >>>>>      movl    %esp, %ebp
> >>>>>      calll   .L1$pb
> >>>>> .L1$pb:
> >>>>>      popl    %eax
> >>>>> .Ltmp2:
> >>>>>      addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp2-.L1$pb), %eax
> >>>>>      movl    foo at GOT(%eax), %eax
> >>>>>      popl    %ebp
> >>>>>      jmpl    *%eax                   # TAILCALL
> >>>>>
> >>>>> while the call-via-PLT version is:
> >>>>>
> >>>>> bar:                                    # @bar
> >>>>> # BB#0:
> >>>>>      pushl   %ebp
> >>>>>      movl    %esp, %ebp
> >>>>>      calll   foo at PLT
> >>>>>      popl    %ebp
> >>>>>      ret
> >>>>>
> >>>>> The latter just looks more efficient to me, or am I deluded? :-)
> >>>> So, any thoughts or remarks on this?  Would the original patch (minus
> the small unrelated testcase fix that was already committed) be OK for
> applying?
> >>>>
> >>>> I would really like to get this fix in before 3.4 release. :-)
> >>> Ping.  An updated patch, against trunk r194011 (with a fix for clang
> wanting more parentheses) is attached.  This passes all test cases, as far
> as I could check.
> >> Ping^2 ...
> >>
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> > --
> > http://www.nuanti.com
> > the browser experts
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131113/ab64d04a/attachment.html>


More information about the llvm-commits mailing list