[llvm-dev] structure-return tailcall

Thu Aug 19 14:14:55 PDT 2021

On Wed, Aug 18, 2021 at 11:44 AM Nathan Sidwell via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I'm working on pr51000, and thinking about the case of large structures
> returned by artificial sret pointer parm.
>
> I have questions.
>
> The itanium ABI requires functions that return a large struct this way,
> to *also* return that pointer as their scalar return value.  (Let's not
> get into the pros and cons of that, it is what it is.  I'm looking at
> x86_64 primarily, but I understand other ISAs have similar ABIs.)
>

Super nit: it's the psABI that controls how large objects are passed and
returned; the Itanium (C++) ABI only cares about the C++-y aspects of a
struct. Apologies for the pedantry.
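
To make the convention concrete, here is a hand-lowered C++ sketch of what
the x86_64 psABI asks for. The names Foo_lowered, Bar_lowered and Demo are
made up for illustration, Big's constructor is dropped so the snippet stands
alone, and the body of Foo_lowered is a placeholder; only the pointer-in,
same-pointer-out shape is the point.

```cpp
#include <cassert>

struct Big { int ary[50]; };

// Hand-lowered form of `Big Foo();`: the caller passes the result slot as a
// hidden pointer, and the function hands that same pointer back as its
// scalar return value (in %rax on x86_64).
Big *Foo_lowered(Big *sret) {
  for (int &x : sret->ary) x = 7;  // placeholder body
  return sret;
}

// Hand-lowered form of `Big Bar() { return Foo(); }`: Bar forwards its own
// incoming slot, so the pointer Foo returns is also the one Bar must return.
Big *Bar_lowered(Big *sret) {
  return Foo_lowered(sret);
}

// Usage: the pointer that comes back is the slot that went in.
void Demo() {
  Big b;
  assert(Bar_lowered(&b) == &b);
  assert(b.ary[0] == 7 && b.ary[49] == 7);
}
```

In this shape the tail call in Bar is harmless as long as Bar forwards its
own incoming slot, because the callee then returns exactly the pointer Bar
owes its caller.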

> Anyway, to do that requires some data-flow work, and being a newbie to
> llvm IR, I can see two ways to do this.  It is not clear to me which is
> the easiest or best.  Plus I find discrepancies between documentation,
> tests and implementation!
>
> Consider:
>
> struct Big {  int ary[50];   Big (); };
>
> Big Foo ();
>
> Big Bar () {  return Foo (); }
>
> Here's the IR:
>
> define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big)
> align 4 %0) local_unnamed_addr #0 {
>    tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
>    ret void
> }
>
> I.e. the middle end figures this is tail callable, but I don't think it
> knows about the pointer return requirement (see below for evidence).
>
> Test/documentation mismatch:  The tailcall documentation says:
> (https://llvm.org/docs/LangRef.html#call-instruction)
>    'Both markers imply that the callee does not access allocas
>     from the caller.'
>
> However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems
> to break that.  Specifically:
>
> define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias
> sret(%struct.foo) %agg.result) nounwind  {
>    %a = alloca %struct.foo, align 8
>    tail call fastcc void @t21_f_sret(%struct.foo* noalias
> sret(%struct.foo) %a) nounwind
>    ret void
> }
>
> That call to t21_f_sret is referencing the frame-allocated %a object.
>
> Question: Is sibcall.ll correct or not?
>

I think you are correct: the IR will exhibit UB.

But that doesn't mean the test case isn't useful. I haven't looked
further, but maybe the test is meant to illustrate what happens when a
user wrongly adds the tail marker.

> Implementation/documentation mismatch: I also note that the tail marker
> can appear even when the call is NOT the last (real) instruction in the
> function.  That seems strange.
>

This is true. The tail marker doesn't really mark call sites in tail
position; it's a statement about aliasing. It simply marks call sites whose
callee does not access stack objects from the current frame. If the call
turns out to be in tail position later, during codegen, it can become a TCO
candidate.
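
A small C++ sketch of that distinction, under my reading of the marker. Init,
Frob, IntoCallerSlot, IntoLocal and Demo are hypothetical names, and the
comments describe which calls would qualify for the marker in principle, not
what any particular pass is guaranteed to emit.

```cpp
#include <cassert>

struct Big { int ary[50]; };

// Hypothetical callee that writes through its argument.
void Init(Big *p) { for (int &x : p->ary) x = 1; }

// Hypothetical callee that takes no pointers at all.
int Frob() { return 42; }

// `out` is storage owned by our caller, so neither callee can touch an
// object in this frame. Both calls qualify for the marker, even though the
// call to Init is not in tail position.
int IntoCallerSlot(Big *out) {
  Init(out);
  return Frob();
}

// Here Init receives the address of a local, so the callee does access this
// frame and that call cannot carry the marker.
int IntoLocal() {
  Big local;
  Init(&local);
  return local.ary[0] + Frob();
}

// Usage: both variants behave the same; only the aliasing facts differ.
void Demo() {
  Big b;
  assert(IntoCallerSlot(&b) == 42);
  assert(b.ary[49] == 1);
  assert(IntoLocal() == 43);
}
```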

> The documentation says:
>    'The optional tail and musttail markers indicate that the
>     optimizers should perform tail call optimization.'
>
> Consider:
> struct Big { int ary[50]; Big (); };
>
> void Frob ();
>
> Big Baz () { Big b; Frob (); return b; }
>
> this generates:
>
> define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull
> sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
>    tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200)
> %0)
>    tail call void @_Z4Frobv()
>    ret void
> }
>
> We can tail call Frob, but not Big's constructor.  Why is the ctor
> marked as tailcallable?
>

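I guess the documentation is too simplistic. The tail marker is really a
way to pass AA knowledge to the backend. It doesn't indicate that the
backend "should" perform TCO; it just passes down info from the
middle-end. The ctor call only touches %0, which is storage provided by
Baz's caller rather than an alloca in Baz, so marking it is consistent
with that.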

> [as an aside, if the middle end knew about the sret pointer return
> requirement, it wouldn't have marked Frob as tailcallable, right?]
>
> Question: should the ctor not be marked tail call, or should the
> documentation be adjusted to at least mention this behaviour?
>
> Anyway, the backend code-generator checks additional constraints before
> performing the tailcall.
>
> a) Should the x86 backend track where it assigned the incoming sret
> pointer and see if that's being passed to the tail call?  (I've not
> figured out how to do that yet).
>
> b) or should the middle end annotate that tail call as passing the
> incoming sret? (metadata? new marker? something else?)  This would seem
> to avoid having to implement #a for each backend that has this requirement.
>
> Question: any insights as to whether #a or #b is the better direction?
>

This feels like a target-specific constraint, so I lean toward #a.
You could peek at the IR to make this easy, though.
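
Roughly what I have in mind by peeking at the IR, written as a toy model
rather than against the real LLVM classes. ToyFunction, ToyArg, ToyCall and
forwardsIncomingSRet are all invented for this sketch; an actual
implementation would ask the IR function and call site the same questions.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for IR objects. These are NOT LLVM classes.
struct ToyFunction {
  int sretParam;  // index of the parameter marked sret, or -1 if none
};

struct ToyArg {
  int callerParam;  // index of the caller parameter forwarded here, or -1
  bool isSRet;      // whether the call site marks this argument sret
};

struct ToyCall {
  std::vector<ToyArg> args;
};

// True when the call's sret argument is exactly the caller's incoming sret
// parameter. Only then does the callee hand back the pointer that the
// caller itself is obliged to return.
bool forwardsIncomingSRet(const ToyFunction &caller, const ToyCall &call) {
  if (caller.sretParam < 0) return false;
  for (const ToyArg &a : call.args)
    if (a.isSRet) return a.callerParam == caller.sretParam;
  return false;
}

// Usage, mirroring the call shapes from the examples in this thread.
void Demo() {
  ToyFunction caller{0};             // sret is parameter 0
  ToyCall fooCall{{{0, true}}};      // Bar -> Foo: forwards %0 as sret
  ToyCall allocaCall{{{-1, true}}};  // t21: sret argument is a local alloca
  ToyCall ctorCall{{{0, false}}};    // Baz -> ctor: %0 passed, but not sret
  ToyCall frobCall{{}};              // Baz -> Frob: no arguments
  assert(forwardsIncomingSRet(caller, fooCall));
  assert(!forwardsIncomingSRet(caller, allocaCall));
  assert(!forwardsIncomingSRet(caller, ctorCall));
  assert(!forwardsIncomingSRet(caller, frobCall));
}
```

With a check like this, the x86 backend would only need to allow the
sibcall when it returns true; in the other cases it still has to
materialise the incoming pointer as the return value after the call.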