[llvm-dev] structure-return tailcall

Wed Aug 18 11:43:45 PDT 2021

I'm working on pr51000, and thinking about the case of large structures
returned via an artificial sret pointer parameter.

I have questions.

The Itanium ABI requires functions that return a large struct this way
to *also* return that pointer as their scalar return value.  (Let's not
get into the pros and cons of that; it is what it is.  I'm looking at
x86_64 primarily, but I understand other ISAs have similar ABIs.)
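To make the requirement concrete, here is a minimal C++ sketch that
hand-lowers a function returning Big, with the hidden sret argument
written out as an explicit pointer parameter.  Make_lowered is a
hypothetical name and this is a conceptual model, not compiler output:
the callee constructs the result in caller-provided storage and then
returns that same pointer (which ends up in %rax on x86_64).

```cpp
#include <cassert>
#include <new>

struct Big { int ary[50]; Big(); };

// Define the constructor so the sketch links; the original only declares it.
Big::Big() : ary{} {}

// Hypothetical hand-lowering of "Big Make ();".  The caller supplies
// storage through the hidden sret pointer, the callee constructs the
// result there and, per the ABI rule, also returns that same pointer
// as its scalar result.
Big *Make_lowered(Big *sret) {
  assert(sret != nullptr);  // caller must always provide result storage
  new (sret) Big();         // construct the result in caller-owned memory
  return sret;              // hand the incoming sret pointer back
}
```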

Anyway, doing that requires some data-flow work, and as a newbie to
LLVM IR, I can see two ways to do it.  It is not clear to me which is
easier or better.  Plus, I find discrepancies between the documentation,
tests and implementation!

Consider:

struct Big {  int ary[50];   Big (); };

Big Foo ();

Big Bar () {  return Foo (); }

Here's the IR:

define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
   tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
   ret void
}

That is, the middle end decides this call is tail-callable, but I don't
think it knows about the pointer-return requirement (see below for
evidence).
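For what it's worth, the Bar case happens to be safe even with the
pointer-return rule, because Bar forwards its own incoming sret pointer
to Foo.  A minimal C++ model of that, where Foo_lowered and Bar_lowered
are hypothetical hand-lowered versions with the sret pointer made
explicit:

```cpp
#include <cassert>
#include <new>

struct Big { int ary[50]; Big(); };
Big::Big() : ary{} {}

// Hypothetical lowered Foo: constructs into sret and returns that pointer.
Big *Foo_lowered(Big *sret) {
  assert(sret != nullptr);
  new (sret) Big();
  return sret;
}

// Hypothetical lowered Bar: passes its own incoming sret pointer to Foo.
// Foo returns that same pointer, so Foo's scalar result is exactly what
// Bar must return, and the call can remain a tail call with no fix-up.
Big *Bar_lowered(Big *sret) {
  return Foo_lowered(sret);
}
```

If Bar instead passed some other buffer to Foo, Foo's returned pointer
would no longer be the one Bar owes its caller.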

Test/documentation mismatch:  The tailcall documentation says: 
(https://llvm.org/docs/LangRef.html#call-instruction)
   'Both markers imply that the callee does not access allocas
    from the caller.'

However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems 
to break that.  Specifically:

define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind  {
   %a = alloca %struct.foo, align 8
   tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
   ret void
}

That call to t21_f_sret references the frame-allocated %a object, even
though it is marked tail.

Question: Is sibcall.ll correct or not?

Implementation/documentation mismatch: I also note that the tail marker 
can appear even when the call is NOT the last (real) instruction in the 
function.  That seems strange.

The documentation says:
   'The optional tail and musttail markers indicate that the
    optimizers should perform tail call optimization.'

Consider:

struct Big { int ary[50]; Big (); };

void Frob ();

Big Baz () { Big b; Frob (); return b; }

This generates:

define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
   tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200) %0)
   tail call void @_Z4Frobv()
   ret void
}

We can tail call Frob, but not Big's constructor, since the ctor call
is followed by the call to Frob.  Why is the ctor call marked tail?

[As an aside, if the middle end knew about the sret pointer-return
requirement, it wouldn't have marked Frob as tail-callable, right?]
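A minimal C++ model of Baz with the sret pointer made explicit shows
why: after Frob returns, Baz still owes its caller the sret pointer, so
Frob's (void) result cannot serve as Baz's return value.  Baz_lowered
and the frob_calls counter are hypothetical, added only so the sketch
is self-contained and observable.

```cpp
#include <cassert>
#include <new>

struct Big { int ary[50]; Big(); };
Big::Big() : ary{} {}

int frob_calls = 0;  // hypothetical side effect so Frob does something
void Frob() { ++frob_calls; }

// Hypothetical lowered Baz: the result is constructed directly in the
// sret slot (as in the IR above), then Frob runs, and only then is the
// sret pointer returned.  Because "return sret" follows the call, Frob
// is not in tail position once the pointer-return rule is modelled.
Big *Baz_lowered(Big *sret) {
  assert(sret != nullptr);
  new (sret) Big();  // corresponds to the _ZN3BigC1Ev call
  Frob();            // corresponds to the _Z4Frobv call
  return sret;       // still required after Frob returns
}
```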

Question: should the ctor call not be marked tail, or should the
documentation be adjusted to at least mention this behaviour?

Anyway, the backend code-generator checks additional constraints before 
performing the tailcall.

a) Should the x86 backend track where it assigned the incoming sret
pointer and check whether that is what's being passed to the tail call?
(I've not figured out how to do that yet.)

b) or should the middle end annotate that tail call as passing the 
incoming sret? (metadata? new marker? something else?)  This would seem 
to avoid having to implement #a for each backend that has this requirement.
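Whichever of #a or #b is chosen, the condition to be checked looks
roughly the same.  A minimal sketch, using a hypothetical, simplified
CallSite record (these are not LLVM types; values are identified by
small integer ids), assuming the only concern is the pointer-return
rule: in an sret function, a call may be emitted as a tail call only if
the incoming sret argument is forwarded unchanged as the callee's sret.

```cpp
#include <cassert>

// Hypothetical, simplified model of a call site; not LLVM's API.
struct CallSite {
  bool caller_has_sret;  // caller returns its struct via a hidden pointer
  int caller_sret_id;    // id of the caller's incoming sret argument
  bool callee_has_sret;  // callee also returns via a hidden pointer
  int callee_sret_id;    // id of the value passed as the callee's sret
};

// True if tail-calling this site preserves the pointer-return rule:
// the callee will hand back exactly the pointer the caller must return.
bool sret_tailcall_ok(const CallSite &cs) {
  if (!cs.caller_has_sret)
    return true;  // no pointer-return obligation to preserve
  assert(cs.caller_sret_id >= 0);  // an sret caller must have a valid id
  return cs.callee_has_sret && cs.callee_sret_id == cs.caller_sret_id;
}
```

By this rule alone, Bar calling Foo with %0 passes, t21 passing the
alloca %a fails, and Baz calling Frob (no sret at all) fails.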

Question: any insights as to whether #a or #b is the better direction?

nathan

-- 
Nathan Sidwell