[llvm-dev] structure-return tailcall
Nathan Sidwell via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 18 11:43:45 PDT 2021
I'm working on PR51000, and thinking about the case of large structures
returned via an artificial sret pointer parameter.
I have questions.
The Itanium ABI requires functions that return a large struct this way
to *also* return that pointer as their scalar return value. (Let's not
get into the pros and cons of that, it is what it is. I'm looking at
x86_64 primarily, but I understand other ISAs have similar ABIs.)
Anyway, doing that requires some data-flow work and, being a newbie to
LLVM IR, I can see two ways to do it. It is not clear to me which is
the easier or better. Plus I find discrepancies between documentation,
tests and implementation!
Consider:
struct Big { int ary[50]; Big (); };
Big Foo ();
Big Bar () { return Foo (); }
Here's the IR:
define dso_local void @_Z3Barv(%struct.Big* noalias sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
  tail call void @_Z3Foov(%struct.Big* sret(%struct.Big) align 4 %0)
  ret void
}
I.e. the middle end figures this is tail callable, but I don't think it
knows about the pointer return requirement (see below for evidence).
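To make the pointer return requirement concrete, here is a minimal hand-lowered sketch in C++ of what Foo and Bar look like once the hidden sret pointer is written out as an explicit parameter. The names Foo_lowered and Bar_lowered are hypothetical, and Big's out-of-line constructor is replaced by a default member initializer so the sketch is self-contained. It is only an illustration of the convention, not what the compiler emits.

```cpp
#include <cassert>

// Same shape as Big above; the user-provided constructor is replaced by a
// default member initializer (an assumption, to keep this self-contained).
struct Big { int ary[50] = {}; };

// Hypothetical hand-lowered form of `Big Foo ()`: the hidden sret pointer is
// an explicit parameter and is also handed back as the scalar return value.
Big *Foo_lowered(Big *sret) {
  assert(sret != nullptr);
  sret->ary[0] = 42;  // stand-in for building the result in place
  return sret;        // the pointer return requirement
}

// Hypothetical hand-lowered form of `Big Bar () { return Foo (); }`.
// The call is in tail position, and forwarding the callee's return value
// satisfies Bar's own requirement only because the same pointer was passed
// in and the callee follows the same convention.
Big *Bar_lowered(Big *sret) {
  return Foo_lowered(sret);
}
```

In this form the tail call is only valid because Bar passes its own incoming pointer straight through; passing any other pointer would leave the wrong value in the return register.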
Test/documentation mismatch: The tailcall documentation says:
(https://llvm.org/docs/LangRef.html#call-instruction)
'Both markers imply that the callee does not access allocas
from the caller.'
However, the X86 sibcall test (llvm/test/CodeGen/X86/sibcall.ll) seems
to break that. Specifically:
define fastcc void @t21_sret_to_sret_alloca(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind {
  %a = alloca %struct.foo, align 8
  tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
  ret void
}
That call to t21_f_sret is referencing the frame-allocated %a object.
Question: Is sibcall.ll correct or not?
Implementation/documentation mismatch: I also note that the tail marker
can appear even when the call is NOT the last (real) instruction in the
function. That seems strange.
The documentation says:
'The optional tail and musttail markers indicate that the
optimizers should perform tail call optimization.'
Consider:
struct Big { int ary[50]; Big (); };
void Frob ();
Big Baz () { Big b; Frob (); return b; }
this generates:
define dso_local void @_Z3Bazv(%struct.Big* noalias nonnull sret(%struct.Big) align 4 %0) local_unnamed_addr #0 {
  tail call void @_ZN3BigC1Ev(%struct.Big* nonnull dereferenceable(200) %0)
  tail call void @_Z4Frobv()
  ret void
}
We can tail call Frob, but not Big's constructor. Why is the ctor
marked as tailcallable?
[as an aside, if the middle end knew about the sret pointer return
requirement, it wouldn't have marked Frob as tailcallable, right?]
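The aside can be illustrated with a hand-lowered C++ sketch of Baz, again with the sret pointer made explicit. Baz_lowered and the frob_calls counter are hypothetical names introduced for illustration, and Big's constructor is modelled by a default member initializer. Under these assumptions, the call to Frob is followed by real work (producing the pointer as the return value), so it is not in tail position.

```cpp
#include <cassert>

// Stand-in for Big; the real constructor is out of line, so this default
// member initializer is an assumption made to keep the sketch runnable.
struct Big { int ary[50] = {}; };

int frob_calls = 0;            // hypothetical counter, only for observation
void Frob() { ++frob_calls; }  // stand-in body for the external Frob

// Hypothetical hand-lowered form of `Big Baz () { Big b; Frob (); return b; }`
// with the result constructed directly in the sret slot.
Big *Baz_lowered(Big *sret) {
  assert(sret != nullptr);
  *sret = Big{};  // stand-in for the constructor call on the sret slot
  Frob();         // not a tail call here: the return value is still pending
  return sret;    // Frob returns nothing, so the caller must supply this
}
```

A true tail call to Frob would jump away before the final return, leaving nobody to place the sret pointer in the return register.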
Question: should the ctor not be marked tail call, or should the
documentation be adjusted to at least mention this behaviour?
Anyway, the backend code-generator checks additional constraints before
performing the tailcall.
a) Should the x86 backend track where it assigned the incoming sret
pointer and see if that's being passed to the tail call? (I've not
figured out how to do that yet).
b) or should the middle end annotate that tail call as passing the
incoming sret? (metadata? new marker? something else?) This would seem
to avoid having to implement #a for each backend that has this requirement.
Question: any insights as to whether #a or #b is the better direction?
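Whichever of #a or #b is chosen, the check itself might look roughly like the toy model below. This is a hedged sketch, not LLVM code: ValueId, Caller, Call and sretAllowsTailCall are all hypothetical names, values are identified by plain integers, and only the sret pointer return requirement is modelled (none of the other tail call constraints).

```cpp
#include <optional>

// Hypothetical stand-in for an IR value or virtual register identity.
using ValueId = int;

// The caller's incoming sret pointer, if it has one.
struct Caller { std::optional<ValueId> incomingSret; };

// The sret argument passed at the candidate tail call, if any.
struct Call { std::optional<ValueId> sretArg; };

// Returns true when the pointer return requirement alone does not block
// the tail call. Assumes the callee follows the same convention and hands
// back whatever sret pointer it was given.
bool sretAllowsTailCall(const Caller &caller, const Call &call) {
  if (!caller.incomingSret)
    return true;  // the caller has no sret pointer it must return
  // The callee must receive exactly the caller's incoming sret pointer.
  return call.sretArg && *call.sretArg == *caller.incomingSret;
}
```

With value 0 standing for the incoming sret pointer, Bar's call to Foo passes 0 and is accepted, Baz's call to Frob passes no sret argument and is rejected, and the sibcall.ll case passing the alloca (some other value) is rejected.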
nathan
--
Nathan Sidwell