[PATCH] D105807: [X86] pr51000 struct return tailcalling

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 13 11:28:20 PDT 2021


efriedma added inline comments.


================
Comment at: llvm/test/CodeGen/X86/sibcall.ll:854
   %b = call fastcc %struct.foo* @ret_struct()
   tail call fastcc void @t21_f_sret2(%struct.foo* noalias sret(%struct.foo) %a, %struct.foo* noalias %b) nounwind
   ret void
----------------
urnathan wrote:
> efriedma wrote:
> > Oh, now I remember what happened the last time someone tried to fix this... t21_sret_to_sret_structs_mismatch() is also illegal to transform, for similar reasons to t21_sret_to_non_sret().
> Ah, I think I understand now -- I was thinking this would be UB, but I can see how it might be legit.  Let me walk though my understanding.
>  
> `define fastcc void @t21_sret_to_sret_structs_mismatch(%struct.foo* noalias sret(%struct.foo) %agg.result, %struct.foo* noalias %a) nounwind  {`
>  we receive an sret as %agg.result and a explicit pointer in %a (to uninitialized memory? or is that too C++-specific?). 
> 
> ` %b = call fastcc %struct.foo* @ret_struct()`
> this gives us a pointer to a foo in %b [aside, this seems superfluous to the xform being tested?]
> 
> `  tail call fastcc void @t21_f_sret2(%struct.foo* noalias sret(%struct.foo) %a, %struct.foo* noalias %b) nounwind`
> We call t21_f_sret2 passing it %a as its incoming sret pointer and %b as an explicit arg.  It'll return %a, not our incoming sret.  
> 
> I thought this would be UB, because we're not initializing our own return value -- in C++ we should be calling a constructor.  but (a) we could have inlined that and (b) it could be trivial and do nothing.  That was my logical error.
> 
> So, I think 
> a) if we're not an sret function we can tail call[*],
> b) else if we're calling a non-sret function we cannot tail call,
> c) else if we're passing our sret register as the callee's sret register we can tail call[*],
> d) else we cannot.
> [*] provided the remaining checks are ok
That set of rules looks right.

For the testcase, a C equivalent would be something like this:

```
struct A { int x[10]; };
__attribute((noinline)) struct A t21_f_sret2(struct A *__restrict x) {
  return (struct A){0};
}
struct A t21_sret_to_sret_structs_mismatch(struct A a[__restrict static 1]) {
  struct A r = {};
  *a = t21_f_sret2(0);
  return r;
}
```

If you run this through clang -O2, you get IR almost identical to this testcase.  (The result is missing a "tail" marker, but that's just a missed optimization; we're missing a run of tailcallelim in the normal pass pipeline.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105807/new/

https://reviews.llvm.org/D105807



More information about the llvm-commits mailing list