[PATCH] D108834: [X86] [pr51000] sret pointer tailcalling

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 30 12:36:24 PDT 2021


rnk added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:4884
+
+    if (SRetParm != OutVals[0])
+      // We're not passing on our incoming sret pointer.
----------------
I don't think this will work in general, because IIRC all SDNode objects are destroyed and reused after selecting instructions in each basic block. You can use an ASan build with a multi-BB test case if you want to really confirm this theory.

I think you want to use the virtual register of the argument and compare that.

My memory is that the virtual SRet register doesn't compare equal because we insert an extra COPY instruction in the prologue:
```
bb.0.entry:
  liveins: $rdi, $rsi
  %1:gr64 = COPY $rsi
  %0:gr64 = COPY $rdi
  %2:gr64 = COPY %0:gr64  # EXTRA
...
  $rdi = COPY %0:gr64
  $rsi = COPY %1:gr64
  CALL64pcrel32 @_Z4copyR3Foo, ... , implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit-def $rsp, implicit-def $ssp
...
  $rax = COPY %2:gr64
  RET 0, $rax
```

So, the virtual registers being compared are `%0` and `%2`.

I believe the extra copy is created because SelectionDAGBuilder.cpp is responsible for making the "standard" argument virtual register copies, and that happens later, outside `X86TargetLowering::LowerFormalArguments`. It is also only done if the argument is used outside the entry block.

---

So, I'm wondering if we can do a bigger refactoring. Maybe the right approach is to change SDISel to save the sret argument's virtual register when it updates the ValueMap, and to do that for all targets. We will have to pretend that sret arguments are always used outside the entry block, since they are on many targets.

The target will decide later if it needs to insert an extra COPY from this virtual register when lowering returns. This should result in less duplicate code overall, and shorter machine IR in most cases. Interested in attempting that approach?


================
Comment at: llvm/test/CodeGen/X86/sibcall.ll:454
 ; X86-LABEL: t15:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %esi
----------------
This case seems like it should be optimized after your change. Is there a good reason it can't be?


================
Comment at: llvm/test/CodeGen/X86/sibcall.ll:701
 
-
 define fastcc void @t21_sret_to_sret_args_mismatch(%struct.foo* noalias sret(%struct.foo) %agg.result, %struct.foo* noalias %ret) nounwind  {
 ; X86-LABEL: t21_sret_to_sret_args_mismatch:
----------------
Looks like we already have good test cases for when TCO should fail due to sret arg mismatch.
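For readers following along, the mismatch case looks roughly like this. Only the function signature comes from the test context above; the struct type, the callee, and the body are assumptions sketched for illustration:

```llvm
%struct.foo = type { [4 x i32] }

declare fastcc void @callee(%struct.foo* noalias sret(%struct.foo))

define fastcc void @t21_sret_to_sret_args_mismatch(
    %struct.foo* noalias sret(%struct.foo) %agg.result,
    %struct.foo* noalias %ret) nounwind {
  ; %ret, not the incoming %agg.result, is passed as the callee's sret
  ; pointer, so this tail call must not be turned into a sibcall.
  tail call fastcc void @callee(%struct.foo* sret(%struct.foo) %ret)
  ret void
}
```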


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108834/new/

https://reviews.llvm.org/D108834


