[PATCH] D108834: [X86] [pr51000] sret pointer tailcalling

Nathan Sidwell via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 3 04:29:35 PDT 2021


urnathan added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:4884
+
+    if (SRetParm != OutVals[0])
+      // We're not passing on our incoming sret pointer.
----------------
rnk wrote:
> I don't think this will work in general, because IIRC all SDNode objects are destroyed and reused after selecting instructions in each basic block. You can use an ASan build with a multi-BB test case if you want to really confirm this theory.
> 
> I think you want to use the virtual register of the argument and compare that.
> 
> My memory is that the virtual SRet register doesn't compare equal because we insert an extra COPY instruction in the prologue:
> ```
> bb.0.entry:
>   liveins: $rdi, $rsi
>   %1:gr64 = COPY $rsi
>   %0:gr64 = COPY $rdi
>   %2:gr64 = COPY %0:gr64  # EXTRA
> ...
>   $rdi = COPY %0:gr64
>   $rsi = COPY %1:gr64
>   CALL64pcrel32 @_Z4copyR3Foo, ... , implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit-def $rsp, implicit-def $ssp
> ...
>   $rax = COPY %2:gr64
>   RET 0, $rax
> ```
> 
> So, the virtual registers being compared are `%0` and `%2`.
> 
> I believe the extra copy is created because SelectionDAGBuilder.cpp is responsible for making the "standard" argument virtual register copies, and that happens later, outside `X86TargetLowering::LowerFormalArguments`. It is also only done if the argument is used outside the entry block.
> 
> ---
> 
> So, I'm wondering if we can do a big refactoring. Maybe the thing to do is to change SDISel to save the sret argument virtual register when it updates the ValueMap, and have it do it for all targets.  We will have to pretend that sret arguments are always used outside the current BB, since they are on many targets.
> 
> The target will decide later if it needs to insert an extra COPY from this virtual register when lowering returns. This should result in less duplicate code overall, and shorter machine IR in most cases. Interested in attempting that approach?
Ah, it seems the single-bb nature of the test cases misled me about sdnode lifetime.
  
>The target will decide later if it needs to insert an extra COPY from this virtual
> register when lowering returns. This should result in less duplicate code overall,
> and shorter machine IR in most cases. Interested in attempting that approach?

right, that was what I made an attempt at to avoid needing the virtual reg /and/ the sdnode in the machine function structure.  As I think I mentioned, that failed on fastisel due to sdnode lifetime -- and I thought that was some quirk of fastisel, not the bb thing you mention.

I need to understand more about the isel dag

thank you for your comments


================
Comment at: llvm/test/CodeGen/X86/sibcall.ll:454
 ; X86-LABEL: t15:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %esi
----------------
rnk wrote:
> This seems like it should optimize after your change. Is there a good reason it can't?
IIRC it was to do with the caller and callee arg popping mismatch (which looked to be an incorrect determination to my cursory investigation).  I wanted to make progress though, rather than let perfection get in the way.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108834/new/

https://reviews.llvm.org/D108834



More information about the llvm-commits mailing list