[llvm] [AArch64] Fold COPY(y:gpr, DUP(x:fpr, i)) -> UMOV(y:gpr, x:fpr, i) (PR #89017)

Dhruv Chawla via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 29 04:10:21 PDT 2024


dc03-work wrote:

> I'm pragmatic about incremental solutions, but first have you looked at examining users of instructions in RBS? Even if it works top down there's nothing stopping you from peeking at uses.

Hmm, so I did more digging into this - I don't think it is possible to do this by examining users at all, because there is no good way to predict whether a user will require GPR or not. For example, if the user is a `G_ADD`, there is no way to know in advance that it will definitely be selected as a GPR add.

There are instructions that only accept GPR operands; however, restricting the fold to those would limit it too much. The only way I can see to implement this is if the user were selected before the definition...
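To illustrate the ambiguity, here is a hypothetical generic-MIR sketch (not taken from the test suite; the vreg numbers are made up). At the point RegBankSelect visits the extract, the `G_ADD` user constrains nothing, since AArch64 has both an integer and a SIMD scalar add:

```mir
  ; hypothetical pre-RBS MIR: no bank info on the generic vregs yet
  %1:_(s32) = G_EXTRACT_VECTOR_ELT %0(<4 x s32>), 3
  %2:_(s32) = G_ADD %1, %1
  ; after RBS, either assignment is still plausible:
  ;   %2:gpr(s32) = G_ADD ...   ; scalar integer add -> wants %1 on GPR
  ;   %2:fpr(s32) = G_ADD ...   ; SIMD scalar add    -> wants %1 on FPR
```

So peeking at the use list does not tell us which bank the copy's destination will actually need.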

Anyways, I also tried moving this to AArch64PostSelectOptimize. While it does perform the fold for the current test cases, it also somehow causes regressions in other tests, where copy-folding that would otherwise occur (`copy fpr -> gpr` followed by `copy gpr -> fpr`) gets inhibited by folding the copy in AArch64PostSelectOptimize.

I think fixing the issue in RBS may also cause these issues to crop up! As an example, consider the following diff:
```diff
diff --git a/llvm/test/CodeGen/AArch64/arm64-dup.ll b/llvm/test/CodeGen/AArch64/arm64-dup.ll
index 2bf5419e54..32f88588b9 100644
--- a/llvm/test/CodeGen/AArch64/arm64-dup.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-dup.ll
@@ -373,7 +373,8 @@ define <4 x i16> @test_build_illegal(<4 x i32> %in) {
 ;
 ; CHECK-GI-LABEL: test_build_illegal:
 ; CHECK-GI:       // %bb.0:
-; CHECK-GI-NEXT:    mov s0, v0[3]
+; CHECK-GI-NEXT:    mov.s w8, v0[3]
+; CHECK-GI-NEXT:    fmov s0, w8
 ; CHECK-GI-NEXT:    mov.h v0[3], v0[0]
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 killed $q0
 ; CHECK-GI-NEXT:    ret
```
The code before the diff comes from the following MIR:
```mir
  # Machine code for function test_build_illegal: IsSSA, TracksLiveness, Legalized, RegBankSelected, Selected
  Function Live Ins: $q0

  bb.1 (%ir-block.0):
    liveins: $q0
    %0:fpr128 = COPY $q0
    %1:fpr32 = DUPi32 %0:fpr128, 3
    %7:gpr32 = COPY %1:fpr32
    %6:gpr32 = IMPLICIT_DEF
    %20:fpr32 = COPY %6:gpr32
    %21:fpr16 = COPY %20.hsub:fpr32
    %18:fpr32 = COPY %7:gpr32
    %19:fpr16 = COPY %18.hsub:fpr32
    %12:fpr128 = IMPLICIT_DEF
    %13:fpr128 = INSERT_SUBREG %12:fpr128(tied-def 0), %21:fpr16, %subreg.hsub
    %15:fpr128 = IMPLICIT_DEF
    %16:fpr128 = INSERT_SUBREG %15:fpr128(tied-def 0), %19:fpr16, %subreg.hsub
    %14:fpr128 = INSvi16lane %13:fpr128(tied-def 0), 3, %16:fpr128, 0
    %4:fpr64 = COPY %14.dsub:fpr128
    $d0 = COPY %4:fpr64
    RET_ReallyLR implicit $d0

  # End machine code for function test_build_illegal.
```
After the MIR peephole optimizer runs, `%7` gets folded into `%18` and then gets deleted. When `%7` is instead folded before this pass runs, `%1` gets deleted and a cross-register-bank copy gets emitted.

At this point, I'm starting to feel like this pass really is the best place for this fold, if only there were a good way to gate it to run when GlobalISel is being used.
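For reference, the fold from the PR title turns the selected pattern above into a single lane move. A rough sketch with made-up vreg numbers (`DUPi32` and `UMOVvi32` being the AArch64 lane-duplicate and unsigned-lane-move pseudos):

```mir
  ; before: extract lane 3 to an FPR, then cross-bank copy to a GPR
  %1:fpr32 = DUPi32 %0:fpr128, 3
  %2:gpr32 = COPY %1:fpr32
  ; after: move the lane straight into the GPR (umov w-reg, v-reg[3])
  %2:gpr32 = UMOVvi32 %0:fpr128, 3
```

This avoids the FPR-to-GPR transfer that the intermediate `COPY` would otherwise require.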

https://github.com/llvm/llvm-project/pull/89017
