[PATCH] D98515: [AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks

Mirko Brkusanin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 1 05:30:00 PDT 2021


mbrkusanin added a comment.

In D98515#2661286 <https://reviews.llvm.org/D98515#2661286>, @arsenm wrote:

> In D98515#2660982 <https://reviews.llvm.org/D98515#2660982>, @mbrkusanin wrote:
>
>> %1:vgpr(s32) = COPY $sgpr0
>
> I wonder if we should ban this in the verifier. It's not wrong, but it sure feels like bad form to allow cross bank copies involving physical registers

I guess that should be a separate patch and not related to this?



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:2023
       for (unsigned N : { 0, 2, 3 })
         MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
 
----------------
*this overwrites the bank for operand 2 which is the destination of G_AMDGPU_S_BUFFER_LOAD.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:2018
+    if (DstBank == AMDGPU::VGPRRegBank && InsBank == AMDGPU::SGPRRegBank)
+      InReg = B.buildCopy(MRI.getType(InReg), InReg).getReg(0);
+    InsRegs.push_back(InReg);
----------------
arsenm wrote:
> I don't understand how this came to change the bank of an already assigned register. The bank should already be correct here? Why isn't the problem earlier in the initial bank selection?
Can you clarify what you mean by this?
The register in issue is %4. Below is the diff for state before and after foldInsertEltToCmpSelect for the new test. G_INSERT_VECTOR_ELT is expanded but the function in question also affects destination of G_AMDGPU_S_BUFFER_LOAD and that is what I'm trying to avoid changing. I don't believe that the issue was somehow in selecting sgpr for %4 but rather in line 2023* when we overwrite the bank for it.

```
   %1:sgpr(<2 x s32>) = COPY $sgpr4_sgpr5
   %2:vgpr(s32) = COPY $vgpr0
   %3:sgpr(s32) = G_CONSTANT i32 0
-  %4:sgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4)
+  %4:vgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4)
   %6:vgpr(<2 x s32>) = COPY %1:sgpr(<2 x s32>)
-  %5:vgpr(<2 x s32>) = G_INSERT_VECTOR_ELT %6:vgpr, %4:sgpr(s32), %2:vgpr(s32)
+  %7:vgpr(s32), %8:vgpr(s32) = G_UNMERGE_VALUES %6:vgpr(<2 x s32>)
+  %9:sgpr(s32) = G_CONSTANT i32 0
+  %10:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %9:sgpr
+  %11:vgpr(s32) = G_SELECT %10:vcc(s1), %4:vgpr, %7:vgpr
+  %12:sgpr(s32) = G_CONSTANT i32 1
+  %13:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %12:sgpr
+  %14:vgpr(s32) = G_SELECT %13:vcc(s1), %4:vgpr, %8:vgpr
+  %5:vgpr(<2 x s32>) = G_BUILD_VECTOR %11:vgpr(s32), %14:vgpr(s32)
   S_ENDPGM 0, implicit %5:vgpr(<2 x s32>)
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98515/new/

https://reviews.llvm.org/D98515



More information about the llvm-commits mailing list