[llvm] [AMDGPU] Allocate i1 argument to SGPRs (PR #72461)

Jun Wang via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 12 15:00:09 PST 2024


================
@@ -715,6 +715,14 @@ bool SILowerI1Copies::lowerCopiesToI1() {
       assert(!MI.getOperand(1).getSubReg());
 
       if (!SrcReg.isVirtual() || (!isLaneMaskReg(SrcReg) && !isVreg1(SrcReg))) {
+        if (!SrcReg.isVirtual() &&
+            TII->getRegisterInfo().getRegSizeInBits(SrcReg, *MRI) == 64) {
+          // When calling convention allocates SGPR for i1, for GPUs with
+          // wavefront size 64, i1 return value is put in 64b SGPR.
+          assert(ST->isWave64());
+          continue;
+        }
+
----------------
jwanggit86 wrote:

The machine code when reaching SILowerI1Copies is as follows:
```
# Machine code for function test_call_external_i1_func_void: IsSSA, TracksLiveness

bb.0 (%ir-block.0):
  ADJCALLSTACKUP 0, 0, implicit-def dead $scc
  %0:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-gotprel32-lo) @i1_func_void, target-flags(amdgpu-gotprel32-hi) @i1_func_void, implicit-def dead $scc
  %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM killed %0:sreg_64, 0, 0 :: (dereferenceable invariant load (s64) from got, addrspace 4)
  %2:sgpr_128 = COPY $sgpr0_sgpr1_sgpr2_sgpr3
  $sgpr0_sgpr1_sgpr2_sgpr3 = COPY %2:sgpr_128
  SI_CALL_ISEL killed %1:sreg_64_xexec, @i1_func_void, <regmask $sgpr_null $sgpr_null_hi $src_private_base $src_private_base_hi $src_private_base_lo $src_private_limit $src_private_limit_hi $src_private_limit_lo $src_shared_base $src_shared_base_hi $src_shared_base_lo $src_shared_limit $src_shared_limit_hi $src_shared_limit_lo $sgpr30 $sgpr31 $sgpr32 $sgpr33 $sgpr34 $sgpr35 $sgpr36 $sgpr37 $sgpr38 $sgpr39 $sgpr40 $sgpr41 $sgpr42 $sgpr43 $sgpr44 $sgpr45 $sgpr46 $sgpr47 $sgpr48 and 1194 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit-def $sgpr0_sgpr1
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc
  %3:vreg_1 = COPY $sgpr0_sgpr1
  %5:sreg_64_xexec = COPY %3:vreg_1
  %4:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %5:sreg_64_xexec, implicit $exec
  %6:sreg_64 = IMPLICIT_DEF
  %7:vreg_64 = COPY %6:sreg_64
  GLOBAL_STORE_BYTE killed %7:vreg_64, killed %4:vgpr_32, 0, 0, implicit $exec :: (volatile store (s8) into `ptr addrspace(1) undef`, addrspace 1)
  SI_RETURN
```
In the COPY from $sgpr0_sgpr1 to %3, the destination register is vreg_1 and the source is a 64-bit physical register. The code change above skips this instruction so that it does not trigger the assert that requires the source to be 32 bits wide.
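
To make the control flow easier to follow, here is a minimal standalone sketch of the decision the patched branch makes. It does not use LLVM's API: `SrcInfo`, `Action`, and `classifyCopySource` are hypothetical names invented for illustration, and the sketch only mirrors the condition in the hunk plus the 32-bit assert mentioned above.

```cpp
#include <cassert>

// Hypothetical stand-in for the properties the pass queries on a COPY source.
// None of these names exist in LLVM; they only model the checks in the hunk.
struct SrcInfo {
  bool isVirtual;          // models SrcReg.isVirtual()
  bool isLaneMaskOrVreg1;  // models isLaneMaskReg(SrcReg) || isVreg1(SrcReg)
  unsigned sizeInBits;     // models getRegSizeInBits(SrcReg, *MRI)
};

enum class Action {
  NoSpecialHandling,  // outer condition is false; the branch is not entered
  Skip64BitPhysReg,   // new early 'continue' added by the patch
  Expect32BitSource   // falls through to the path guarded by the 32-bit assert
};

Action classifyCopySource(const SrcInfo &src, bool isWave64) {
  // Same shape as the outer condition in the hunk.
  if (!src.isVirtual || !src.isLaneMaskOrVreg1) {
    // New check: a 64-bit physical source is only expected on wave64
    // targets, and the COPY is left alone.
    if (!src.isVirtual && src.sizeInBits == 64) {
      assert(isWave64);
      return Action::Skip64BitPhysReg;
    }
    // Otherwise the existing code expects a 32-bit source.
    return Action::Expect32BitSource;
  }
  return Action::NoSpecialHandling;
}
```

With this model, the `%3:vreg_1 = COPY $sgpr0_sgpr1` instruction corresponds to `SrcInfo{false, false, 64}` on a wave64 target, which yields `Action::Skip64BitPhysReg`.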

https://github.com/llvm/llvm-project/pull/72461

