[PATCH] D80754: AMDGPU/GlobalISel: cmp/select method for insert element

Wed Jun 3 18:13:02 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1903-1904
+
+  if (getOpcodeDef(TargetOpcode::G_CONSTANT, Idx, MRI))
+    return false;
+
----------------
This can ignore this? If it had a constant index the legalizer would have dealt with it already. It's also not wrong to do the rest for a constant

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1937-1938
+
+  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank &&
+      Subtarget.getConstantBusLimit(AMDGPU::V_CMP_EQ_U32_e32) < 2) {
+    // Legalize compare to avoid constant bus restriction.
----------------
RegBankSelect should never need to consider the constant bus restriction, see the long comment at the top of the file. Any vector operation should only use VGPR operands

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll:32
 ; MOVREL:       ; %bb.0: ; %entry
-; MOVREL-NEXT:    s_mov_b32 s0, s2
-; MOVREL-NEXT:    s_mov_b32 m0, s11
-; MOVREL-NEXT:    s_mov_b32 s1, s3
-; MOVREL-NEXT:    s_mov_b32 s2, s4
-; MOVREL-NEXT:    s_mov_b32 s3, s5
-; MOVREL-NEXT:    s_mov_b32 s4, s6
-; MOVREL-NEXT:    s_mov_b32 s5, s7
-; MOVREL-NEXT:    s_mov_b32 s6, s8
-; MOVREL-NEXT:    s_mov_b32 s7, s9
-; MOVREL-NEXT:    s_movreld_b32 s0, s10
+; MOVREL-NEXT:    v_cmp_eq_u32_e64 s0, s11, 0
+; MOVREL-NEXT:    v_cmp_eq_u32_e64 s1, s11, 1
----------------
rampitec wrote:
> rampitec wrote:
> > That is an incorrect selection of G_ICMP with wave32. AMDGPUInstructionSelector::isVCC() does this:
> > 
> > ```
> >   if (RC) {
> >     const LLT Ty = MRI.getType(Reg);
> >     return RC->hasSuperClassEq(TRI.getBoolRC()) &&
> >            Ty.isValid() && Ty.getSizeInBits() == 1;
> >   }
> > ```
> > 
> > Since SGPR_32 is used for condition hasSuperClassEq() returns true, and even though we have Ty == s1 it still considers it a VCC.
> This can be fixed by using LLT::scalar(32) instead of LLT::scalar(1).
Yes, see the long comment at the top of the file. Scalar compares need to always produce s32, s1 is assumed VCC.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll:58
 ; GPRIDX:       ; %bb.0: ; %entry
-; GPRIDX-NEXT:    s_mov_b32 s0, s2
-; GPRIDX-NEXT:    s_mov_b32 s1, s3
-; GPRIDX-NEXT:    s_mov_b32 s2, s4
-; GPRIDX-NEXT:    s_mov_b32 s3, s5
-; GPRIDX-NEXT:    s_mov_b32 s4, s6
-; GPRIDX-NEXT:    s_mov_b32 s5, s7
-; GPRIDX-NEXT:    s_mov_b32 s6, s8
-; GPRIDX-NEXT:    s_mov_b32 s7, s9
-; GPRIDX-NEXT:    s_mov_b32 m0, s11
-; GPRIDX-NEXT:    s_nop 0
-; GPRIDX-NEXT:    s_movreld_b32 s0, s10
+; GPRIDX-NEXT:    s_cmp_eq_u32 s11, 0
+; GPRIDX-NEXT:    s_cselect_b32 s0, s10, s2
----------------
I think the register indexing looks better if the index is uniform

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80754/new/

https://reviews.llvm.org/D80754