[PATCH] D80754: AMDGPU/GlobalISel: cmp/select method for insert element
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 3 18:13:02 PDT 2020
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1903-1904
+
+ if (getOpcodeDef(TargetOpcode::G_CONSTANT, Idx, MRI))
+ return false;
+
----------------
This can ignore this? If it had a constant index the legalizer would have dealt with it already. It's also not wrong to do the rest for a constant
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1937-1938
+
+ if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank &&
+ Subtarget.getConstantBusLimit(AMDGPU::V_CMP_EQ_U32_e32) < 2) {
+ // Legalize compare to avoid constant bus restriction.
----------------
RegBankSelect should never need to consider the constant bus restriction, see the long comment at the top of the file. Any vector operation should only use VGPR operands
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll:32
; MOVREL: ; %bb.0: ; %entry
-; MOVREL-NEXT: s_mov_b32 s0, s2
-; MOVREL-NEXT: s_mov_b32 m0, s11
-; MOVREL-NEXT: s_mov_b32 s1, s3
-; MOVREL-NEXT: s_mov_b32 s2, s4
-; MOVREL-NEXT: s_mov_b32 s3, s5
-; MOVREL-NEXT: s_mov_b32 s4, s6
-; MOVREL-NEXT: s_mov_b32 s5, s7
-; MOVREL-NEXT: s_mov_b32 s6, s8
-; MOVREL-NEXT: s_mov_b32 s7, s9
-; MOVREL-NEXT: s_movreld_b32 s0, s10
+; MOVREL-NEXT: v_cmp_eq_u32_e64 s0, s11, 0
+; MOVREL-NEXT: v_cmp_eq_u32_e64 s1, s11, 1
----------------
rampitec wrote:
> rampitec wrote:
> > That is an incorrect selection of G_ICMP with wave32. AMDGPUInstructionSelector::isVCC() does this:
> >
> > ```
> > if (RC) {
> > const LLT Ty = MRI.getType(Reg);
> > return RC->hasSuperClassEq(TRI.getBoolRC()) &&
> > Ty.isValid() && Ty.getSizeInBits() == 1;
> > }
> > ```
> >
> > Since SGPR_32 is used for condition hasSuperClassEq() returns true, and even though we have Ty == s1 it still considers it a VCC.
> This can be fixed by using LLT::scalar(32) instead of LLT::scalar(1).
Yes, see the long comment at the top of the file. Scalar compares need to always produce s32, s1 is assumed VCC.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll:58
; GPRIDX: ; %bb.0: ; %entry
-; GPRIDX-NEXT: s_mov_b32 s0, s2
-; GPRIDX-NEXT: s_mov_b32 s1, s3
-; GPRIDX-NEXT: s_mov_b32 s2, s4
-; GPRIDX-NEXT: s_mov_b32 s3, s5
-; GPRIDX-NEXT: s_mov_b32 s4, s6
-; GPRIDX-NEXT: s_mov_b32 s5, s7
-; GPRIDX-NEXT: s_mov_b32 s6, s8
-; GPRIDX-NEXT: s_mov_b32 s7, s9
-; GPRIDX-NEXT: s_mov_b32 m0, s11
-; GPRIDX-NEXT: s_nop 0
-; GPRIDX-NEXT: s_movreld_b32 s0, s10
+; GPRIDX-NEXT: s_cmp_eq_u32 s11, 0
+; GPRIDX-NEXT: s_cselect_b32 s0, s10, s2
----------------
I think the register indexing looks better if the index is uniform
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80754/new/
https://reviews.llvm.org/D80754
More information about the llvm-commits
mailing list