[PATCH] D94203: GlobalISel: Add combine for G_UREM by power of 2

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 7 03:16:28 PST 2021


foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

LGTM.



================
Comment at: llvm/include/llvm/Target/GlobalISel/Combine.td:570
 def known_bits_simplifications : GICombineGroup<[
-  redundant_and, redundant_sext_inreg, redundant_or]>;
+  redundant_and, redundant_sext_inreg, redundant_or, urem_pow2_to_mask]>;
 
----------------
Seems a bit questionable to call this a "known bits simplification", but I don't suppose it matters.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/combine-urem-pow-2.mir:34
+    ; GCN: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 -1
+    ; GCN: [[ADD:%[0-9]+]]:_(s32) = G_ADD %const, [[C]]
+    ; GCN: $vgpr0 = COPY [[ADD]](s32)
----------------
Isn't there a constant folding combine that should simplify this further?


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll:210
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    s_movk_i32 s4, 0x1000
-; CHECK-NEXT:    v_mov_b32_e32 v1, 0xfffff000
-; CHECK-NEXT:    v_cvt_f32_u32_e32 v2, s4
-; CHECK-NEXT:    v_rcp_iflag_f32_e32 v2, v2
-; CHECK-NEXT:    v_mul_f32_e32 v2, 0x4f7ffffe, v2
-; CHECK-NEXT:    v_cvt_u32_f32_e32 v2, v2
-; CHECK-NEXT:    v_mul_lo_u32 v1, v1, v2
-; CHECK-NEXT:    v_mul_hi_u32 v1, v2, v1
-; CHECK-NEXT:    v_add_i32_e32 v1, vcc, v2, v1
-; CHECK-NEXT:    v_mul_hi_u32 v1, v0, v1
-; CHECK-NEXT:    v_lshlrev_b32_e32 v1, 12, v1
-; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v0, v1
-; CHECK-NEXT:    v_subrev_i32_e32 v1, vcc, s4, v0
-; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
-; CHECK-NEXT:    v_subrev_i32_e32 v1, vcc, s4, v0
-; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
+; CHECK-NEXT:    s_add_i32 s4, 0x1000, -1
+; CHECK-NEXT:    v_and_b32_e32 v0, s4, v0
----------------
Maybe this is the same question as above, but why doesn't this get constant folded to an s_mov?
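
For reference, the arithmetic behind both questions can be sketched in a few lines of standalone C++. This is only an illustration, not the code in the patch: the helper names (isPowerOf2, maskForPow2, uremPow2ViaMask) are hypothetical and are not LLVM APIs. It shows that an unsigned remainder by a power of 2 equals an AND with (pow2 - 1), and that for the 0x1000 divisor in this test the "add -1" is a compile-time constant, 0xfff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// True if v has exactly one bit set.
constexpr bool isPowerOf2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Builds the mask as "pow2 + (-1)", mirroring the G_ADD of -1 in the MIR test.
// Unsigned wraparound makes this equal to pow2 - 1.
constexpr uint32_t maskForPow2(uint32_t pow2) {
  return pow2 + static_cast<uint32_t>(-1);
}

// x urem pow2 rewritten as x & (pow2 - 1). Only valid when pow2 is a power of 2.
constexpr uint32_t uremPow2ViaMask(uint32_t x, uint32_t pow2) {
  return x & maskForPow2(pow2);
}

// With a constant divisor the mask is itself a constant, so it can be
// evaluated ahead of time instead of being computed with an add.
static_assert(isPowerOf2(0x1000u), "0x1000 is a power of 2");
static_assert(maskForPow2(0x1000u) == 0xFFFu, "0x1000 + -1 folds to 0xfff");
static_assert(uremPow2ViaMask(0x12345u, 0x1000u) == 0x12345u % 0x1000u,
              "mask form matches the remainder");
```

If the add were folded, the expected output for this case would presumably be a single move of 0xfff followed by the v_and_b32.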


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94203/new/

https://reviews.llvm.org/D94203


