[PATCH] D118461: [AMDGPU] Introduce new ISel combine for trunc-slr patterns

Fri Jan 28 05:32:50 PST 2022

tsymalla created this revision.
Herald added subscribers: foad, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
tsymalla requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

In some cases, when selecting a (trunc (srl)) pattern, the srl gets translated
to a v_lshrrev_b32_e64 instruction, whereas the truncation gets selected to
a sequence of v_and_b32_e64 and v_cmp_eq_u32_e64. In the final ISA, this appears
as a test of the nth bit:

v_lshrrev_b32_e32 v0, 2, v1
v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 vcc_lo, 1, v0

However, when the shift amount is known at compile time, the whole sequence
can be reduced to two VALU instructions by adjusting the constant operand of
the v_and and the v_cmp_eq to (1 << lshrrev_operand):

v_and_b32_e32 v0, (1 << 2), v1
v_cmp_eq_u32_e32 vcc_lo, (1 << 2), v0

In the example above, the following pseudo-code:

v0 = (v1 >> 2)
v0 = v0 & 1
vcc_lo = (v0 == 1)

would be translated to:

v0 = v1 & 0b100
vcc_lo = (v0 == 0b100)

which yields the same vcc_lo result, since both forms test whether bit 2 of
v1 is set.
This is a little hard to test, as one needs to force the SelectionDAG to
contain the nodes before instruction selection, but the test sequence was
roughly derived from a production shader.
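
The equivalence claimed above can be checked outside the compiler. The
following is a minimal sketch in Python, not part of the patch; the function
names and the sample values are made up for illustration, and values are
masked to 32 bits to mimic a VGPR.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def bit_test_shift(x, n):
    """Three-instruction form: shift right, mask with 1, compare with 1."""
    v0 = (x & MASK32) >> n   # v_lshrrev_b32
    v0 &= 1                  # v_and_b32
    return v0 == 1           # v_cmp_eq_u32


def bit_test_mask(x, n):
    """Two-instruction form: mask with (1 << n), compare with (1 << n)."""
    mask = 1 << n
    v0 = (x & MASK32) & mask  # v_and_b32
    return v0 == mask         # v_cmp_eq_u32


def forms_agree(n, samples):
    """True if both forms give the same result for every sample value."""
    return all(bit_test_shift(x, n) == bit_test_mask(x, n) for x in samples)


# Hypothetical sample inputs; any 32-bit values would do.
SAMPLES = [0, 1, 0b100, 0b011, 0x80000000, 0x12345678, 0xDEADBEEF, MASK32]

if __name__ == "__main__":
    for n in range(1, 16):  # the shift amounts accepted by the pattern
        print(n, forms_agree(n, SAMPLES))
```

Note that only the comparison result is equal; the intermediate v0 differs
between the two forms (1 versus 1 << n when the bit is set).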

To avoid additional VGPR pressure from the shifted constant, this pattern only
applies when the shift amount is greater than 0 and less than 16, as it was
observed that for (1 << 16) an additional VGPR was used to store the constant
value.
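
As a rough illustration of how the range restriction gates the rewrite, here
is a small Python model of the two helpers the patch adds (IMMBitSelRange and
IMMBitSelConst). The Python names, the fixed v0/v1 registers, and the printed
mnemonics are assumptions for illustration only; the real selection happens in
TableGen patterns and produces MachineInstrs, not strings.

```python
def in_bit_sel_range(shift):
    """Models IMMBitSelRange: accept shift amounts 1..15 only."""
    return 0 < shift < 16


def bit_sel_const(shift):
    """Models IMMBitSelConst: turn the shift amount into a single-bit mask."""
    return 1 << shift


def select_bit_test(shift):
    """Return the (illustrative) instruction sequence for trunc(srl v1, shift)."""
    if in_bit_sel_range(shift):
        mask = bit_sel_const(shift)
        return [
            f"v_and_b32 v0, {mask:#x}, v1",
            f"v_cmp_eq_u32 vcc_lo, {mask:#x}, v0",
        ]
    # Outside the range, fall back to the existing three-instruction form.
    return [
        f"v_lshrrev_b32 v0, {shift}, v1",
        "v_and_b32 v0, 1, v0",
        "v_cmp_eq_u32 vcc_lo, 1, v0",
    ]


if __name__ == "__main__":
    for shift in (2, 15, 16):
        print(shift, select_bit_test(shift))
```

With this model, a shift of 2 or 15 takes the two-instruction path, while a
shift of 0 or 16 keeps the original sequence.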


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D118461

Files:
  llvm/lib/Target/AMDGPU/SIInstructions.td
  llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll


Index: llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll
===================================================================

--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll
@@ -0,0 +1,20 @@
+; RUN: llc -march=amdgcn -mtriple=amdgcn-- -stop-after=amdgpu-isel -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+
+; GCN-LABEL: bb.0.entry:
+; GCN-NOT:      V_LSHRREV_B32_e64
+; GCN:          V_AND_B32_e64 2
+; GCN:          V_CMP_EQ_U32_e64 killed {{.*}}, 2
+define i32 @opt_lshr_and_cmp(i32 %x) {
+entry:
+  %0 = and i32 %x, 2
+  %1 = icmp eq i32 %0, 0
+  %2 = xor i1 %1, -1
+  br i1 %2, label %out.true, label %out.else
+
+out.true:
+  %3 = shl i32 %x, 2
+  ret i32 %3
+
+out.else:
+  ret i32 %x
+}
Index: llvm/lib/Target/AMDGPU/SIInstructions.td
===================================================================
--- llvm/lib/Target/AMDGPU/SIInstructions.td
+++ llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2269,6 +2269,31 @@
   (V_CMP_EQ_U32_e64 (V_AND_B32_e64 (i32 1), $a), (i32 1))
 >;
 
+// Restrict the range to prevent using an additional VGPR
+// for the shifted value.
+def IMMBitSelRange : ImmLeaf <i32, [{
+  return Imm > 0 && Imm < 16;
+}]>;
+
+def IMMBitSelConst : SDNodeXForm<imm, [{
+  return CurDAG->getTargetConstant((1 << N->getZExtValue()), SDLoc(N),
+                                   MVT::i32);
+}]>;
+
+// Selecting SRL and TRUNC separately, where the SRL result
+// feeds the TRUNC, generates three instructions. When the shift
+// amount is a constant, the shift can instead be folded into the
+// mask, so that the V_LSHRREV_B32_e64 is not needed and only
+// V_AND_B32_e64 and V_CMP_EQ_U32_e64 remain:
+// (i1 (trunc (srl i32 $a, imm $b))) ->
+//   tmp = v_and_b32_e64 (1 << $b), $a
+//   v_cmp_eq_u32_e64 tmp, (1 << $b)
+def : GCNPat <
+  (i1 (trunc (i32 (srl i32:$a, IMMBitSelRange:$b)))),
+  (V_CMP_EQ_U32_e64 (V_AND_B32_e64 (i32 (IMMBitSelConst $b)), $a),
+    (i32 (IMMBitSelConst $b)))
+>;
+
 def : GCNPat <
   (i1 (DivergentUnaryFrag<trunc> i64:$a)),
   (V_CMP_EQ_U32_e64 (V_AND_B32_e64 (i32 1),

