[PATCH] D76500: GlobalISel: Lower funnel shifts

Mon Mar 22 04:37:09 PDT 2021

foad added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/Utils.h:328-330
+/// Attempt to match a unary predicate against a scalar/splat constant or every
+/// element of a constant G_BUILD_VECTOR/G_BUILD_VECTOR_TRUNC. If AllowUndef is
+/// true, then G_IMPLICIT_DEF elements will pass nullptr to Match.
----------------
Nit: "AllowUndef**s**"

Comment about nullptr seems wrong. Match doesn't take any pointer arguments.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:5133-5134
+  } else {
+    // fshl X, Y, Z -> fshr (srl X, 1), (fshr X, Y, 1), ~Z
+    // fshr X, Y, Z -> fshl (fshl X, Y, 1), (shl Y, 1), ~Z
+    auto One = MIRBuilder.buildConstant(ShTy, 1);
----------------
Using `~Z` here only works if BitWidth is a power of two. Otherwise you need something like `BitWidth - 1 - (Z % BitWidth)`.

================
Comment at: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:4975
+        const ConstantInt *C = getConstantIntVRegVal(R, MRI);
+        return !C || C->getValue().urem(BW) != 0;
+      },
----------------
foad wrote:
> Should be `C && C->getValue().urem(BW) != 0`.
I still think this check for `!C` is either wrong or redundant, depending on what the behaviour of matchUnaryPredicate is supposed to be.

================
Comment at: llvm/lib/CodeGen/GlobalISel/Utils.cpp:878
+
+    if (!Match(MRI, SrcElt))
+      return false;
----------------
At the top level, matchUnaryPredicate only calls Match for G_CONSTANTs (not for G_IMPLICIT_DEF despite what the comment in Utils.h says).

Inside a G_BUILD_VECTOR(_TRUNC) you call Match on anything. Did you mean to have a check for G_CONSTANT here too, like the SelectionDAG version does?

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fshl.mir:48
+    %2:_(s32) = COPY $vgpr2
+    %3:_(s32) = G_FSHL %0, %1, %2
+    $vgpr0 = COPY %3
----------------
Not related to this patch, but I think it would work out better for AMDGPU to implement 32-bit G_FSHL as a 64-bit left shift and then extract the high part of the result.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76500/new/

https://reviews.llvm.org/D76500