[llvm] [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats (PR #65706)

Tue Sep 12 12:32:04 PDT 2023

================
@@ -729,6 +730,99 @@ bool VectorCombine::foldBitcastShuf(Instruction &I) {
   return true;
 }
 
+/// VP Intrinsics whose vector operands are both splat values may be simplified
+/// into the scalar version of the operation and the result is splatted. This
+/// can lead to scalarization down the line.
+bool VectorCombine::scalarizeVPIntrinsic(VPIntrinsic &VPI) {
+  Value *Op0 = VPI.getArgOperand(0);
+  Value *Op1 = VPI.getArgOperand(1);
+
+  if (!isSplatValue(Op0) || !isSplatValue(Op1))
+    return false;
+
+  // For the binary VP intrinsics supported here, the result on disabled lanes
+  // is a poison value. For now, only do this simplification if all lanes
+  // are active.
+  // TODO: Relax the condition that all lanes are active by using insertelement
+  // on inactive lanes.
+  auto IsAllTrueMask = [](Value *MaskVal) {
+    if (Value *SplattedVal = getSplatValue(MaskVal))
+      if (auto *ConstValue = dyn_cast<Constant>(SplattedVal))
+        return ConstValue->isAllOnesValue();
+    return false;
+  };
+  if (!IsAllTrueMask(VPI.getArgOperand(2)))
+    return false;
+
+  // Check to make sure we support scalarization of the intrinsic
+  std::set<Intrinsic::ID> SupportedIntrinsics(
+      {Intrinsic::vp_add, Intrinsic::vp_sub, Intrinsic::vp_mul,
+       Intrinsic::vp_ashr, Intrinsic::vp_lshr, Intrinsic::vp_shl,
+       Intrinsic::vp_or, Intrinsic::vp_and, Intrinsic::vp_xor,
+       Intrinsic::vp_fadd, Intrinsic::vp_fsub, Intrinsic::vp_fmul,
+       Intrinsic::vp_sdiv, Intrinsic::vp_udiv, Intrinsic::vp_srem,
+       Intrinsic::vp_urem});
+  Intrinsic::ID IntrID = VPI.getIntrinsicID();
+  if (!SupportedIntrinsics.count(IntrID))
+    return false;
----------------
michaelmaitland wrote:

I've made this change, which turns out to be less of a simplification and more of an improvement on how many vp intrinsics we can scalarize, since more vp intrinsics have VP_PROPERTY_BINARYOP than I originally supported.

The newly scalarizable vp intrinsics have scalar counterparts that are intrinsics rather than BinOp instructions. This requires some additional logic to find the scalar version of the intrinsic, compute its cost, and create the new scalar intrinsic call.

Since this change enables additional kinds of optimization, I think the extra logic is warranted.
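
To illustrate the lookup described above, here is a minimal standalone sketch (it is not the code in this PR and does not use LLVM headers). The enums `VPOp`, `BinOpcode`, and `ScalarIntrinsic` and the helpers `functionalOpcode`, `functionalIntrinsic`, `findScalarForm`, and `describe` are all hypothetical stand-ins, and the choice of which operations map to a BinOp versus an intrinsic is an assumption made for the example. The idea is to try the BinOp opcode first, fall back to a scalar intrinsic, and bail out if neither exists:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins for VP intrinsic IDs, binary opcodes, and scalar
// intrinsic IDs. These are NOT LLVM types; they only model the lookup.
enum class VPOp { Add, FAdd, SMax, UMin, Merge };
enum class BinOpcode { Add, FAdd };
enum class ScalarIntrinsic { SMax, UMin };

// The scalar form of a VP operation is either a BinOp or an intrinsic call.
using ScalarForm = std::variant<BinOpcode, ScalarIntrinsic>;

// Assumed mapping: VP ops whose scalar version is a plain binary operator.
std::optional<BinOpcode> functionalOpcode(VPOp Op) {
  switch (Op) {
  case VPOp::Add:
    return BinOpcode::Add;
  case VPOp::FAdd:
    return BinOpcode::FAdd;
  default:
    return std::nullopt;
  }
}

// Assumed mapping: VP ops whose scalar version is itself an intrinsic.
std::optional<ScalarIntrinsic> functionalIntrinsic(VPOp Op) {
  switch (Op) {
  case VPOp::SMax:
    return ScalarIntrinsic::SMax;
  case VPOp::UMin:
    return ScalarIntrinsic::UMin;
  default:
    return std::nullopt;
  }
}

// Prefer the BinOp form, fall back to the intrinsic form, otherwise give up
// (the caller would then leave the VP intrinsic untouched).
std::optional<ScalarForm> findScalarForm(VPOp Op) {
  if (auto Opc = functionalOpcode(Op))
    return ScalarForm(*Opc);
  if (auto ID = functionalIntrinsic(Op))
    return ScalarForm(*ID);
  return std::nullopt;
}

// The cost query and the instruction creation both branch on the same
// distinction, so a single variant can drive both paths.
std::string describe(const ScalarForm &Form) {
  if (std::holds_alternative<BinOpcode>(Form))
    return "scalar binop";
  return "scalar intrinsic call";
}
```

In this sketch an operation with neither mapping (here `VPOp::Merge`) yields `std::nullopt`, which corresponds to returning `false` from the combine.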

https://github.com/llvm/llvm-project/pull/65706