[PATCH] D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization

Sun Mar 14 12:34:06 PDT 2021

dmgreen added a reviewer: spatel.
dmgreen added a comment.

Hello. Fantastic to see this getting used in more cases. In-order reductions sound like a great use of it.

================
Comment at: llvm/include/llvm/Analysis/IVDescriptors.h:257
   bool IsSigned = false;
+  // If this is an ordered reduction
+  bool IsOrdered = false;
----------------
Am I correct in saying that Ordered reductions are a subset of floating point reductions that
- don't have AllowReassoc on the fadd
- have only a single fadd added in each loop iteration (but possibly predicated to be one of several additions). So either fadd(redphi(..)) or phi(fadd(redphi), redphi) or phi(fadd(redphi), fadd(redphi)) ?

And that we can't easily get this info from just the FMF or the ExactFPMathInst?

================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:204
+  // If this comes from a Phi node, look through it
+  if (auto *EIP = dyn_cast<PHINode>(Exit)) {
+    if (EIP->getNumIncomingValues() != 2)
----------------
I think this may need to check all branches of the phi, if I'm understanding what this is doing exactly. They could have different instructions down each path with different flags (although that would be quite rare I suspect).

There might also be selects, if this is matching on if-block phis. I'm not sure if that is handled in AddReductionVar though.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4361
   {
-    // Floating-point operations should have some FMF to enable the reduction.
-    IRBuilderBase::FastMathFlagGuard FMFG(Builder);
-    Builder.setFastMathFlags(RdxDesc.getFastMathFlags());
-    for (unsigned Part = 1; Part < UF; ++Part) {
-      Value *RdxPart = State.get(LoopExitInstDef, Part);
-      if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
-        ReducedPartRdx = Builder.CreateBinOp(
-            (Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
-      } else {
-        ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
+    if (Cost->isInLoopReduction(Phi) && useOrderedReductions(RdxDesc))
+      ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);
----------------
Cost->isInLoopReduction(Phi) -> IsInLoopReductionPhi

Does this work in-order for UF > 1? I feel like the order of the adds will be changed, so that it's no-longer in-order.

If not, is this code needed, if it can only be using UF == 1 and ReducedPartRdx is set to State.get(LoopExitInstDef, 0) above.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9137
+          IntegerType::getInt1Ty(State.Builder.getContext()), 1);
+      Value *MaskOfOnes =
+          State.Builder.CreateVectorSplat(State.VF, C, "broadcast");
----------------
I'm suspecting that the MaskOfOnes isn't needed, and any masking needed would be handled by the Select created above? Otherwise it can use the Mask from the Cond operand to detect when the instruction needs to be predicated and when not.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll:8
+; CHECK: %[[LOAD:.*]] = load <8 x float>, <8 x float>*
+; CHECK: %[[MASKED_VEC:.*]] = select <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> %[[LOAD]], <8 x float> zeroinitializer
+; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI]], <8 x float> %[[MASKED_VEC]])
----------------
Note that the identity value for a fadd (without nsz) should be -0.0, not 0.0. Otherwise 0.0 + n doesn't always equal n. https://alive2.llvm.org/ce/z/LS2RK3

I believe the select would only be needed if the reduction is predicated though, as mentioned above.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98435/new/

https://reviews.llvm.org/D98435