[PATCH] D100113: [LV] Move reduction PHI node fixup to VPlan::execute (NFC).

Mon May 3 00:47:51 PDT 2021

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9242
     RecurKind Kind = RdxDesc->getRecurrenceKind();
-    bool IsOrdered = useOrderedReductions(*RdxDesc);
+    bool IsOrdered = LoopVectorizationPlanner::useOrderedReductions(*RdxDesc);
     Value *NewVecOp = State.get(getVecOp(), Part);
----------------
better turn on an IsOrdered bit inside VPReductionRecipe during its creation or planning, than during its execute? Admittedly goes back to D98435.

There are 3 types of reductions: (1) parallel lanes: each lane computes an independent partial-sum; (2) parallel parts: each vector computes an independent partial sum, in-loop within vectors but not across parts, and (3) serial: in-loop across vectors and parts, the header phi operates on a single scalar. Would be good to explicitly indicate which type of reduction is chosen, in both ReductionRecipe(s) and the header phi; it effects their code-gen.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:69

+bool LoopVectorizationPlanner::useOrderedReductions(
+    RecurrenceDescriptor &RdxDesc) {
----------------
Defining a method of LVP inside VPlan.cpp seems odd, at best.

The use of a gating EnableStrictReductions knob (in contrast to a CM cost-based decision), belongs to Legal rather than CM or Planner? I.e., Legal should bail out from vectorizing a loop having an ordered reduction, when !EnableStrictReductions? Admittedly goes back to D98435.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:765
+    for (unsigned Part = 0; Part < State->UF; ++Part) {
+      auto *VecRdxPhi = cast<PHINode>(State->get(PhiR->getVPValue(0), Part));
+      Value *Val = State->get(PhiR->getOperand(1), Part);
----------------
Moving this to VPlan::execute is great!
Can use getSingleVPValue(), getBackedgeValue().
May be worth explaining that Val is first set per-part supporting parallel partial sums, either vector/per-lane (1) or scalar/per-part (2), and then set to last-part supporting serial reduction (3).
Admittedly goes back to D98435.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100113/new/

https://reviews.llvm.org/D100113