[PATCH] D75069: [LoopVectorizer] Inloop vector reductions

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 8 11:36:47 PDT 2020


dmgreen marked 15 inline comments as done.
dmgreen added inline comments.


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:439
     //       model could then apply the recurrence type to these instructions,
     //       without needing a white list of instructions to ignore.
     collectCastsToIgnore(TheLoop, ExitInstruction, RecurrenceType, CastInsts);
----------------
Ayal wrote:
> Perhaps the above "better way" would also help recognize and record horizontal reductions?
Hmm. The reason I kept them separate was that this method is already pretty complex. I was trying to keep thing simpler. Adding the ability to detect a single chain of operations from Phi to LoopExitValue that can be used for horizontal reductions looks.. difficult. And error prone. :) If you think it's worth it then I can certainly give it a go! I like the separation of concerns in keeping them separate though.

The extra things that AddReductionVar will detect that getReductionOpChain will not are:
 - Phi/select predicated reductions like in if-conversion-reductions.ll and if-reduction.ll. These would need some form of predicated reduction intrinsic.
 - Narrow bitwidths. This one I could add.
 - Subs/FSubs are treated like Adds/FAdds for vertical reductions.


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:848
+
+  if (!isCorrectOpcode(Cur))
+    return {};
----------------
Ayal wrote:
> `|| !Cur->hasNUses(ExpectedUses)` ?
> 
> 
> nit: can alternatively let getNextInstruction check its result and return only valid ones, e.g.:
> 
> ```
> bool RedOpIsCmp = (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp);
> unsigned ExpectedUses = RedOpIsCmp ? 2 : 1;
> 
> auto getNextInstruction = [&](Instruction *Cur) {
>   if (!Cur->hasNUses(ExpectedUses))
>     return nullptr;
>   auto *FirstUser = Cur->user_begin();
>   if (!RedOpIsCmp)
>     return FirstUser->getOpcode() == RedOp ? FirstUser : nullptr;
>   // Handle cmp/select pair:
>   auto *Sel = dyn_cast<SelectInst>(*FirstUser) ||
>     dyn_cast<SelectInst>(*std::next(FirstUser));
>   if (SelectPatternResult::isMinOrMax(matchSelectPattern(Sel, LHS, RHS).Flavor))
>     return Sel;
>   return nullptr;
> }
> 
> for (auto *Cur = getNextInstruction(Phi); Cur && Cur != LoopExitInstr; Cur = getNextInstruction(Cur))
>   ReductionOperations.push_back(Cur);
> ```
This is the loop exit instr, so can have as many uses as it likes I believe.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3769
     // MinMax reduction have the start value as their identify.
-    if (VF == 1) {
+    if (VF == 1 || UseInloopReductions) {
       VectorStart = Identity = ReductionStartValue;
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > This is dead code if cmp/select chains are not recognized yet, as noted above.
> > I've added the code to handle minmax too (but not tested it a lot yet. I will try that now).
> > 
> > MVE has instructions for integer min/max reductions, but they can be slow enough to make them not worth using over a normal vmin/vmax. Adds are always not-slower-enough to warrant the inloop reduction (and have other advantages like handling higher type sizes and folding in more instructions.)
> > 
> > My point is that min/max, like some of the other fadd/mul/and/etc might not be used by MVE yet. If you think the code is more hassle than it deserves, then we could take them out for the time being. I'd like to leave them in for consistency though, even if it's not used straight away.
> Would be good to make sure code is being exercised and tested. Could inloop min/max (and/or other reductions) help reduce code size, and be applied when vectorizing under optsize?
-Os sounds like a good plan. It will take some backend work to make it efficient enough first though. And predicated reductions?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3847
+  if ((VF > 1 && !UseInloopReductions) &&
+      Phi->getType() != RdxDesc.getRecurrenceType()) {
     Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > Checking if !UseInLoopReductions is redundant, as current in loop reductions have phi's of the same type as the recurrence; right?
> > > May leave it as an assert, until in loop reductions handle truncation/extension.
> > Right. We look from the phi down (now), so can't go through the 'and' we would need to detect the different bitwidth. That would leave 'and' reduction,  which I think would not got through that codepath to create type promoted phis?
> > 
> > I've put an explicit check in for it to be sure, and added some testing for those cases.
> 'and' reduction may or may not be performed in a smaller type, iinm.
The check for lookThroughAnd in AddReductionVar is guarded by isArithmeticRecurrenceKind, so at least currently it cannot be an And reduction.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7286
+    RecurrenceDescriptor &RdxDesc = Reduction.second;
+    if (CM.useInloopReductions(Reduction.first)) {
+      PHINode *Phi = Reduction.first;
----------------
Ayal wrote:
> dmgreen wrote:
> > Ayal wrote:
> > > Iterate over in loop reductions?
> > Do you mean adding an iterator for iterating over reductions, stepping over the ones not inloop?
> > 
> > It would seem like it's similar to the existing code, but as a new iterator class. My gut says the current code is simpler and clearer what is going on?
> Suggestion was to iterate over the PHIs/elements of InloopReductionChains, rather than over all reduction PHIs of Legal->getReductionVars().
> 
> (Better early-exit via "if (!CM.isInLoopReduction(Reduction.first)) continue;")
I believe that InloopReductionChains would not iterate in a deterministic order, which is why I avoided it.

Perhaps that would not matter here? The reductions should be independent anyway. Seems safer to try and use deterministic ordering anyway if we can.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7442
   // and the live-out instruction of each reduction, at the end of the latch.
-  if (CM.foldTailByMasking()) {
+  if (CM.foldTailByMasking() && CM.hasOuterloopReductions()) {
     Builder.setInsertPoint(VPBB);
----------------
Ayal wrote:
> Ayal wrote:
> > In loop reductions are currently disabled in fold tail. Checking also if hasOuterloopReductions() is an independent optimization that should be done separately of this patch.
> Checking also if hasOuterloopReductions() is an independent optimization that should be done separately of this patch.
Ah sorry. I split it off but apparently didn't upstream the other part yet. Now in D81415.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75069/new/

https://reviews.llvm.org/D75069





More information about the llvm-commits mailing list