[PATCH] D116928: [LoopVectorize] Support epilogue vectorisation of loops with reductions

Wed Jan 12 06:33:15 PST 2022

kmclaughlin added a comment.

Thank you for the comments on this patch, @sdesmalen & @bmahjour!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4339
+  if (ResumeInst)
+    ResumePhi = dyn_cast<PHINode>(ResumeInst);
+
----------------
sdesmalen wrote:
> It's not necessarily safe to assume ResumePhi is a resume value just because it is a PHINode, so I'd recommend looking this up in the map that I mentioned earlier.
I've changed this to look up the start value from the VPReductionPHIRecipe as this should be set correctly for the epilogue.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4350-4351
+  for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I) {
+    if (ResumeInst && LoopBypassBlocks[I] == ResumeInst->getParent())
+      BCBlockPhi->addIncoming(ResumeVal, LoopBypassBlocks[I]);
+    else if (ResumePhi &&
----------------
sdesmalen wrote:
> This case seems a bit odd, does it matter that ResumeValue is an instruction?
> 
> For the preheader after epilogue vectorization, I'd expect there to only be three cases to consider:
> 1. Original start value (when not having executed any vector code at all).
> 2. Intermediate reduction value after the main vector body (i.e. no epilogue vectorization because of the iter checks)
> 3. Intermediate reduction value after the vectorized epilogue. (i.e. the VPReductionPHI's new start value).
> 
> If you know that ResumePhi is actually a resume PHI node, you can write the loop as:
> 
>   for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
>     if (Incoming == LoopMiddleBlock)
>       BCBlockPhi->addIncoming(ReducedPartRdx, Incoming);
>     else if (ResumePhi &&              // case 2 from above.
>                                        // note that ResumePhi is only non-zero for epilogue vectorization.
>              llvm::is_contained(ResumePhi->blocks(), Incoming))
>       BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
>                               Incoming);
>     else
>       BCBlockPhi->addIncoming((Value *)ReductionStartValue, Incoming);
>   }
>   
>   // Now, we need to fix the users of the reduction variable
>   // inside and outside of the scalar remainder loop.
>   
>   [...]
I don't think it matters if ResumeValue is an instruction, so I've rewritten the loop as suggested above.

================
Comment at: llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll:499
+  ret i16 %ret
+}
----------------
bmahjour wrote:
> Please add a test case for when there are more than one reductions, eg:
> ```
> float foo(float *arr, int n) {
>   float sum = 0.0;
>   float prod = 1.0;
>   for (int i = 0; i < n; i++) {
>     sum += arr[i];
>     prod *= arr[i];
>   }
>   return prod/sum;
> }
> ```
Hi @bmahjour, I have added a test case using the above example here (`@multiple_fp_rdx`)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116928/new/

https://reviews.llvm.org/D116928