[PATCH] D116928: [LoopVectorize] Support epilogue vectorisation of loops with reductions
Kerry McLaughlin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 12 06:33:15 PST 2022
kmclaughlin added a comment.
Thank you for the comments on this patch, @sdesmalen & @bmahjour!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4339
+ if (ResumeInst)
+ ResumePhi = dyn_cast<PHINode>(ResumeInst);
+
----------------
sdesmalen wrote:
> It's not necessarily safe to assume ResumePhi is a resume value just because it is a PHINode, so I'd recommend looking this up in the map that I mentioned earlier.
I've changed this to look up the start value from the VPReductionPHIRecipe as this should be set correctly for the epilogue.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4350-4351
+ for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I) {
+ if (ResumeInst && LoopBypassBlocks[I] == ResumeInst->getParent())
+ BCBlockPhi->addIncoming(ResumeVal, LoopBypassBlocks[I]);
+ else if (ResumePhi &&
----------------
sdesmalen wrote:
> This case seems a bit odd, does it matter that ResumeValue is an instruction?
>
> For the preheader after epilogue vectorization, I'd expect there to only be three cases to consider:
> 1. Original start value (when not having executed any vector code at all).
> 2. Intermediate reduction value after the main vector body (i.e. no epilogue vectorization because of the iter checks)
> 3. Intermediate reduction value after the vectorized epilogue. (i.e. the VPReductionPHI's new start value).
>
> If you know that ResumePhi is actually a resume PHI node, you can write the loop as:
>
> for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
> if (Incoming == LoopMiddleBlock)
> BCBlockPhi->addIncoming(ReducedPartRdx, Incoming);
> else if (ResumePhi && // case 2 from above.
> // note that ResumePhi is only non-zero for epilogue vectorization.
> llvm::is_contained(ResumePhi->blocks(), Incoming))
> BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
> Incoming);
> else
> BCBlockPhi->addIncoming((Value *)ReductionStartValue, Incoming);
> }
>
> // Now, we need to fix the users of the reduction variable
> // inside and outside of the scalar remainder loop.
>
> [...]
I don't think it matters if ResumeValue is an instruction, so I've rewritten the loop as suggested above.
================
Comment at: llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll:499
+ ret i16 %ret
+}
----------------
bmahjour wrote:
> Please add a test case for when there are more than one reductions, eg:
> ```
> float foo(float *arr, int n) {
> float sum = 0.0;
> float prod = 1.0;
> for (int i = 0; i < n; i++) {
> sum += arr[i];
> prod *= arr[i];
> }
> return prod/sum;
> }
> ```
Hi @bmahjour, I have added a test case using the above example here (`@multiple_fp_rdx`)
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116928/new/
https://reviews.llvm.org/D116928
More information about the llvm-commits
mailing list