[llvm] [LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (PR #124093)
Mel Chen via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 17 06:37:13 PST 2025
================
@@ -1635,18 +1635,62 @@ void VPlanTransforms::addActiveLaneMask(
HeaderMask->replaceAllUsesWith(LaneMask);
}
+/// Adjust the way the resume value is obtained when using tail folding by EVL.
+/// Expanding ExtractFromEnd since the penultimate EVL could not equals to
+/// VFxUF. Expand
+/// %resume = ExtractFromEnd %vec, 1
+/// to
+/// %last.active.idx = sub %EVL, 1
+/// %resume = extractelement %vec, %last.active.idx
+static void adjustResumePhisForEVL(VPlan &Plan, VPValue &EVL) {
----------------
Mel-Chen wrote:
> Do we need to adjust all ExtractFromEnd instructions, not just the ones that use ResumePhi and where the index is zero?
>
> E.g. it looks like `addExitUsersForFirstOrderRecurrences` also creates an ExtractFromEnd with an index of 2. I think that should probably also be adjusted for EVL.
>
Currently, neither `ExtractFromEnd %vec, 1` nor `ExtractFromEnd %vec, 2` produces useful IR when tail folding is enabled, whether folding by mask or by EVL.
`ExtractFromEnd %vec, 1` is used for the resume phi: incoming edge 0 connects to the vector loop and carries the value computed there. However, the resume phi is always removed from the final IR, because the purpose of tail folding is to eliminate the need for a scalar epilogue.
`ExtractFromEnd %vec, 2` is used to extract the reduction result of fixed-order recurrences, and it is only generated when there are external users. However, tail folding currently does not allow fixed-order recurrences to have external users, and that includes tail folding by EVL. Therefore, when tail folding by EVL is enabled, the VPlan should not contain `ExtractFromEnd %vec, 2` (see test case `define i32 @FOR_reduction(ptr noalias %A, ptr noalias %B, i64 %TC)`).
In conclusion, whether or not ExtractFromEnd is transformed does not affect the correctness of the final IR: `ExtractFromEnd %vec, 1` is dead code that will eventually be eliminated, and `ExtractFromEnd %vec, 2` should not appear in the VPlan when tail folding is enabled.
> Would it also maybe be simpler if we added a runtime VF operand to `ExtractFromEnd`, similar to `VPReverseVectorPointerRecipe`? Users would pass in Plan.getVF() when building it and VPInstruction::generate would generate the sub (if needed). Then the EVL transform just becomes a matter of replacing the runtime VF operand with EVL.
My thought is that folding by EVL may not be well suited to the reverse indexing that ExtractFromEnd uses. A regular forward-indexed extractelement would be simpler and more intuitive.
By the way, with folding by EVL only `ExtractFromEnd %vec, 1` is safe, since we can only guarantee that EVL is at least 1, regardless of the VF.
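To make the indexing argument concrete, here is a minimal sketch in plain C++. It is not LLVM code: the vector is modelled as a `std::vector<int>` with VF lanes, and the helper names `extractFromEnd` and `extractFromActiveEnd` are hypothetical, introduced only for illustration. It assumes the last iteration wrote only the first EVL lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: Vec has VF lanes, but only lanes [0, EVL) hold
// meaningful data after the final EVL-predicated iteration.

// Reverse indexing, counted back from the full width VF. With EVL < VF this
// reads a lane that the final iteration did not write.
int extractFromEnd(const std::vector<int> &Vec, std::size_t Offset) {
  assert(Offset >= 1 && Offset <= Vec.size());
  return Vec[Vec.size() - Offset];
}

// Forward indexing relative to EVL: reads lane EVL - Offset.
// Returns std::nullopt when EVL < Offset, i.e. when no such active lane exists.
std::optional<int> extractFromActiveEnd(const std::vector<int> &Vec,
                                        std::size_t EVL, std::size_t Offset) {
  if (Offset == 0 || EVL < Offset || EVL > Vec.size())
    return std::nullopt;
  return Vec[EVL - Offset];
}
```

With VF = 4 and EVL = 3, `extractFromEnd(Vec, 1)` reads lane 3, which is inactive, while `extractFromActiveEnd(Vec, 3, 1)` reads lane 2, the last active one. With EVL = 1, offset 1 still has a lane to read but offset 2 does not, which is why only the offset-1 form can be relied on.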
Since transforming ExtractFromEnd has no substantial impact, I'm considering whether this patch should still do it.
https://github.com/llvm/llvm-project/pull/124093#discussion_r1954563972
https://github.com/llvm/llvm-project/pull/124093