[llvm] [LV] Fix FindLastIV reduction for epilogue vectorization. (PR #120395)

Wed Jan 8 02:28:17 PST 2025

================
@@ -3405,15 +3406,13 @@ void VPReductionPHIRecipe::execute(VPTransformState &State) {
     }
   } else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) {
     // [I|F]FindLastIV will use a sentinel value to initialize the reduction
-    // phi or the resume value from the main vector loop when vectorizing the
-    // epilogue loop. In the exit block, ComputeReductionResult will generate
-    // checks to verify if the reduction result is the sentinel value. If the
-    // result is the sentinel value, it will be corrected back to the start
-    // value.
+    // phi. In the exit block, ComputeReductionResult will generate checks to
+    // verify if the reduction result is the sentinel value. If the result is
+    // the sentinel value, it will be corrected back to the start value.
     // TODO: The sentinel value is not always necessary. When the start value is
     // a constant, and smaller than the start value of the induction variable,
     // the start value can be directly used to initialize the reduction phi.
-    Iden = StartV;
+    StartV = Iden = RdxDesc.getSentinelValue();
----------------
Mel-Chen wrote:

AnyOf can do this because there is only one element in the sequence.
FindLastIV possibly needs to be adjusted like this:
Change `bc.merge.rdx` from
```
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_SELECT]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 3, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
```
to 
```
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[TMP6]], %[[VEC_EPILOG_ITER_CHECK]] ], [ -9223372036854775808, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
;; TMP6 is defined by `[[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])`
```
The method is to pattern match `rdx_res = max_rdx != sentinel ? max_rdx : start_value`, and directly use the result of reduce.smax (i.e. `max_rdx`) to resume the reduction result instead of the adjusted result (i.e. `rdx_res`).

I think this is a little more complicated; what do you think?
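
To make the resume scheme described above concrete, here is a minimal scalar model in plain C++. It is not the LoopVectorize code: `reduceSMax`, `fixup`, `findLastResumeRaw`, `findLastResumeAdjusted`, the `A[I] > 0` predicate, and the `Split` point between the main loop and the epilogue are all hypothetical names chosen for illustration. The sketch assumes the sentinel is the minimum signed 64-bit value, as in the CHECK line above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel used to initialize the reduction (assumed: signed 64-bit minimum).
constexpr int64_t Sentinel = std::numeric_limits<int64_t>::min();

// Models reduce.smax over the indices in [Begin, End) whose element is
// positive, continuing from the incoming value Init.
int64_t reduceSMax(const std::vector<int> &A, int64_t Begin, int64_t End,
                   int64_t Init) {
  int64_t Rdx = Init;
  for (int64_t I = Begin; I < End; ++I)
    if (A[static_cast<std::size_t>(I)] > 0)
      Rdx = std::max(Rdx, I);
  return Rdx;
}

// rdx_res = max_rdx != sentinel ? max_rdx : start_value
int64_t fixup(int64_t MaxRdx, int64_t Start) {
  return MaxRdx != Sentinel ? MaxRdx : Start;
}

// Resume the epilogue from the raw reduce.smax result (max_rdx) and apply
// the sentinel correction only once, at the very end.
int64_t findLastResumeRaw(const std::vector<int> &A, int64_t Split,
                          int64_t Start) {
  int64_t MaxRdx = reduceSMax(A, 0, Split, Sentinel);
  int64_t Epilogue =
      reduceSMax(A, Split, static_cast<int64_t>(A.size()), MaxRdx);
  return fixup(Epilogue, Start);
}

// Resume the epilogue from the already corrected result (rdx_res). If the
// main loop found nothing, the start value leaks into the smax reduction.
int64_t findLastResumeAdjusted(const std::vector<int> &A, int64_t Split,
                               int64_t Start) {
  int64_t RdxRes = fixup(reduceSMax(A, 0, Split, Sentinel), Start);
  int64_t Epilogue =
      reduceSMax(A, Split, static_cast<int64_t>(A.size()), RdxRes);
  return fixup(Epilogue, Start);
}
```

In this model, with a start value of 100 and the only match at index 5 inside the epilogue range, resuming from `max_rdx` yields 5, while resuming from `rdx_res` yields 100, because the start value takes part in the `smax` and wins over the real index.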

https://github.com/llvm/llvm-project/pull/120395