[llvm] Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" (PR #168738)

Fri Nov 21 03:10:52 PST 2025

================
@@ -1095,8 +1095,19 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     AnyOf,
     // Calculates the first active lane index of the vector predicate operands.
     // It produces the lane index across all unrolled iterations. Unrolling will
-    // add all copies of its original operand as additional operands.
+    // add all copies of its original operand as additional operands. If any of
+    // the vector predicate operands are all zeroes then this returns poison.
----------------
fhahn wrote:

I think this is also not working correctly for early exits when interleaving. Say we have 2 vector operands, then the first one may be all-zero, and only the second one has an active lane.

We create selects that pick the current part if its count is != the # of vector elements (a count equal to the # of vector elements means all lanes in that part are zero). Unless I am missing something, we should always pass `false` as the second argument? If so, this can be done as an independent fix.
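
To illustrate the concern, here is a rough standalone model of the select chain. It is a sketch, not the actual VPlan codegen: `cttzElts` and `firstActiveLane` are hypothetical helper names, `std::nullopt` stands in for poison, and the poison propagation through the compare and select is simplified.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a count-trailing-zero-elements operation on one mask.
// std::nullopt stands in for poison (all lanes zero and ZeroIsPoison set).
std::optional<uint64_t> cttzElts(const std::vector<bool> &Mask,
                                 bool ZeroIsPoison) {
  for (uint64_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  if (ZeroIsPoison)
    return std::nullopt;
  return Mask.size();
}

// Hypothetical model of the per-part select chain: walk the parts from last
// to first and keep the earliest part whose count differs from VF.
// Assumes at least one part and that all parts have the same length.
std::optional<uint64_t>
firstActiveLane(const std::vector<std::vector<bool>> &Parts,
                bool ZeroIsPoison) {
  const uint64_t VF = Parts.front().size();
  std::optional<uint64_t> Res;
  for (std::size_t N = Parts.size(); N-- > 0;) {
    std::optional<uint64_t> TZ = cttzElts(Parts[N], ZeroIsPoison);
    if (N == Parts.size() - 1) {
      // Last part: no select, the result starts out as this part's index.
      Res = TZ ? std::optional<uint64_t>(VF * N + *TZ) : std::nullopt;
      continue;
    }
    if (!TZ)
      Res = std::nullopt; // poison count -> poison compare -> poison select
    else if (*TZ != VF)
      Res = VF * N + *TZ; // this part has an active lane, select it
    // otherwise this part is all-zero and the later result is kept
  }
  return Res;
}
```

With two parts of 4 lanes, where the first is all-zero and the second has lane 2 active, this model yields 6 when passing `false`, and poison when passing `true`.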

I think at least on AArch64, the same code is generated for both `true`/`false`, but not for RISCV. I guess this explains why this hasn't been uncovered with early-exit vectorization, as it requires interleaving. Without interleaving, there must be at least one active lane in the (first and only) vector operand, otherwise we would not have taken the early exit.

Reading through the LangRef again, it seems like the Semantics section for the intrinsic does not mention the second argument, which seems a bit unfortunate: https://llvm.org/docs/LangRef.html#id964

https://github.com/llvm/llvm-project/pull/168738