[llvm] [LV][VPlan] Add initial support for CSA vectorization (PR #106560)

Mel Chen via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 17 03:35:44 PDT 2024


Mel-Chen wrote:

> @Mel-Chen 's #67812 seems to be quite similar work which just extends the existing code. This CSA patch is more general in that it can handle values that aren't just the induction variable, but I think the amount of extra code can be cut down a little.

#67812 only handles increasing induction variables because `FindLastIV` was designed to address the index part of min/max-with-index reductions, and that index is typically an increasing induction variable.
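As a concrete illustration, here is a minimal Python sketch of the min-with-index idiom that `FindLastIV` targets (the function name is mine, not from the patch): the reduced index is the loop's increasing induction variable itself.

```python
def min_with_index(a):
    """Scalar min-with-index reduction: the value reduction carries
    the minimum, the index reduction carries the induction variable."""
    min_val = float("inf")
    min_idx = -1  # index reduction: always tracks an increasing IV
    for i, v in enumerate(a):
        if v < min_val:
            min_val = v
            min_idx = i
    return min_val, min_idx
```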

Currently, I do not have plans to support `FindLast`, mainly because RISC-V has a `vfirst` instruction but no `vlast`. However, I can still offer some suggestions.
For idiom recognition, we could directly extend the `RecurrenceDescriptor`.
For vectorization, I recommend an in-loop reduction approach for this idiom, since an out-of-loop reduction would require generating an additional recurrence phi in the vectorized loop, making vectorization too costly. To implement the in-loop reduction, it is best to introduce a new intrinsic, `vector.extractlast(%scalar_start, %vec_val, %vec_mask)`, with the following semantics:
```
  %c = vcpop(%vec_mask)                     ; count active lanes
  if (%c == 0)
    return %scalar_start                    ; no active lane: keep the start value
  %vec_com = compress(%vec_val, %vec_mask)  ; pack active elements to the front
  return extractelement %vec_com, %c - 1    ; last active element
```
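The semantics above can be emulated in plain Python (a minimal sketch; the helper name `extractlast` mirrors the proposed intrinsic but is not an existing API):

```python
def extractlast(scalar_start, vec_val, vec_mask):
    """Emulate vector.extractlast: return the last element of vec_val
    whose mask bit is set, or scalar_start if no lane is active."""
    c = sum(vec_mask)  # vcpop: number of active lanes
    if c == 0:
        return scalar_start
    # compress: keep only the active elements, preserving order
    compressed = [v for v, m in zip(vec_val, vec_mask) if m]
    return compressed[c - 1]  # extractelement at index c - 1
```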
For targets without a corresponding instruction, such as RISC-V, we could implement the same semantics in the backend using multiple instructions (e.g., `vcpop` + `beqz` + `vcompress`) if that is profitable. In VPlan, we might need a new recipe, `VPExtractLastRecipe`, to emit this intrinsic. I hope these suggestions are helpful.
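To show how this would compose into an in-loop reduction, here is a speculative Python model of a vectorized conditional-scalar-assignment loop: each vector iteration feeds the running scalar into the intrinsic, so no recurrence phi beyond the scalar accumulator is needed. All names here are illustrative, not from the patch.

```python
def extractlast(scalar_start, vec_val, vec_mask):
    """Model of the proposed vector.extractlast intrinsic."""
    c = sum(vec_mask)
    if c == 0:
        return scalar_start
    compressed = [v for v, m in zip(vec_val, vec_mask) if m]
    return compressed[c - 1]

def csa_vectorized(a, pred, init, vf=4):
    """Vectorized model of: `if (pred(a[i])) t = a[i];` with t live-out.
    acc plays the role of the in-loop scalar reduction value."""
    acc = init
    for base in range(0, len(a), vf):
        chunk = a[base:base + vf]        # one vector iteration
        mask = [pred(v) for v in chunk]  # lane-wise condition
        acc = extractlast(acc, chunk, mask)
    return acc
```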

Finally, if there is a real need to support these semantics, please let me know, and I can discuss internally whether to include it in the plan. :)




https://github.com/llvm/llvm-project/pull/106560

