[PATCH] D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable

Fri Jul 7 04:44:34 PDT 2023

artagnon added a comment.

In D150851#4479839 <https://reviews.llvm.org/D150851#4479839>, @Mel-Chen wrote:

> In D150851#4467261 <https://reviews.llvm.org/D150851#4467261>, @Ayal wrote:
>
>> Would be good to try and find more accurate names than using `Select*Cmp` combinations. A Compare-Select pattern is also used in the existing Min/Max reduction, but has a much better name.
>
> Sure, perhaps @artagnon can join the discussion as well. Let me share my experience first: In GCC, I have seen a classification called `ExtractLast`, which has a similar semantics to what you mentioned as `FindLast`. Welcome further input and opinions.

I agree with @Ayal: I also got confused by the `SelectCmp` naming convention initially. Perhaps `[I|F]AnyOfInv`, `[I|F]AnyOfInc`, and `[I|F]AnyOfDec`, and rename the corresponding function `isAnyOfReduction`? I don't have a strong preference for the rename.

> If we focus on removing the wrapping and bound restrictions, I think we can consider the approach proposed by @artagnon in D152693 <https://reviews.llvm.org/D152693>. This method cleverly extends the technique used by @david-arm in `SelectICmp`. The approach can be summarized as follows: 
> Consider the loop:
>
>   unsigned int red = start_value;
>   for (unsigned int i = 0; i < n; ++i)
>     red = (a[i] > b[i]) ? i : red;
>
> vectorize to:
>
>   unsigned int red = start_value;
>   vec_unsigned_int red_part = splat(start_value);
>   vec_unsigned_int step_vec = {0, 1, 2, ...};
>   for (unsigned int i = 0; i < n; i+=vl) {
>     red_part = (vec_a[i] > vec_b[i]) ? step_vec : red_part;
>     step_vec += {vl, vl, vl, ...};
>   }
>   vec_bool ne_start_value = red_part != splat(start_value);
>   bool may_update = reduce.or(ne_start_value);
>   vec_unsigned_int masked_red_part = ne_start_value ? red_part : splat(DataTypeMin);
>   red = may_update ? reduce.smax|umax(masked_red_part) : start_value;
>
> While the conditions checked in this patch are more strict, I believe both approaches should coexist. In general, the IR generated by this patch should have better performance in the same case. Therefore, it should be prioritized when possible. However, when the cases that cannot be handled by this patch, we can apply the approach in D152693 <https://reviews.llvm.org/D152693>.

Yes, the IR generated by this patch is quite optimized, and I suppose we can fallback to code-gen like in D152693 <https://reviews.llvm.org/D152693> as a follow-up.

> In addition, there is still room for optimization in this patch. We usually face source code like this:
>
>   j = -1;
>   for (int i = 0; i < n; i++) {
>       if (a[i] < b[i]) {
>           j = i;
>       }
>   }
>
> When the start value of the reduction is a known constant and is known to be smaller than the start value of the increasing induction variable, we may not even need to use a sentinel value. Simply using the reduce max operation would suffice.

I wouldn't worry about this for now. The patch already looks pretty good in its current state, and I think with the renaming that @Ayal proposed, it should be ready to land.

================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:661

+  auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {
+    auto *Phi = dyn_cast<PHINode>(V);
----------------
Mel-Chen wrote:
> artagnon wrote:
> > Why not split this out into its own function?
> I'm also think about this because I personally feel that this function is a bit long. 
> Do you think it's better to put it here as a static function, or put it in another file?
A static function would be nice.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150851/new/

https://reviews.llvm.org/D150851