[PATCH] D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable

Thu Jun 22 04:01:01 PDT 2023

Ayal added a comment.

An important step forward - adding some general thoughts and terminology, going (admittedly nearly two years) back to D108136 <https://reviews.llvm.org/D108136>.

This `FindLast` compound pattern combines two header phi's: an induction and a reduction. The reduction is a "selecting reduction" which chooses one element from the reduced set, or none. The induction is recorded to return the index of the (last) element found. The induction must be monotone and have some out-of-bounds value, which this patch ensures by restricting to signed inductions that start at non-min-signed-value and increase w/o wrapping - restrictions which can be lifted.

Reducing a collection C of values into one value v can be classified as either a "selecting reduction" or a "combining reduction", depending on whether v is always a member of C or not, respectively. Min and max reductions are selecting reductions whereas add (including fmuladd), multiply, or, and, xor are in general combining reductions.
When C is a collection of boolean values, the selecting reductions "max" and "min" practically compute `Any` and `All`, respectively, as in std::any_of() and std::all_of(). Support for boolean selecting reductions was introduced in D108136 <https://reviews.llvm.org/D108136>, which should arguably be called `[I|F]Any` rather than `Select[I|F]Cmp`. The boolean values produced by Integer/Float Compares such as "(src[i] > 3)" or "(a[i] > b[i])" are essentially being "max" reduced; any desired pair of invariant return values can be set/selected after determining the outcome of the boolean reduction.
Note that an `Any` reduction can terminate once "true" is encountered, similar to a general max/min reduction encountering max/min-value.
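
To make the `Any`-as-max view concrete, here is a minimal scalar sketch. The helper names `any_gt3` and `select_after_any` are hypothetical and only illustrate the idea; this is not how LV represents the reduction.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: "max"-reduce the booleans (src[i] > 3); the result is Any.
bool any_gt3(const int *src, std::size_t n) {
  bool any = false;                    // identity of a boolean max reduction
  for (std::size_t i = 0; i < n; ++i) {
    any = std::max(any, src[i] > 3);   // equivalent to any |= (src[i] > 3)
    if (any)
      break;                           // optional early exit: "true" is the max value
  }
  return any;
}

// The pair of invariant return values is selected once, after the reduction.
int select_after_any(const int *src, std::size_t n, int if_found, int if_none) {
  return any_gt3(src, n) ? if_found : if_none;
}
```

The early exit is optional; dropping it leaves a plain max reduction over booleans.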

A selecting reduction could report the index of the value reduced in addition to the value itself. If the reduced value appears multiple times, the index of the first or last appearance can be reported. Tests for such `MinLast` cases, aka argmin, were introduced in 4f04be564907f <https://reviews.llvm.org/rG4f04be564907fb7ddff8ebc7773b892a93b00f2e>, and are yet to be vectorized by LV - hope this patch helps us get there! These are compound patterns combining three header phi's: an induction and two reductions.
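
For reference, a scalar `MinLast` loop with its three header phi's might look like the sketch below. The function name `min_last` and the initial values (INT_MAX and -1) are assumptions chosen for illustration, not taken from the tests mentioned above.

```cpp
#include <climits>
#include <utility>

// Hypothetical MinLast (argmin, last occurrence): returns {min value, index}.
std::pair<int, int> min_last(const int *a, int n) {
  int min_val = INT_MAX;             // reduction 1: the selected value
  int min_idx = -1;                  // reduction 2: its index (-1 if n == 0)
  for (int i = 0; i < n; ++i) {      // the induction
    bool take = a[i] <= min_val;     // "<=" keeps the last; "<" would keep the first
    min_idx = take ? i : min_idx;
    min_val = take ? a[i] : min_val;
  }
  return {min_val, min_idx};
}
```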

A boolean selecting max/Any reduction reporting the index is typically interested in the index only if the reduced value is 1/"true", otherwise the index is obvious.
This patch deals with boolean selecting `Any` reductions that report the index of the (last) value reduced, provided it is "true". Such a reduction may be called `FindLast`, by analogy with std::find_if(), which corresponds to `FindFirst`.

When vectorizing and/or unroll-and-interleaving a selecting reduction with index, the indices of multiple candidates need to be compared to determine which is first (or last), during the reduction epilog (for in-loop reductions this is trivial). This comparison requires the indices to be monotone, i.e., to avoid wrapping.
When dealing with Any reductions with index of "true" values, the indicator that a "true" value was encountered can be folded together with the index found so far (of a true value) by using an "invalid" out-of-bounds index - preferably smaller than first iteration for `FindLast` or larger than last iteration for `FindFirst`. Such values are overwritten naturally by the (valid) index of any "true" value, when selecting the first or last index. This reduces the pattern to consider a single reduction (of the combined index+indicator value) rather than the two reductions of the general `MaxLast` case (of index and value).
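
A scalar model of this folding, for VF=4, could look as follows. This is a sketch under stated assumptions: indices are increasing signed ints starting above INT_MIN, so INT_MIN can serve as the "invalid" value; the name `find_last_vf4` and the lane array are hypothetical and do not mirror the code LV actually emits.

```cpp
#include <algorithm>
#include <array>
#include <climits>

// Hypothetical scalar model of a FindLast loop vectorized with VF = 4.
int find_last_vf4(const int *a, const int *b, int n, int start_value) {
  constexpr int VF = 4;
  constexpr int kSentinel = INT_MIN;   // out-of-bounds: below every valid index
  std::array<int, VF> red;
  red.fill(kSentinel);                 // one partial result per lane

  int i = 0;
  for (; i + VF <= n; i += VF)         // models the vector loop body
    for (int lane = 0; lane < VF; ++lane)
      red[lane] = (a[i + lane] > b[i + lane]) ? i + lane : red[lane];

  // Epilog: the last index is the maximum over the lanes; the sentinel
  // loses to any valid index.
  int last = *std::max_element(red.begin(), red.end());

  for (; i < n; ++i)                   // scalar remainder loop
    last = (a[i] > b[i]) ? i : last;

  // Unfold the indicator: the sentinel means no "true" value was seen.
  return last == kSentinel ? start_value : last;
}
```

Only one value per lane is carried through the loop; the found/not-found indicator is recovered by a single compare against the sentinel at the end.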

This `FindLast` patch currently meets these requirements by restricting to indices that are increasing, signed, and start from a non-min-signed value. It seems unnatural for such indices to wrap, and perhaps PSCEV guards against AddRec wrapping in general(?). But even if an index may wrap and/or does not provide the desired out-of-bounds values, a designated IV counting **vector** iterations could be used, from which the original indices can later be reconstructed and reduced in the epilog. Such an IV is immune to wrapping and provides out-of-bounds values. This is one of several possible ways to lift these restrictions.
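
A rough scalar model of the vector-iteration IV idea, using a deliberately narrow (and wrapping) uint8_t index to show the effect. Everything here is an assumption made for illustration: the name `find_last_u8`, VF=4, a counter that starts at 1 so that 0 is free to mean "none found", and a trip count that is a multiple of VF (any leftover iterations are ignored).

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: each lane records the *vector* iteration number of its
// last hit instead of the original index, which may wrap.
std::uint8_t find_last_u8(const int *a, const int *b, unsigned n,
                          std::uint8_t first_index, std::uint8_t start_value) {
  constexpr unsigned VF = 4;
  std::array<unsigned, VF> red{};      // 0 == "none found"; valid counts start at 1
  unsigned viter = 1;                  // designated IV counting vector iterations
  for (unsigned i = 0; i + VF <= n; i += VF, ++viter)
    for (unsigned lane = 0; lane < VF; ++lane)
      red[lane] = (a[i + lane] > b[i + lane]) ? viter : red[lane];

  // Epilog: reconstruct the scalar iteration number per lane, keep the largest.
  bool found = false;
  unsigned last_iter = 0;
  for (unsigned lane = 0; lane < VF; ++lane) {
    if (red[lane] == 0)
      continue;
    unsigned iter = (red[lane] - 1) * VF + lane;
    if (!found || iter > last_iter) {
      last_iter = iter;
      found = true;
    }
  }
  // Only now map back to the original index, which is allowed to wrap.
  return found ? static_cast<std::uint8_t>(first_index + last_iter) : start_value;
}
```

With first_index = 252 and hits at iterations 1 and 5, the original indices are 253 and 1; comparing those directly would wrongly prefer 253, whereas comparing iteration numbers picks iteration 5 and reconstructs index 1.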

Note that `Any` reductions reporting the first index can terminate once "true" is encountered, but seem more cumbersome to write (w/o a break), e.g.:

  // FindFirst w/o break.
  int red = ii;
  bool red_set = false;
  for (int i = 0; i < n; ++i)
    if (a[i] > b[i]) {
      red = red_set ? red : i;
      red_set = true;
    }

instead of

  // FindLast.
  int red = ii;
  for (int i = 0; i < n; ++i)
    red = (a[i] > b[i]) ? i : red;

A `FindLast` loop could be optimized into a `FindFirst` one by reversing the loop.
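
A sketch of that reversal, assuming the loop body has no side effects and the loads are safe to reorder; the function names are hypothetical and `ii` is the same start value as in the snippets above.

```cpp
// The FindLast loop as written above, wrapped in a function.
int find_last_forward(const int *a, const int *b, int n, int ii) {
  int red = ii;
  for (int i = 0; i < n; ++i)
    red = (a[i] > b[i]) ? i : red;
  return red;
}

// Hypothetical rewrite: run backwards, it becomes a FindFirst loop
// that can stop at the first hit.
int find_last_reversed(const int *a, const int *b, int n, int ii) {
  int red = ii;
  for (int i = n - 1; i >= 0; --i)
    if (a[i] > b[i]) {
      red = i;
      break;
    }
  return red;
}
```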

================
Comment at: llvm/test/Transforms/LoopVectorize/select-min-index.ll:89
+; CHECK-VF4IC1-NEXT:  entry:
+; CHECK-VF4IC1-NEXT:    br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-VF4IC1:       vector.ph:
----------------
This test now gets vectorized, being a `FindLast` loop that reports the last index where a[i] < a[i-1]+1, or zero if none are found. (I.e., proving that a sequence is not strictly increasing, rather than computing `MinLast`.)
But the vector loop is never reached?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150851/new/

https://reviews.llvm.org/D150851