[llvm] [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (PR #67812)

Mel Chen via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 31 06:22:38 PDT 2024


================
@@ -659,6 +661,100 @@ RecurrenceDescriptor::isAnyOfPattern(Loop *Loop, PHINode *OrigPhi,
                                                      : RecurKind::FAnyOf);
 }
 
+// We are looking for loops that do something like this:
+//   int r = 0;
+//   for (int i = 0; i < n; i++) {
+//     if (src[i] > 3)
+//       r = i;
+//   }
+// The reduction value (r) is derived from either the values of an increasing
+// induction variable (i) sequence, or from the start value (0).
+// The LLVM IR generated for such loops would be as follows:
+//   for.body:
+//     %r = phi i32 [ %spec.select, %for.body ], [ 0, %entry ]
+//     %i = phi i32 [ %inc, %for.body ], [ 0, %entry ]
+//     ...
+//     %cmp = icmp sgt i32 %5, 3
+//     %spec.select = select i1 %cmp, i32 %i, i32 %r
+//     %inc = add nsw i32 %i, 1
+//     ...
+// Since 'i' is an increasing induction variable, the reduction value after the
+// loop will be the maximum value of 'i' that the condition (src[i] > 3) is
+// satisfied, or the start value (0 in the example above). When the start value
+// of the increasing induction variable 'i' is greater than the minimum value of
+// the data type, we can use the minimum value of the data type as a sentinel
+// value to replace the start value. This allows us to perform a single
+// reduction max operation to obtain the final reduction result.
+// TODO: It is possible to solve the case where the start value is the minimum
+// value of the data type or a non-constant value by using mask and multiple
+// reduction operations.
+RecurrenceDescriptor::InstDesc
+RecurrenceDescriptor::isFindLastIVPattern(PHINode *OrigPhi, Instruction *I,
+                                          ScalarEvolution &SE) {
+  // TODO: Support the vectorization of FindLastIV when the reduction phi is
+  // used by more than one select instruction. This vectorization is only
+  // performed when the SCEV of each increasing induction variable used by the
+  // select instructions is identical.
+  if (!OrigPhi->hasOneUse())
+    return InstDesc(false, I);
+
+  // TODO: Match selects with multi-use cmp conditions.
+  CmpInst::Predicate Pred;
+  Value *TrueVal, *FalseVal;
+  if (!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())),
+                         m_Value(TrueVal), m_Value(FalseVal))))
+    return InstDesc(false, I);
+
+  Value *NonRdxPhi = nullptr;
+  if (OrigPhi == dyn_cast<PHINode>(TrueVal))
+    NonRdxPhi = FalseVal;
+  else if (OrigPhi == dyn_cast<PHINode>(FalseVal))
+    NonRdxPhi = TrueVal;
+  else
+    return InstDesc(false, I);
+
+  auto IsIncreasingLoopInduction = [&](Value *V) {
+    Type *Ty = V->getType();
+    if (!SE.isSCEVable(Ty))
+      return false;
+
+    auto *AR = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(V));
+    if (!AR)
+      return false;
+
+    const SCEV *Step = AR->getStepRecurrence(SE);
+    if (!SE.isKnownPositive(Step))
+      return false;
+
+    const ConstantRange IVRange = SE.getSignedRange(AR);
+    unsigned NumBits = Ty->getIntegerBitWidth();
+    // Keep the minimum value of the recurrence type as the sentinel value.
+    // The maximum acceptable range for the increasing induction variable,
+    // called the valid range, will be defined as
+    //   [<sentinel value> + 1, <sentinel value>)
+    // where <sentinel value> is SignedMin(<recurrence type>)
+    // TODO: This range restriction can be lifted by adding an additional
+    // virtual OR reduction.
+    const APInt Sentinel = APInt::getSignedMinValue(NumBits);
----------------
Mel-Chen wrote:

I tried to create a concrete example based on your description:
C source
```
unsigned int foo(unsigned int start, int *a, unsigned int n) {
    unsigned int rdx = start;
    for (unsigned int i = 0; i <= 2147483648; i++) {
        rdx = a[i] > 3 ? i : rdx;
    }
    return rdx;
}
```
LLVM IR
```
; Function Attrs: nofree norecurse nosync nounwind memory(argmem: read) vscale_range(8,1024)
define dso_local signext i32 @foo(i32 noundef signext %start, ptr nocapture noundef readonly %a, i32 noundef signext %n) local_unnamed_addr #0 {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  %cond.lcssa = phi i32 [ %cond, %for.body ]
  ret i32 %cond.lcssa

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %rdx.08 = phi i32 [ %start, %entry ], [ %cond, %for.body ]
  %arrayidx = getelementptr inbounds nuw i32, ptr %a, i64 %indvars.iv
  %0 = load i32, ptr %arrayidx, align 4, !tbaa !6
  %cmp2 = icmp sgt i32 %0, 3
  %1 = trunc nuw i64 %indvars.iv to i32
  %cond = select i1 %cmp2, i32 %1, i32 %rdx.08
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 2147483649
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !10
}
```
On my side, this case does not appear to be vectorized. The debug output shows:
```
LV: FindLastIV valid range is [-2147483647,-2147483648), and the signed range of {0,+,1}<%for.body> is [0,-2147483647)
```
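To spell out why this log line rejects the loop: the bounds are printed in signed form, so the IV range `[0,-2147483647)` covers the 32-bit values 0 through 2147483648, and that last value has the same bit pattern as the sentinel (signed minimum). The valid range excludes exactly that one value. Below is a minimal sketch of the arithmetic, assuming a simplified model of half-open wrapping ranges. `WrappedRange`, `bits`, and `checkLoggedRanges` are hypothetical names for illustration only and are not LLVM's `ConstantRange` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a half-open 32-bit range [lo, hi) that may wrap
// around. It is NOT llvm::ConstantRange; lo == hi is treated as empty here.
struct WrappedRange {
  uint32_t lo;
  uint32_t hi;

  // True if v is reached when walking upward from lo before hitting hi.
  constexpr bool contains(uint32_t v) const {
    return static_cast<uint32_t>(v - lo) < static_cast<uint32_t>(hi - lo);
  }
};

// Bit pattern of a signed value, since the log prints bounds in signed form.
constexpr uint32_t bits(int64_t v) { return static_cast<uint32_t>(v); }

// Ranges copied from the log line above.
constexpr uint32_t kSentinel = bits(INT32_MIN);
constexpr WrappedRange kValidRange{bits(-2147483647), bits(INT32_MIN)};
constexpr WrappedRange kIVRange{0, bits(-2147483647)};

// The last value the example loop assigns to i, as a 32-bit pattern.
constexpr uint32_t kLastIV = 2147483648u;

inline void checkLoggedRanges() {
  assert(kIVRange.contains(kLastIV));        // the IV can take this value
  assert(kLastIV == kSentinel);              // it has the sentinel's bit pattern
  assert(!kValidRange.contains(kSentinel));  // so it lies outside the valid range
}
```

Since one value of the IV range falls outside the valid range, the IV range is not contained in it, which is consistent with the loop being left scalar.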

If this example matches the scenario you are concerned about, would you like me to add it to the test file?
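For reference, the sentinel idea described in the code comment of the hunk above can be modelled in scalar code. This is only a sketch under stated assumptions, not the code the vectorizer emits: `findLastScalar`, `findLastViaMax`, and `usageExample` are hypothetical names, the condition `> 3` is taken from the comment's example, and the index is assumed never to equal the signed minimum (which is exactly the range restriction being discussed).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar reference: the last index i with src[i] > 3, or start if none.
// Assumes src.size() fits in int32_t.
int32_t findLastScalar(const std::vector<int32_t> &src, int32_t start) {
  int32_t r = start;
  for (int32_t i = 0; i < static_cast<int32_t>(src.size()); ++i)
    if (src[i] > 3)
      r = i;
  return r;
}

// Sentinel form: the start value is replaced by the signed minimum, the loop
// becomes a plain max reduction, and the start value is restored afterwards.
// Only valid while no index can equal the sentinel itself.
int32_t findLastViaMax(const std::vector<int32_t> &src, int32_t start) {
  const int32_t sentinel = std::numeric_limits<int32_t>::min();
  int32_t rdx = sentinel;
  for (int32_t i = 0; i < static_cast<int32_t>(src.size()); ++i) {
    const int32_t candidate = src[i] > 3 ? i : sentinel;
    rdx = std::max(rdx, candidate);
  }
  // If the condition never held, rdx is still the sentinel.
  return rdx != sentinel ? rdx : start;
}

// Small usage check: both forms agree when the precondition holds.
inline void usageExample() {
  const std::vector<int32_t> a{1, 5, 2, 7, 0};
  assert(findLastScalar(a, -1) == 3);
  assert(findLastViaMax(a, -1) == 3);
}
```

If an index could equal the sentinel, as with `i == 2147483648` truncated to 32 bits in the example above, the final comparison could no longer tell "condition held at that index" apart from "condition never held".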

https://github.com/llvm/llvm-project/pull/67812

