[PATCH] D141842: [LoopVectorize] Enable integer Mul and Add as select reduction patterns

Tue Jan 17 03:47:26 PST 2023

sdesmalen added a comment.

Seems like a sensible change to me.

nit: Could you add a motivating example (C/C++) to the commit message?
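
Something along these lines would do. This is only a sketch inferred from the `if-reduction.ll` tests, not code taken from the patch; the function names `cond_add` and `cond_mul` and their signatures are hypothetical:

```c
#include <stddef.h>

/* Hypothetical example: integer add reduction guarded by a float compare.
 * The guarded update is the kind of scalar code that typically becomes
 * select(cmp, sum + 2, sum) in IR. */
int cond_add(const float *x, size_t n) {
  int sum = 0;
  for (size_t i = 0; i < n; ++i)
    if (x[i] > 0.0f)
      sum += 2;
  return sum;
}

/* Hypothetical example: the same shape with an integer multiply reduction. */
int cond_mul(const float *x, size_t n) {
  int prod = 1;
  for (size_t i = 0; i < n; ++i)
    if (x[i] > 0.0f)
      prod *= 2;
  return prod;
}
```

The add case is meant to mirror the `fcmp`/`add`/`select` sequence in the CHECK lines of the test; whether the multiply case matches the patch's tests exactly is an assumption.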

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:826
+; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x float> %[[V0:.*]], zeroinitializer
+; CHECK: %[[V3:.*]] = add <4 x i32> %[[V2:.*]], <i32 2, i32 2, i32 2, i32 2>
+; CHECK: select <4 x i1> %[[V1]], <4 x i32> %[[V3]], <4 x i32> %[[V2]]
----------------
MattDevereau wrote:
> Unfortunately integer flags aren't being propagated here. After having a quick look around the issue appears non-trivial as fast-math flags are propagated for the floating point case with a disclaimer. In `RecurrenceDescriptor::AddReductionVar` just after where the changes to `RecurrenceDescriptor::isConditionalRdxPattern` were made:
> ```
>       // FIXME: FMF is allowed on phi, but propagation is not handled correctly.
>       if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
>         FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
>         if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {
>           // Accept FMF on either fcmp or select of a min/max idiom.
>           // TODO: This is a hack to work-around the fact that FMF may not be
>           //       assigned/propagated correctly. If that problem is fixed or we
>           //       standardize on fmin/fmax via intrinsics, this can be removed.
> 
> ```
> After a look around for methods of propagating the IR flags I'm not quite sure how to proceed.
I wouldn't be too worried about this; it seems the nsw/nuw flags aren't propagated for other reductions either.

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:842
+  %0 = load float, ptr %arrayidx, align 4
+  %cmp.2 = fcmp fast ogt float %0, 0.000000e+00
+  %add = add nsw i32 %sum.1, 2
----------------
nit: I guess it also works if you remove the `fast` flag from this comparison, right?

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:858
+; CHECK: select <4 x i1> %[[V1]], <4 x i64> %[[V3]], <4 x i64> %[[V2]]
+define i64 @fcmp_0_add_select2(ptr noalias %x, i64 %N) nounwind readonly {
+entry:
----------------
What is the difference between this function and `@fcmp_0_add_select1`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141842/new/

https://reviews.llvm.org/D141842