[PATCH] D141842: [LoopVectorize] Enable integer Mul and Add as select reduction patterns

Tue Jan 17 04:12:19 PST 2023

MattDevereau added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:826
+; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x float> %[[V0:.*]], zeroinitializer
+; CHECK: %[[V3:.*]] = add <4 x i32> %[[V2:.*]], <i32 2, i32 2, i32 2, i32 2>
+; CHECK: select <4 x i1> %[[V1]], <4 x i32> %[[V3]], <4 x i32> %[[V2]]
----------------
sdesmalen wrote:
> MattDevereau wrote:
> > Unfortunately integer flags aren't being propagated here. After having a quick look around the issue appears non-trivial as fast-math flags are propagated for the floating point case with a disclaimer. In `RecurrenceDescriptor::AddReductionVar` just after where the changes to `RecurrenceDescriptor::isConditionalRdxPattern` were made:
> > ```
> >       // FIXME: FMF is allowed on phi, but propagation is not handled correctly.
> >       if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
> >         FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
> >         if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {
> >           // Accept FMF on either fcmp or select of a min/max idiom.
> >           // TODO: This is a hack to work-around the fact that FMF may not be
> >           //       assigned/propagated correctly. If that problem is fixed or we
> >           //       standardize on fmin/fmax via intrinsics, this can be removed.
> > 
> > ```
> > After a look around for methods of propagating the IR flags I'm not quite sure how to proceed.
> I wouldn't be too worried about this, it seems the nsw/nuw flags aren't propagated for other reductions either.
Very well, in the C/C++ example I've added the flags aren't propagated either when 1 is used as an immediate and we don't go through the `select` route.

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:842
+  %0 = load float, ptr %arrayidx, align 4
+  %cmp.2 = fcmp fast ogt float %0, 0.000000e+00
+  %add = add nsw i32 %sum.1, 2
----------------
sdesmalen wrote:
> nit: I guess it also works if you remove this `fast` comparison right?
You're correct, yes. I'll go ahead and remove it from the tests I've added.

================
Comment at: llvm/test/Transforms/LoopVectorize/if-reduction.ll:858
+; CHECK: select <4 x i1> %[[V1]], <4 x i64> %[[V3]], <4 x i64> %[[V2]]
+define i64 @fcmp_0_add_select2(ptr noalias %x, i64 %N) nounwind readonly {
+entry:
----------------
sdesmalen wrote:
> What is the difference between this function and `@fcmp_0_add_select1` ?
The data type being reduced is i64 in this function whereas its i32 in `@fcmp_0_add_select1`. These tests are integer clones of `@fcmp_0_fadd_select1` and `@fcmp_0_fadd_select2` above in this file, with the exception that these integer variants add an immediate instead of loaded in data.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141842/new/

https://reviews.llvm.org/D141842