[PATCH] D69563: [LV] Strip wrap flags from vectorized reductions

Tue Dec 17 23:58:34 PST 2019

Ayal added a comment.

In D69563#1776137 <https://reviews.llvm.org/D69563#1776137>, @a.elovikov wrote:

> Probably not too much important because should be handled by the vector predicated instructions/intrinsics, but still
>
> In D69563#1764483 <https://reviews.llvm.org/D69563#1764483>, @Ayal wrote:
>
> > In D69563#1763331 <https://reviews.llvm.org/D69563#1763331>, @dantrushin wrote:
> >
> > > In D69563#1763159 <https://reviews.llvm.org/D69563#1763159>, @Ayal wrote:
> > >
> > > > Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.
> > > >
> > > > But this should apply to all such operations that participate in the vectorized part of the loop. Note that 
> > > >  (1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and
> > >
> > >
> > > Is there some existing API to find them all? Or I need to invite my own?
> >
> >
> > AFAIK such an API does not currently exist.
> >
> > > Would not it be easier just to not copy wrap flags in widenInstruction() for all instructions [which I was shy to do initially :) ]  or it is too aggressive?
> >
> > Loosing all wrap flags would be too aggressive.
>
>
> Why is it ok not to drop nuw here:
>
>   define i8 @function0(i8 %a) {
>   entry:
>     br label %for.body
>  
>   for.body:
>     %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
>     %cmp5 = icmp ult i8 %a, 127
>     br i1 %cmp5, label %if.then, label %if.end
>  
>   if.then:
>     %mul = mul nuw i8 %a, 2
>     br label %if.end
>  
>   if.end:
>     %k = phi i8 [ %mul, %if.then ], [ %a, %for.body ]
>     %indvars.iv.next = add i32 %indvars.iv, 1
>     %cmp = icmp slt i32 %indvars.iv.next, 42
>     br i1 %cmp, label %for.body, label %for.end
>  
>   for.end:
>     ret i8 undef
>   }
>
>
> Vector code generated is
>
>   vector.ph:                                        ; preds = %entry
>     %broadcast.splatinsert1 = insertelement <4 x i8> undef, i8 %a, i32 0
>     %broadcast.splat2 = shufflevector <4 x i8> %broadcast.splatinsert1, <4 x i8> undef, <4 x i32> zeroinitializer
>     br label %vector.body
>  
>   vector.body:                                      ; preds = %vector.body, %vector.ph
>     %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
>     %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
>     %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
>     %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
>     %0 = add i32 %index, 0
>     %1 = icmp ult <4 x i8> %broadcast.splat2, <i8 -128, i8 -128, i8 -128, i8 -128>
>     %2 = mul nuw <4 x i8> %broadcast.splat2, <i8 2, i8 2, i8 2, i8 2>                                  ; if %a == 200, this is poison...
>     %3 = xor <4 x i1> %1, <i1 true, i1 true, i1 true, i1 true>
>     %predphi = select <4 x i1> %3, <4 x i8> %broadcast.splat2, <4 x i8> %2                     ; ... even though the %predphi == %a broadcasted, it's still poison as it depends on %2 (according to https://llvm.org/docs/LangRef.html#poisonvalues)
>     %index.next = add i32 %index, 4
>     %4 = icmp eq i32 %index.next, 40
>     br i1 %4, label %middle.block, label %vector.body, !llvm.loop !0
>
>
> Do I miss anything important here that allows us not to drop "nuw" flags?

You are right, I stand corrected, conditional instructions that are executed speculatively need to drop their wrap flags as well. Good catch!
This is reminiscent of conditional instructions that are "assume", which need to be dropped altogether, D68814 <https://reviews.llvm.org/D68814>.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:394
+      Instruction *UI = cast<Instruction>(U);
+      if (L->contains(UI->getParent()) && Result.insert(UI).second)
+        Worklist.push_back(UI);
----------------
dantrushin wrote:
> Ayal wrote:
> > This is indeed a general way to record all transitively dependent instructions inside a loop. In our case, though, there's a single known `LoopExitInst `with a single (loop-closed phi) user outside the loop. More efficient to record that user and check if (UI != OutsideUser && Result.insert(UI).second) than to repeatedly check if parent block belongs to L. Agreed?
> IMHO this makes code uglier. See below
Perhaps offloading the code that searches for OutsideUser over to collectReductionInstructions() may look better, emphasizing that it's an implementation optimization internal to the collection process. That would require passing the loop again and also LoopExitInst.

Another, orthogonal option may be to fuse the loop collecting reduction instructions with the one dropping the wraps, eliminating the need for a ReductionInstructions set.

These options are more a matter of style, as you prefer. The current version and also the previous general one look good to me; we're not expecting too many UI instructions.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3730

   // Fix the vector-loop phi.

----------------
The above comment belongs with the code below that fixes the vector-loop phi.
Instead, a comment about fixing/dropping wraps can be made here; e.g., can move the "Wrap flags are in general..." comment here.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3742
+      }
+    }
+    SmallPtrSet<Instruction *, 4> ReductionInstructions;
----------------
Would be good to assert here that OuterUser is not nullptr, or even better that its a (loop closed) phi.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69563/new/

https://reviews.llvm.org/D69563