[PATCH] D69563: [LV] Strip wrap flags from vectorized reductions

Mon Dec 9 15:03:29 PST 2019

a.elovikov added a comment.

Probably not too important, because this should be handled by the vector-predicated instructions/intrinsics, but still:

In D69563#1764483 <https://reviews.llvm.org/D69563#1764483>, @Ayal wrote:

> In D69563#1763331 <https://reviews.llvm.org/D69563#1763331>, @dantrushin wrote:
>
> > In D69563#1763159 <https://reviews.llvm.org/D69563#1763159>, @Ayal wrote:
> >
> > > Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.
> > >
> > > But this should apply to all such operations that participate in the vectorized part of the loop. Note that 
> > >  (1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and
> >
> >
> > Is there some existing API to find them all? Or do I need to invent my own?
>
>
> AFAIK such an API does not currently exist.
>
> > Wouldn't it be easier to just not copy wrap flags in widenInstruction() for all instructions [which I was shy to do initially :) ], or is that too aggressive?
>
> Losing all wrap flags would be too aggressive.

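Why is it OK not to drop nuw here? In the scalar loop the mul is guarded by the compare, so it only executes when %a < 127 and cannot wrap: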

  define i8 @function0(i8 %a) {
  entry:
    br label %for.body

  for.body:
    %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
    %cmp5 = icmp ult i8 %a, 127
    br i1 %cmp5, label %if.then, label %if.end

  if.then:
    %mul = mul nuw i8 %a, 2
    br label %if.end

  if.end:
    %k = phi i8 [ %mul, %if.then ], [ %a, %for.body ]
    %indvars.iv.next = add i32 %indvars.iv, 1
    %cmp = icmp slt i32 %indvars.iv.next, 42
    br i1 %cmp, label %for.body, label %for.end

  for.end:
    ret i8 undef
  }

The generated vector code is:

  vector.ph:                                        ; preds = %entry
    %broadcast.splatinsert1 = insertelement <4 x i8> undef, i8 %a, i32 0
    %broadcast.splat2 = shufflevector <4 x i8> %broadcast.splatinsert1, <4 x i8> undef, <4 x i32> zeroinitializer
    br label %vector.body

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
    %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
    %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
    %0 = add i32 %index, 0
    %1 = icmp ult <4 x i8> %broadcast.splat2, <i8 -128, i8 -128, i8 -128, i8 -128>
    %2 = mul nuw <4 x i8> %broadcast.splat2, <i8 2, i8 2, i8 2, i8 2> ; if %a == 200, this is poison...
    %3 = xor <4 x i1> %1, <i1 true, i1 true, i1 true, i1 true>
    %predphi = select <4 x i1> %3, <4 x i8> %broadcast.splat2, <4 x i8> %2 ; ... and even though %predphi equals the broadcast of %a, it is still poison because it depends on %2 (according to https://llvm.org/docs/LangRef.html#poisonvalues)
    %index.next = add i32 %index, 4
    %4 = icmp eq i32 %index.next, 40
    br i1 %4, label %middle.block, label %vector.body, !llvm.loop !0
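
For comparison, here is a hand-edited sketch of the same three instructions with nuw stripped from the widened multiply. This is not actual LoopVectorize output and not what any version of the patch is known to produce; it only assumes that dropping the flag is the intended fix. All value names are taken from the dump above.

```llvm
    ; Hypothetical, hand-edited from the vector body above.
    %2 = mul <4 x i8> %broadcast.splat2, <i8 2, i8 2, i8 2, i8 2> ; no nuw: for %a == 200 each lane wraps to 144 (400 mod 256), no poison
    %3 = xor <4 x i1> %1, <i1 true, i1 true, i1 true, i1 true>
    %predphi = select <4 x i1> %3, <4 x i8> %broadcast.splat2, <4 x i8> %2 ; %1 is all-false for %a == 200, so %3 is all-true and every lane is %a
```

With the flag gone, the unselected operand is an ordinary wrapped value, so %predphi is well defined regardless of how select treats poison in its unselected operand.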

Am I missing anything important here that allows us not to drop the "nuw" flags?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D69563