[llvm-dev] LoopVectorizer and nowrap flags

Denis Antrushin via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 24 10:27:43 PDT 2019


Hi,

I ran into a problem which I think is caused by the loop vectorizer incorrectly
copying nowrap flags from scalar instructions to the vector ones.

Consider this testcase:

===================================================================
; RUN: opt --loop-vectorize --loop-unroll %s
define void @test(i32* %B) {
entry:
   br label %outer_loop

outer_loop:
   %local_4 = phi i32 [ 2, %entry ], [ %4, %outer_tail]
   br label %inner_loop

inner_loop:
   %local_2 = phi i32 [    0, %outer_loop ], [ %1, %inner_loop ]
   %local_3 = phi i32 [ -104, %outer_loop ], [ %0, %inner_loop ] ; {-104, -, %local_4}
   %0 = sub nuw nsw i32 %local_3, %local_4			; nuw is correct here
   %1 = add nuw nsw i32 %local_2, 1
   %2 = icmp ugt i32 %local_2, 126
   br i1 %2, label %outer_tail, label %inner_loop

outer_tail:
   %3 = phi i32 [ %0, %inner_loop ]
   store atomic i32 %3, i32 * %B unordered, align 8
   %4 = add i32 %local_4, 1
   %5 = icmp slt i32 %4, 6
   br i1 %5, label %outer_loop, label %exit

exit:
   ret void
}
===================================================================

Note the nuw/nsw flags set on the '%0 = sub ...' instruction.
They look valid.

After vectorization I have:

===================================================================
vector.body:                                      ; preds = %vector.body, %vector.ph
   %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
   %vec.phi = phi <4 x i32> [ <i32 -104, i32 0, i32 0, i32 0>, %vector.ph ], [ %2, %vector.body ]
   %vec.phi2 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %3, %vector.body ]
   %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
   %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
   %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
   %induction1 = add <4 x i32> %broadcast.splat, <i32 4, i32 5, i32 6, i32 7>
   %0 = add i32 %index, 0
   %1 = add i32 %index, 4
   %2 = sub nuw nsw <4 x i32> %vec.phi, %broadcast.splat4
   %3 = sub nuw nsw <4 x i32> %vec.phi2, %broadcast.splat6    ; nuw present but does not seem valid
   %index.next = add i32 %index, 8
   %4 = icmp eq i32 %index.next, 128
   br i1 %4, label %middle.block, label %vector.body, !llvm.loop !0

===================================================================

Note that '%3 = sub ...' still has nuw set, but that looks wrong: %vec.phi2
starts from 0 and a positive value is subtracted from it, so the subtraction
wraps in the unsigned sense.
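
A quick sanity check of the arithmetic above, as a minimal Python sketch rather
than LLVM code. The helper names are made up for illustration. It assumes the
inner loop runs 128 times, %local_4 takes the values 2..5, and each vector lane
sees 128 / 8 = 16 subtractions:

```python
MASK = 0xFFFFFFFF  # i32 values viewed as unsigned


def sub_wraps_unsigned(a, b):
    """True if the i32 subtraction a - b wraps in the unsigned sense (nuw violated)."""
    return (a & MASK) < (b & MASK)


def first_nuw_violation(start, step, count):
    """Index of the first subtraction in the chain that violates nuw, or None."""
    value = start & MASK
    for i in range(count):
        if sub_wraps_unsigned(value, step):
            return i
        value = (value - step) & MASK
    return None


# Scalar loop: 128 subtractions starting at -104, for each value of %local_4.
scalar = {step: first_nuw_violation(-104, step, 128) for step in range(2, 6)}

# Vector lane seeded with the reduction identity 0: 16 subtractions per lane.
identity_lane = {step: first_nuw_violation(0, step, 16) for step in range(2, 6)}

print(scalar)         # {2: None, 3: None, 4: None, 5: None}
print(identity_lane)  # {2: 0, 3: 0, 4: 0, 5: 0}
```

The scalar chain never wraps because -104 is a huge unsigned value, while every
lane seeded with 0 violates nuw on its very first subtraction.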

And when loop unrolling runs after the vectorizer, it thinks that (0 - x)<nuw>
is a no-op and removes it, leaving only the first half:

===================================================================
vector.body:                                      ; preds = %vector.ph
   %0 = sub nuw nsw <4 x i32> <i32 -104, i32 0, i32 0, i32 0>, <i32 2, i32 2, i32 2, i32 2>
   %1 = sub nuw nsw <4 x i32> %0, <i32 2, i32 2, i32 2, i32 2>
   %2 = sub nuw nsw <4 x i32> %1, <i32 2, i32 2, i32 2, i32 2>
   %3 = sub nuw nsw <4 x i32> %2, <i32 2, i32 2, i32 2, i32 2>
   %4 = sub nuw nsw <4 x i32> %3, <i32 2, i32 2, i32 2, i32 2>
   %5 = sub nuw nsw <4 x i32> %4, <i32 2, i32 2, i32 2, i32 2>
   %6 = sub nuw nsw <4 x i32> %5, <i32 2, i32 2, i32 2, i32 2>
   %7 = sub nuw nsw <4 x i32> %6, <i32 2, i32 2, i32 2, i32 2>
   %8 = sub nuw nsw <4 x i32> %7, <i32 2, i32 2, i32 2, i32 2>
   %9 = sub nuw nsw <4 x i32> %8, <i32 2, i32 2, i32 2, i32 2>
   %10 = sub nuw nsw <4 x i32> %9, <i32 2, i32 2, i32 2, i32 2>
   %11 = sub nuw nsw <4 x i32> %10, <i32 2, i32 2, i32 2, i32 2>
   %12 = sub nuw nsw <4 x i32> %11, <i32 2, i32 2, i32 2, i32 2>
   %13 = sub nuw nsw <4 x i32> %12, <i32 2, i32 2, i32 2, i32 2>
   %14 = sub nuw nsw <4 x i32> %13, <i32 2, i32 2, i32 2, i32 2>
   %15 = sub nuw nsw <4 x i32> %14, <i32 2, i32 2, i32 2, i32 2>
   %rdx.shuf = shufflevector <4 x i32> %15, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
   %bin.rdx7 = add <4 x i32> %15, %rdx.shuf
   %rdx.shuf8 = shufflevector <4 x i32> %bin.rdx7, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
   %bin.rdx9 = add <4 x i32> %bin.rdx7, %rdx.shuf8
   %16 = extractelement <4 x i32> %bin.rdx9, i32 0
   br i1 true, label %outer_tail, label %scalar.ph
===================================================================
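
To make the effect on the stored value concrete, here is a small Python sketch
(again not LLVM code; all names are illustrative). It assumes %local_4 is 2, as
in the listing above, and treats the lanes that are formally poison as ordinary
wrapping arithmetic, which the IR does not guarantee:

```python
def to_signed(x):
    """Interpret the low 32 bits of x as a signed i32."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


STEP, TRIPS, VF, UF = 2, 128, 4, 2

# Scalar reference: what the original inner loop computes.
scalar_result = -104
for _ in range(TRIPS):
    scalar_result -= STEP

# Vectorized form: two interleaved accumulators, as in %vec.phi and %vec.phi2.
part0 = [-104, 0, 0, 0]
part1 = [0, 0, 0, 0]
for _ in range(TRIPS // (VF * UF)):
    part0 = [v - STEP for v in part0]
    part1 = [v - STEP for v in part1]

full = to_signed(sum(part0) + sum(part1))  # both halves reduced
folded = to_signed(sum(part0))             # second half removed, only %15 reduced

print(scalar_result, full, folded)  # -360 -360 -232
```

Under these assumptions the reduction over both halves matches the scalar
result, while reducing only the first half does not.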

What's the proper way to handle this problem?
For now I disabled nowrap flag propagation in InnerLoopVectorizer::widenInstruction,
but it does not look like a correct fix.

Thanks,
   Denis


