[llvm-bugs] [Bug 43828] New: nowrap flags are not always correct after vectorization

Mon Oct 28 06:41:28 PDT 2019

https://bugs.llvm.org/show_bug.cgi?id=43828

            Bug ID: 43828
           Summary: nowrap flags are not always correct after
                    vectorization
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: dantrushin at gmail.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 22738
  --> https://bugs.llvm.org/attachment.cgi?id=22738&action=edit
Test to demonstrate wrong vectorizer behavior

When widening instructions, the loop vectorizer always copies IR flags (including
nowrap flags) from the scalar instruction to the new vector instruction.
This is not always correct. Consider a subtract reduction loop that
is vectorized and interleaved:

outer_loop:
  %local_4 = phi i32 [ 2, %entry ], [ %4, %outer_tail]
  br label %inner_loop

inner_loop:
  %local_2 = phi i32 [ 0, %outer_loop ], [ %1, %inner_loop ]
  %local_3 = phi i32 [ -104, %outer_loop ], [ %0, %inner_loop ]
  %0 = sub nuw nsw i32 %local_3, %local_4
  %1 = add nuw nsw i32 %local_2, 1
  %2 = icmp ugt i32 %local_2, 126
  br i1 %2, label %outer_tail, label %inner_loop

outer_tail:
  %3 = phi i32 [ %0, %inner_loop ]
  %4 = add i32 %local_4, 1
  %5 = icmp slt i32 %4, 6
  br i1 %5, label %outer_loop, label %exit

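Note the nuw/nsw flags on the sub instruction. They are correct for the scalar
code: %local_3 restarts at -104 (0xFFFFFF98 as unsigned) on every outer
iteration and is decremented 128 times by %local_4 (at most 5). So it never
drops below -744 and wraps neither as a signed nor as an unsigned value.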
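The scalar flags can be checked with a small Python model of the loop nest
above. This is a sketch, not LLVM code. The helper names (sub_flags_ok,
run_scalar) are hypothetical. The model asserts that nuw and nsw hold for every
executed sub and returns the final %local_3 value for each %local_4.

```python
# Hypothetical model of the scalar loop nest from the report (not LLVM code).
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MAX = 2**32 - 1


def as_unsigned(x):
    """Reinterpret a signed i32 value as unsigned (two's complement)."""
    return x & UINT32_MAX


def sub_flags_ok(a, b):
    """Return (nuw_ok, nsw_ok) for `sub i32 a, b`, with a and b given as signed values."""
    nuw_ok = as_unsigned(a) >= as_unsigned(b)   # unsigned result does not go below 0
    nsw_ok = INT32_MIN <= a - b <= INT32_MAX    # signed result stays in i32 range
    return nuw_ok, nsw_ok


def run_scalar():
    """Run the loop nest and map each %local_4 to the exit value of %local_3."""
    results = {}
    local_4 = 2                                  # %local_4 starts at 2
    while True:
        local_3 = -104                           # inner reduction restarts at -104
        for local_2 in range(128):               # exits once %local_2 > 126: 128 iterations
            nuw_ok, nsw_ok = sub_flags_ok(local_3, local_4)
            assert nuw_ok and nsw_ok, (local_4, local_2)
            local_3 -= local_4                   # %0 = sub nuw nsw i32 %local_3, %local_4
        results[local_4] = local_3               # %3 = phi i32 [ %0, %inner_loop ]
        local_4 += 1                             # %4 = add i32 %local_4, 1
        if not local_4 < 6:                      # icmp slt i32 %4, 6
            break
    return results
```

With this model run_scalar() returns -104 - 128 * %local_4 for each %local_4
from 2 to 5, for example -744 when %local_4 is 5.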

After vectorization (VF = 4, interleave count 2) it becomes:

vector.ph:                                        ; preds = %outer_loop
  %broadcast.splatinsert3 = insertelement <4 x i32> undef, i32 %local_4, i32 0
  %broadcast.splat4 = shufflevector <4 x i32> %broadcast.splatinsert3, <4 x i32> undef, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:                          ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <4 x i32> [ <i32 -104, i32 0, i32 0, i32 0>, %vector.ph ], [ %0, %vector.body ]
  %vec.phi2 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %1, %vector.body ]
  %0 = sub nuw nsw <4 x i32> %vec.phi, %broadcast.splat4
  %1 = sub nuw nsw <4 x i32> %vec.phi2, %broadcast.splat4
  %index.next = add i32 %index, 8
  %2 = icmp eq i32 %index.next, 128
  br i1 %2, label %middle.block, label %vector.body, !llvm.loop !0
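The following hypothetical Python model tracks the eight accumulator lanes (two
<4 x i32> vectors). It checks that the widened loop still computes the same
reduction as the scalar loop. The middle.block is not shown above, so the model
assumes the two partial accumulators are added and then summed across lanes,
as for an add-style reduction. Wrap-around is not modeled because no lane
leaves the i32 range for these inputs.

```python
# Hypothetical lane-wise model of the vectorized, interleaved inner loop.
VF, UF, TRIP_COUNT = 4, 2, 128   # <4 x i32>, two interleaved parts, 128 scalar iterations


def run_vector(local_4):
    vec_phi = [-104, 0, 0, 0]    # %vec.phi initial value
    vec_phi2 = [0, 0, 0, 0]      # %vec.phi2 initial value (zeroinitializer)
    for _ in range(0, TRIP_COUNT, VF * UF):          # %index.next = add i32 %index, 8
        vec_phi = [x - local_4 for x in vec_phi]     # %0 = sub <4 x i32> %vec.phi, splat
        vec_phi2 = [x - local_4 for x in vec_phi2]   # %1 = sub <4 x i32> %vec.phi2, splat
    # Assumed middle.block: add the two parts, then reduce across the four lanes.
    return sum(a + b for a, b in zip(vec_phi, vec_phi2))
```

For each %local_4 the model gives -104 - 128 * %local_4, the same value as the
scalar loop. The vector arithmetic itself is therefore fine, and only the
copied flags are wrong.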

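Note that the %1 sub still has the nuw flag set, which is now incorrect.
%vec.phi2 starts at zeroinitializer, so the first vector iteration computes
0 - %local_4 with %local_4 >= 2, which wraps as an unsigned value. Lanes 1-3
of %0 also start at 0 and are affected in the same way. Because of this flag,
later optimizations remove the second sub instruction [ (0 - x)<nuw> -> 0 ],
which results in incorrect code.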
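The following Python sketch shows both halves of the problem. The helper names
are hypothetical. First, it reports every lane where the widened sub nuw would
wrap. Second, it models the effect of folding each unrolled copy of %1 to zero
via (0 - x)<nuw> -> 0. Only that fold on %1 is modeled. Nothing is assumed about
how other passes treat %0, and the final combine is again assumed to be an add
across parts and lanes.

```python
# Hypothetical model: where nuw is violated, and what the (0 - x)<nuw> -> 0 fold does.
UINT32_MAX = 2**32 - 1
VECTOR_ITERS = 128 // 8          # 128 scalar iterations, 8 per vector iteration


def nuw_violations(local_4):
    """Return (value, lane, iteration) for every lane where `sub nuw` would wrap."""
    vec_phi = [-104, 0, 0, 0]
    vec_phi2 = [0, 0, 0, 0]
    bad = []
    for it in range(VECTOR_ITERS):
        for name, acc in (("%0", vec_phi), ("%1", vec_phi2)):
            for lane, x in enumerate(acc):
                if (x & UINT32_MAX) < local_4:     # unsigned x - local_4 goes below 0
                    bad.append((name, lane, it))
        vec_phi = [x - local_4 for x in vec_phi]
        vec_phi2 = [x - local_4 for x in vec_phi2]
    return bad


def run_vector_after_fold(local_4):
    """Result if every unrolled copy of %1 is folded to zeroinitializer."""
    vec_phi = [-104, 0, 0, 0]
    for _ in range(VECTOR_ITERS):
        vec_phi = [x - local_4 for x in vec_phi]
    vec_phi2 = [0, 0, 0, 0]      # (0 - x)<nuw> -> 0 applied along the whole %1 chain
    return sum(a + b for a, b in zip(vec_phi, vec_phi2))
```

For %local_4 = 2, the model reports seven violating lanes, all in the first
vector iteration (lanes 1-3 of %0 and all four lanes of %1). The folded result
is -232 instead of the expected -360.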

A simple test case is attached (unrolling the vectorized loop makes the
problem clearly visible).
