[llvm-bugs] [Bug 40310] New: [SLP] slp-vectorizer-hor miscompiles unrolled &

Mon Jan 14 14:21:46 PST 2019

https://bugs.llvm.org/show_bug.cgi?id=40310

            Bug ID: 40310
           Summary: [SLP] slp-vectorizer-hor miscompiles unrolled &
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: fedor.v.sergeev at gmail.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 21322
  --> https://bugs.llvm.org/attachment.cgi?id=21322&action=edit
IR that demonstrates slp-vectorizer miscompile

The attached (and cited) IR is a manually reduced result of optimizing the following Java code:
---
void mainTest(int param, int values[], int len) {
    int mask = 31;
    for (int i = 0; i < len; i++) {
      values[i] = param--;
      for (int ix = 0; ix < 16; ix++)
          param &= mask++;
    }
}
---

] cat before-slp.ll
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"
target triple = "x86_64-unknown-linux-gnu"
define void @mainTest(i32 %param, i8* align 8 %vals, i32 %len) {
bci_15.preheader:
  br label %bci_15
bci_15:
  %local_0_ = phi i32 [ %v43, %bci_15 ], [ %param, %bci_15.preheader ]
  %local_4_ = phi i32 [ %v44, %bci_15 ], [ 31, %bci_15.preheader ]
  %v12 = add i32 %local_0_, -1
  %v13 = add i32 %local_4_, 1
  %v14 = and i32 %local_4_, %v12
  %v15 = add i32 %local_4_, 2
  %v16 = and i32 %v13, %v14
  %v17 = add i32 %local_4_, 3
  %v18 = and i32 %v15, %v16
  %v19 = add i32 %local_4_, 4
  %v20 = and i32 %v17, %v18
  %v21 = add i32 %local_4_, 5
  %v22 = and i32 %v19, %v20
  %v23 = add i32 %local_4_, 6
  %v24 = and i32 %v21, %v22
  %v25 = add i32 %local_4_, 7
  %v26 = and i32 %v23, %v24
  %v27 = add i32 %local_4_, 8
  %v28 = and i32 %v25, %v26
  %v29 = add i32 %local_4_, 9
  %v30 = and i32 %v27, %v28
  %v31 = add i32 %local_4_, 10
  %v32 = and i32 %v29, %v30
  %v33 = add i32 %local_4_, 11
  %v34 = and i32 %v31, %v32
  %v35 = add i32 %local_4_, 12
  %v36 = and i32 %v33, %v34
  %v37 = add i32 %local_4_, 13
  %v38 = and i32 %v35, %v36
  %v39 = add i32 %local_4_, 14
  %v40 = and i32 %v37, %v38
  %v41 = add i32 %local_4_, 15
  %v42 = and i32 %v39, %v40
  %v43 = and i32 %v41, %v42
  %v44 = add i32 %local_4_, 16
  br i1 true, label %bci_15, label %loopexit
loopexit:
  ret void
}
] 

These adds and ands are the result of unrolling the inner loop that does
param &= mask++. Unfortunately, when horizontal reduction is applied, the SLP
vectorizer starts emitting undefs:

] bin/opt -slp-vectorizer before-slp.ll -S | grep -c undef
23
]

These undefs essentially cause the 'param' calculation to become undef (which
ended up as 0 in my case). This is a miscompile, and a rather subtle one that
is hard to track down :(
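
To make the expected semantics concrete, here is a minimal Java sketch of what
one iteration of the loop body computes. The class and method names are
hypothetical and are not part of the original test. It only illustrates that
the scalar chain of ands is equivalent to (param - 1) ANDed with an
AND-reduction over mask .. mask+15, which is the shape a horizontal reduction
would be expected to preserve.

```java
public class AndReductionSketch {
    // Scalar form: mirrors one outer-loop iteration of the Java source
    // (param-- followed by sixteen "param &= mask++" steps).
    public static int scalarStep(int param, int mask) {
        param--;
        for (int ix = 0; ix < 16; ix++) {
            param &= mask++;
        }
        return param;
    }

    // Reduction form: AND together mask .. mask+15 first, then combine
    // the result with param - 1. Every operand is well defined.
    public static int reducedStep(int param, int mask) {
        int acc = -1; // all ones, the identity element for &
        for (int k = 0; k < 16; k++) {
            acc &= mask + k;
        }
        return (param - 1) & acc;
    }

    public static void main(String[] args) {
        // Compare both forms over a small range of inputs.
        for (int param = -64; param <= 64; param++) {
            for (int mask = 0; mask < 256; mask++) {
                if (scalarStep(param, mask) != reducedStep(param, mask)) {
                    throw new AssertionError("mismatch: param=" + param + " mask=" + mask);
                }
            }
        }
        System.out.println("scalar and reduced forms agree");
    }
}
```

Since & is associative and commutative, both forms must agree for every input,
so any undef operand feeding the reduction changes the result.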

If horizontal reduction is disabled, there are no undefs and the code is
semantically correct. (My original Java test also starts passing, so I'm pretty
sure this is what causes my miscompile.)

] bin/opt -slp-vectorizer before-slp.ll -slp-vectorize-hor=false -S | grep -c undef
0
]
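
One possible way to check for this kind of miscompile from the Java side is a
differential run: call the method many times and compare every result with the
first one. The harness below is a hypothetical sketch, not the original test.
The class name, helper name, and run count are assumptions, and whether or when
a JIT actually compiles mainTest depends on the VM and its settings.

```java
import java.util.Arrays;

public class MainTestHarness {
    // The method from the report, with the array parameter in Java-style syntax.
    public static void mainTest(int param, int[] values, int len) {
        int mask = 31;
        for (int i = 0; i < len; i++) {
            values[i] = param--;
            for (int ix = 0; ix < 16; ix++)
                param &= mask++;
        }
    }

    // Runs mainTest `runs` times with the same inputs and returns the index
    // of the first run whose output differs from the first run, or -1 if
    // every run produced the same array.
    public static int firstMismatch(int param, int len, int runs) {
        int[] expected = new int[len];
        mainTest(param, expected, len);
        for (int r = 1; r < runs; r++) {
            int[] actual = new int[len];
            mainTest(param, actual, len);
            if (!Arrays.equals(expected, actual)) {
                return r;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstMismatch(100, 8, 20_000);
        System.out.println(bad < 0 ? "all runs agree" : "mismatch at run " + bad);
    }
}
```

With correct code generation every run should produce the same array, so a
non-negative result points at a run where the compiled code diverged.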
