[llvm-bugs] [Bug 38697] New: Vectorization causes unsafe integer div-by-zero

Fri Aug 24 19:27:57 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38697

            Bug ID: 38697
           Summary: Vectorization causes unsafe integer div-by-zero
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: warren_ristow at playstation.sony.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 20766
  --> https://bugs.llvm.org/attachment.cgi?id=20766&action=edit
test-case

The attached test program is miscompiled when compiled with optimization
(e.g., -O2) and appropriate vector extensions enabled (e.g., SSE4.1).  The
result is a run-time crash due to an integer division by zero, caused by what
appears to be a bug in vectorization.

For example, using a fairly modern trunk version:

  $ clang --version
  clang version 8.0.0 (trunk 340550)
  Target: x86_64-unknown-linux-gnu
  Thread model: posix
  ...
  $ clang -O2 -msse4.1 -fno-vectorize -o testPass.elf test.c
  $ clang -O2 -msse4.1 -o testFail.elf test.c
  $ testPass.elf
  {BEGIN}
  {END}
  $ testFail.elf
  {BEGIN}
  Floating point exception (core dumped)
  $

The full program is listed below for reference.  First, some background.

The code contains a function 'test()':

  int test(int val, int signedshift, int bits) { .... }

'test()' contains a loop that never terminates if 'bits' is 0 (and I think its
behavior is undefined if 'bits' is less than 0).  The caller ('main()', in this
test case) guards the call to 'test()', so that the call is made only when that
argument is known to be greater than 0.  The caller has an outer loop
containing two inner loops: one that runs when that argument is <= 0, and
another that runs when it is > 0 (this is the loop that calls 'test()').  That
is, 'main()' has the structure:

  count = <expression>;
  for (....) {  // outer loop
    if (count <= 0) {
      for (....) {  // first inner loop
        // some code
      }
    } else {
      for (....) {  // second inner loop
        // some other code
        ... test(v, shift, count);  // loop in 'test()' gets vectorized
      }
    }
  }
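To see how a division by 'bits' can appear at all: for 'bits' > 0, the loop in
'test()' starts at 'bits', steps by 'bits', and stops at 8, so it runs exactly
7 / 'bits' times.  A vectorizer that precomputes the trip count therefore ends
up dividing by the loop step.  The sketch below is only an illustration under
that assumption; the helper names are hypothetical, and this is not the exact
expression LLVM generates.

```c
#include <assert.h>

/* Hypothetical helper: counts the iterations of
   `tmp = bits; while (tmp < 8) tmp += bits;` by actually running the loop.
   Only call it with bits > 0; with bits == 0 the loop never terminates. */
int trip_count_by_loop(int bits) {
  int n = 0;
  for (int tmp = bits; tmp < 8; tmp += bits)
    ++n;
  return n;
}

/* Hypothetical helper: the same count in closed form.  For bits > 0,
   iteration k (k >= 0) runs while (k + 1) * bits <= 7, which gives
   floor(7 / bits) iterations.  The divisor is the loop step, so evaluating
   this with bits == 0 is an integer division by zero. */
int trip_count_closed_form(int bits) {
  assert(bits > 0 && "closed form is only valid for a positive step");
  return 7 / bits;
}
```

For 'bits' = 1..8 both helpers give 7, 3, 2, 1, 1, 1, 1, 0.  The closed form is
only safe to evaluate once 'bits' > 0 is known, which is exactly the guard that
'main()' provides.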

When the loop in 'test()' is vectorized, a loop-invariant integer division is
created (its divisor is 'bits', which is 0 at run time in this case).  'test()'
is inlined into 'main()', and that loop-invariant division is hoisted higher
than it can safely be hoisted.  Specifically, it is hoisted to the pre-header
of the outer loop of 'main()', which causes a trap before the body of the
outer loop is entered (since 'count' is 0).  More details can be seen in the
full code of "test.c", pasted below.
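The hoisting rule that is being violated can be shown in plain C.  A
loop-invariant division may be computed once outside a loop, but because
integer division can trap, it can only be moved to a point where the divisor is
already known to be non-zero (here, inside the 'count > 0' branch).  The
function below is a hypothetical, hand-written analogue of 'main()'s loop nest,
not actual compiler output; the quotient 7 / count is an assumed stand-in for
whatever division the vectorizer produced.

```c
/* Hypothetical analogue of main()'s loop nest.  Legal placement: the
   loop-invariant quotient is hoisted out of the second inner loop, but no
   higher than the `count > 0` test that proves the divisor is non-zero. */
int sum_guarded_hoist(int count, int ylim, int xlim) {
  int sum = 0;
  for (int y = 0; y < ylim; ++y) {
    if (count <= 0) {
      for (int x = 0; x < xlim; ++x)
        sum += 1;                 /* stand-in for the first inner loop */
    } else {
      int q = 7 / count;          /* safe: count > 0 on this path */
      for (int x = 0; x < xlim; ++x)
        sum += q;                 /* stand-in for the inlined test() */
    }
  }
  return sum;
}

/* Illegal placement (the behavior described in this report): computing the
   quotient in the outer loop's pre-header, i.e.
       int q = 7 / count;         -- traps when count == 0
       for (int y = 0; y < ylim; ++y) { ... }
   executes the division even when the guarded branch is never taken. */
```

With 'count' = 0 the guarded version never reaches the division and simply
runs the first inner loop, which is the behavior the unvectorized build shows.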

For the record, bisecting shows that this problem first appeared with r284939.
However, I think that change is clearly correct; it just exposes a latent
problem in the vectorizer.  That change is:

  ------------------------------------------------------------------------
  r284939 | rksimon | 2016-10-23 09:49:04 -0700 (Sun, 23 Oct 2016) | 3 lines
  Changed paths:
     M /llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
     M /llvm/trunk/test/Analysis/CostModel/X86/vshift-ashr-cost.ll
     M /llvm/trunk/test/Analysis/CostModel/X86/vshift-lshr-cost.ll
     M /llvm/trunk/test/Analysis/CostModel/X86/vshift-shl-cost.ll

  [X86][SSE] Add SSE41/AVX1 costs for vector shifts.

  We were defaulting to SSE2 costs which weren't taking into account the
  availability of PBLENDW/PBLENDVB to improve merging of per-element shift
  results.
  ------------------------------------------------------------------------

//////////////////////// test.c ////////////////////////
extern int printf(const char *, ...);
volatile int zero = 0;  // volatile to suppress optimization
static int get_data() { return zero; }  // some arbitrary data -- 0 works fine
unsigned char space[64];

int test(int val, int signedshift, int bits)
{
  int result, tmp;
  if (signedshift < 0) val <<= -signedshift;
  else if (signedshift > 0) val >>= signedshift;
  result = val;
  tmp = bits;

// This loop is dead-code at run-time, but preventing vectorization of it
// suppresses the problem.
// #pragma clang loop vectorize(disable)
  while (tmp < 8) {
    result += val >> tmp;
    tmp += bits;
  }

  return result;
}

int main()
{
  // References to (volatile) 'zero' here to suppress constant propagation.
  int xlim = 4 + zero;
  int ylim = 1 + zero;
  int shift = zero;
  int count = zero;
  int inx = 0;
  printf("{BEGIN}\n");
  for (int y = 0; y < ylim; ++y) {
    if (count <= 0) { // At runtime, 'count' is 0, so we enter here.
      for (int x = 0; x < xlim; ++x)
        space[inx++] = (unsigned char) get_data();
    } else { // At runtime, 'count' is 0, so this branch is dead.
      for (int x = 0; x < xlim; ++x) {
        int v = get_data();
        space[inx++] = (unsigned char) test(v, shift, count);
      }
    }
  }
  printf("{END}\n");
  return 0;
}
////////////////////////////////////////////////////////
