<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [LoopVectorizer/SCEV] induction with truncation prevents vectorization. Need runtime overflow test."
   href="https://llvm.org/bugs/show_bug.cgi?id=30654">30654</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[LoopVectorizer/SCEV] induction with truncation prevents vectorization. Need runtime overflow test.
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dorit.nuzman@intel.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Saw this missed optimization in a Geekbench workload:
We have a signed int index ‘w_ix’ which is incremented by an unsigned long
‘step’.
When compiling with -m32 all is well.
However, when compiling with -m64 the result of the unsigned long addition may
not fit back into the signed int index, so we may have signed int overflow
(the index may wrap).

for.body:
  %w_ix.014 = phi i64 [ %add3, %for.body ], [ 0, %for.body.preheader ]
  %sext = shl i64 %w_ix.014, 32
  %idxprom = ashr exact i64 %sext, 32
  %add3 = add i64 %idxprom, %step
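
For illustration, the recurrence above can be modelled in C roughly as
follows. This is only a sketch: the helper names idxprom and add3 mirror the
IR value names but are otherwise made up, and nothing here is an LLVM API.

```c
#include <stdint.h>

/* Models `shl i64 %w_ix, 32` followed by `ashr exact i64 %sext, 32`:
   keep the low 32 bits of the phi and sign-extend them back to 64 bits. */
static int64_t idxprom(uint64_t w_ix)
{
    uint32_t low = (uint32_t)w_ix;
    /* Manual sign extension, so no implementation-defined conversion. */
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL
                               : (int64_t)low;
}

/* Models `%add3 = add i64 %idxprom, %step` (a wrapping 64-bit add).
   The result is the value the phi carries into the next iteration. */
static uint64_t add3(uint64_t w_ix, uint64_t step)
{
    return (uint64_t)idxprom(w_ix) + step;
}
```

With step = 1, a phi value of 0x7fffffff advances to 0x80000000, which the
next iteration reads back as INT32_MIN. So the phi is not a plain 64-bit add
recurrence unless the 32-bit value is known not to wrap.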

As a result the loop vectorizer fails with
“LV: PHI is not a poly recurrence… Found an unidentified PHI” and
“loop not vectorized: value that could not be identified as reduction is used
outside the loop.”

In order to guarantee that the induction does not wrap, we need to identify
this pattern (addition with 64-to-32-bit truncation) and generate a runtime
signed int overflow check (e.g. check that step*loopTripCount is small enough).
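
A minimal sketch of what such a check could look like at the source level,
assuming the index starts at 0 as in this report. The names
index_stays_in_int32 and test_versioned are hypothetical and only illustrate
the idea. This is not how the vectorizer would necessarily emit the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

float in[1000];
float out[1000];

/* Hypothetical helper, not an LLVM API. Returns true when an index that
   starts at 0 and grows by `step` on each of `trip_count` iterations never
   exceeds INT32_MAX, i.e. the truncation to a signed 32-bit int is a no-op.
   Conservative: a huge `step` that acts as a negative stride is rejected. */
static bool index_stays_in_int32(size_t step, size_t trip_count)
{
    if (trip_count == 0)
        return true;
    /* Division form of step * trip_count <= INT32_MAX, so the product
       itself cannot wrap in size_t. */
    return step <= (size_t)INT32_MAX / trip_count;
}

/* Loop versioning by hand: take the fast path only if the check passes. */
void test_versioned(size_t out_start, size_t size, size_t step)
{
    if (index_stays_in_int32(step, size)) {
        /* No wrap possible, so w_ix == out_offset * step on every trip. */
        for (size_t out_offset = 0; out_offset < size; ++out_offset)
            out[out_start + out_offset] += in[out_offset * step];
    } else {
        /* Fallback: the original loop with the truncating int index. */
        int w_ix = 0;
        for (size_t out_offset = 0; out_offset < size; ++out_offset) {
            out[out_start + out_offset] += in[w_ix];
            w_ix += step;
        }
    }
}
```

The fast path has a simple affine index, which is the form the vectorizer
can recognize. The fallback keeps the original behaviour for inputs where
the index may wrap.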

This is a reduced testcase:

#include <stdlib.h>
float in[1000];
float out[1000];
void test(size_t out_start, size_t size, size_t step)
{
    int w_ix = 0;
    for (size_t out_offset = 0; out_offset < size; ++out_offset)
    {
        size_t out_ix = out_start + out_offset;
        float w = in[w_ix];
        out[out_ix] += w;
        w_ix += step;
    }
}

(I compiled it with -m64 -Ofast -static -march=core-avx2).</pre>
        </div>
      </p>
    </body>
</html>