<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Loop Vectorizer Generates Unreachable Fast Path"
   href="https://bugs.llvm.org/show_bug.cgi?id=47371">47371</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Loop Vectorizer Generates Unreachable Fast Path
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>tools
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>llc
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>thoren.paulson@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Overview:
I found a simple loop that the loop vectorizer vectorizes, but the vector code
it generates is unreachable because the guarding conditions contradict each other.

Steps to Reproduce:
C++ Example, compiled with `clang -O2` targeting x86_64, see
<a href="https://godbolt.org/z/5x9WTb">https://godbolt.org/z/5x9WTb</a>

```
#include &lt;cstddef&gt;

void pan(float* samples, size_t len, float coef) {
    float c[4] = { 1.0f - coef, coef, 1.0f - coef, coef };

    for (size_t i = 0; i < len; i++) {
        samples[i] *= c[i % 4];
    }
}
```

Actual results:
See <a href="https://godbolt.org/z/5x9WTb">https://godbolt.org/z/5x9WTb</a> or compile the example for assembly output,
but here's the part that seems contradictory:

```
test    rsi, rsi
je      .LBB0_14       # jump if len == 0
xor     eax, eax
cmp     rsi, 8
jb      .LBB0_3        # jump if len < 8
lea     rcx, [rsi - 1]
cmp     rcx, 4
jae     .LBB0_3        # jump if len >= 5
```

`.LBB0_3` is the non-vectorized cleanup path, and `rsi` holds `len`. This code
will always go to the cleanup loop instead of falling through to the vectorized
code, since `len` cannot be both at least 8 and less than 5.
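
The contradiction can also be checked mechanically. The following is a minimal
sketch that models only the three branches quoted above; the function name
`reaches_vector_path` is hypothetical and `unsigned long long` stands in for
the 64-bit `rsi` register.

```cpp
// Illustrative model of the quoted guard sequence, not code taken from LLVM.
// Returns true only if none of the three branches to the scalar/exit paths
// is taken, i.e. execution would fall through to the vector body.
constexpr bool reaches_vector_path(unsigned long long len) {
    return len != 0        // test rsi, rsi ; je .LBB0_14 not taken
        && len >= 8        // cmp rsi, 8 ; jb .LBB0_3 not taken
        && len - 1 < 4;    // lea rcx, [rsi - 1] ; cmp rcx, 4 ; jae .LBB0_3 not taken
}

// Sample lengths on both sides of each threshold: none reaches the vector body.
static_assert(!reaches_vector_path(0), "");
static_assert(!reaches_vector_path(4), "");
static_assert(!reaches_vector_path(8), "");
static_assert(!reaches_vector_path(1024), "");
```

The second condition requires `len >= 8` and the third requires `len <= 4`, so
the function returns false for every input.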

Expected results:
Any `len` of at least 8 (or some other appropriate threshold) should execute the
`mulps` instructions until the remainder is less than the vectorization width.
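
For illustration, here is a hand-written sketch of the loop shape described
above. It is not compiler output: `pan_blocked` is a hypothetical name, the
block width of 4 is an assumption matching one 4-float `mulps`, and
`unsigned long` stands in for `size_t` to keep the sketch header-free.

```cpp
// Hand-written sketch of the expected main-loop/remainder split.
void pan_blocked(float* samples, unsigned long len, float coef) {
    const float c[4] = { 1.0f - coef, coef, 1.0f - coef, coef };
    unsigned long i = 0;

    // Main loop: each pass covers one full period of c, so it corresponds
    // to a single 4-wide multiply.
    for (; len - i >= 4; i += 4) {
        for (unsigned long j = 0; j < 4; j++) {
            samples[i + j] *= c[j];
        }
    }

    // Remainder loop: fewer than 4 elements are left. i is a multiple of 4
    // here, so c[i % 4] lines up with the original indexing.
    for (; i < len; i++) {
        samples[i] *= c[i % 4];
    }
}
```

With this shape, the main loop runs whenever at least one full block of 4
elements is available, and only the tail goes through the scalar loop.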


Additional information:
I originally discovered this in Rust in this form: <a href="https://godbolt.org/z/6M7Pon">https://godbolt.org/z/6M7Pon</a>
This leads me to believe it's a backend issue rather than a problem in clang or rustc.</pre>
        </div>
      </p>


    </body>
</html>