<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Loop optimizer aggressively unrolling and vectorizing loops despite small bound on trip count"

   href="https://bugs.llvm.org/show_bug.cgi?id=41562">41562</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Loop optimizer aggressively unrolling and vectorizing loops despite small bound on trip count

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Loop Optimizer

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>fabiang@radgametools.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Here's a fairly straightforward adaptation of real-world code we noticed this

in. This is essentially undoing byte-wise delta coding.

-

#include <stdint.h>

#include <stddef.h>

#include <emmintrin.h>

void f(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)

{

    // note: len is always < 64 for this func

    // offs is allowed to be -8 (but not >= -7), so we can't do more than 8 bytes at once

    while (len >= 8)

    {

        __m128i v0 = _mm_loadl_epi64((const __m128i *) src);

        __m128i v1 = _mm_loadl_epi64((const __m128i *) (dest + offs));

        __m128i sum = _mm_add_epi8(v0, v1);

        _mm_storel_epi64((__m128i *) dest, sum);

        src += 8;

        dest += 8;

        len -= 8;

    }

    // this loop gets extensively unrolled and vectorized

    // which produces tons of code and is entirely pointless since

    // len < 8 here.

    while (len--)

    {

        *dest = *src++ + dest[offs];

        dest++;

    }

}

-

<a href="https://godbolt.org/z/yZ4WHw">https://godbolt.org/z/yZ4WHw</a>

There are other paths that handle long lengths but they are not relevant here.

Current trunk produces a ton of code for this loop; this has been the case for

several major releases at least.

The final loop (starting in line 23) is strictly dominated by the loop above

it, which only exits once !(len >= 8), i.e. len < 8. A trip count of at most 7

should therefore be inferable, and extensive unrolling or loop vectorization is

pointless in this instance.
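
As a sanity check on that bound, here is a minimal sketch that mirrors only the
length bookkeeping of the first loop and verifies the remaining tail length for
every len < 64. The helper names are hypothetical and not part of the original
code.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors only the length bookkeeping of the first loop in f(). */
static size_t tail_len_after_main_loop(size_t len)
{
    while (len >= 8)
        len -= 8;
    return len; /* the loop can only exit when !(len >= 8) */
}

/* Returns 1 if the tail length is < 8 (and equals len % 8) for every len < 64, else 0. */
static int tail_bound_holds(void)
{
    for (size_t len = 0; len < 64; ++len)
    {
        size_t tail = tail_len_after_main_loop(len);
        if (tail >= 8 || tail != len % 8)
            return 0;
    }
    return 1;
}
```

This only demonstrates the arithmetic; it says nothing about what the optimizer
actually derives for the trip count.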

This is a simple example but we've run into numerous instances of this:

essentially, an optimized (and manually vectorized) loop that has a small tail

portion. Trying to vectorize these tail loops does not improve perf and causes

noticeable code bloat.

It's possible to work around this using manual loop optimizer pragmas but this

is rather noisy in the source code, especially in code bases that also need to

work with other compilers (and thus require #ifdefs around Clang-specific

pragmas).</pre>
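        <pre>For reference, a sketch of what the pragma workaround can look like for the
tail loop alone. The macro name TAIL_LOOP_NO_OPT and the split into a separate
f_tail function are assumptions made for illustration. The pragma uses Clang's
documented loop hints and the macro expands to nothing on other compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name; expands to nothing on compilers other than Clang. */
#if defined(__clang__)
#define TAIL_LOOP_NO_OPT \
    _Pragma("clang loop vectorize(disable) interleave(disable) unroll(disable)")
#else
#define TAIL_LOOP_NO_OPT
#endif

/* Scalar tail of f(): undoes byte-wise delta coding for the last len bytes
   (the caller is expected to pass len < 8). */
static void f_tail(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    TAIL_LOOP_NO_OPT
    while (len--)
    {
        *dest = (uint8_t)(*src++ + dest[offs]);
        dest++;
    }
}
```

Hiding the pragma behind a macro keeps the #ifdef in one place, but every tail
loop still needs the annotation, which is the noise described above.</pre>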

        </div>

      </p>


    </body>

</html>