[llvm-bugs] [Bug 41562] New: Loop optimizer aggressively unrolling and vectorizing loops despite small bound on trip count

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Apr 22 17:12:46 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=41562

            Bug ID: 41562
           Summary: Loop optimizer aggressively unrolling and vectorizing
                    loops despite small bound on trip count
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: fabiang at radgametools.com
                CC: llvm-bugs at lists.llvm.org

Here's a fairly straightforward adaptation of real-world code we noticed this
in. This is essentially undoing byte-wise delta coding.

-

#include <stdint.h>
#include <stddef.h>
#include <emmintrin.h>

void f(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    // note: len is always < 64 for this func
    // offset is allowed to be -8 (but not >=-7) so can't do more than 8 at once
    while (len >= 8)
    {
        __m128i v0 = _mm_loadl_epi64((const __m128i *) src);
        __m128i v1 = _mm_loadl_epi64((const __m128i *) (dest + offs));
        __m128i sum = _mm_add_epi8(v0, v1);
        _mm_storel_epi64((__m128i *) dest, sum);
        src += 8;
        dest += 8;
        len -= 8;
    }

    // this loop gets extensively unrolled and vectorized
    // which produces tons of code and is entirely pointless since
    // len < 8 here.
    while (len--)
    {
        *dest = *src++ + dest[offs];
        dest++;
    }
}

-

https://godbolt.org/z/yZ4WHw

There are other paths that handle long lengths but they are not relevant here.

Current trunk produces a ton of code for this loop; this has been the case for
several major releases at least.

The final loop (starting in line 23) is strictly dominated by the loop above
it, which exits only when !(len >= 8) holds; therefore len < 8 should be
inferrable on entry to the final loop, giving it at most 7 iterations.
Extensive unrolling or loop vectorization is pointless in this instance.
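
To make the bound concrete, here is a minimal sketch that restates it in the
source. It is not code from the report: f_bounded_tail is a hypothetical name,
the SSE2 step is swapped for a portable scalar stand-in, and the return value
exists only so the bound can be checked. Whether the explicit mask changes what
the loop optimizer emits is an assumption that has not been verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of f() from the report. The 8-byte SSE2 step is
   replaced by a portable scalar stand-in so the sketch builds anywhere. */
size_t f_bounded_tail(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    while (len >= 8)
    {
        uint8_t sum[8];
        /* Read all 8 inputs before writing, like the load/add/store sequence. */
        for (int i = 0; i < 8; i++)
            sum[i] = (uint8_t)(src[i] + dest[offs + i]);
        for (int i = 0; i < 8; i++)
            dest[i] = sum[i];
        src += 8;
        dest += 8;
        len -= 8;
    }

    /* The loop above exits only when !(len >= 8), so len <= 7 here.
       The mask does not change the value; it only restates the bound. */
    size_t tail = len & 7;
    for (size_t i = 0; i < tail; i++)
        dest[i] = (uint8_t)(src[i] + dest[offs + i]);

    return tail; /* returned only so a caller can check the bound */
}
```

For any len, the returned tail count equals len % 8, so the scalar loop never
runs more than 7 times.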

This is a simple example but we've run into numerous instances of this:
essentially, an optimized (and manually vectorized) loop that has a small tail
portion. Trying to vectorize these tail loops does not improve perf and causes
noticeable code bloat.

It's possible to work around this using manual loop optimizer pragmas but this
is rather noisy in the source code, especially in code bases that also need to
work with other compilers (and thus require #ifdefs around Clang-specific
pragmas).
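
For illustration, the workaround could look roughly like the sketch below. It
uses Clang's "#pragma clang loop" hints with the vectorize(disable) and
unroll(disable) options, wrapped in a macro so other compilers see nothing. The
macro name TAIL_LOOP_NO_VEC and the function tail_add are hypothetical names
chosen for this example, not something from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro (not from the report): expands to the Clang
   loop hint under Clang and to nothing under other compilers. */
#if defined(__clang__)
#  define TAIL_LOOP_NO_VEC \
      _Pragma("clang loop vectorize(disable) unroll(disable)")
#else
#  define TAIL_LOOP_NO_VEC
#endif

/* Scalar tail of the delta decoder: each output byte is the source byte
   plus the already-decoded byte at distance offs. */
void tail_add(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    TAIL_LOOP_NO_VEC
    while (len--)
    {
        *dest = (uint8_t)(*src++ + dest[offs]);
        dest++;
    }
}
```

Hiding the pragma behind a macro keeps the #ifdef in one place, but every tail
loop still needs the annotation, which is the source noise described above.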
