[llvm-bugs] [Bug 41562] New: Loop optimizer aggressively unrolling and vectorizing loops despite small bound on trip count
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Apr 22 17:12:46 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=41562
Bug ID: 41562
Summary: Loop optimizer aggressively unrolling and vectorizing
loops despite small bound on trip count
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedbugs at nondot.org
Reporter: fabiang at radgametools.com
CC: llvm-bugs at lists.llvm.org
Here's a fairly straightforward adaptation of real-world code we noticed this
in. This is essentially undoing byte-wise delta coding.
-
#include <stdint.h>
#include <stddef.h>
#include <emmintrin.h>
void f(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    // note: len is always < 64 for this func
    // offset is allowed to be -8 (but not >=-7) so can't do more than 8 at once
    while (len >= 8)
    {
        __m128i v0 = _mm_loadl_epi64((const __m128i *) src);
        __m128i v1 = _mm_loadl_epi64((const __m128i *) (dest + offs));
        __m128i sum = _mm_add_epi8(v0, v1);
        _mm_storel_epi64((__m128i *) dest, sum);
        src += 8;
        dest += 8;
        len -= 8;
    }
    // this loop gets extensively unrolled and vectorized
    // which produces tons of code and is entirely pointless since
    // len < 8 here.
    while (len--)
    {
        *dest = *src++ + dest[offs];
        dest++;
    }
}
-
https://godbolt.org/z/yZ4WHw
There are other paths that handle long lengths but they are not relevant here.
Current trunk produces a ton of code for this loop; this has been the case for
several major releases at least.
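The final loop (the while (len--) loop, starting in line 23 of the original
listing) is strictly dominated by the loop above it, so it can only be reached
once that loop's condition has failed. Therefore !(len >= 8), i.e. len < 8,
should be inferable on entry, which bounds the trip count at 7. Extensive
unrolling or loop vectorization is pointless in this instance.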
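To spell out the bound in isolation, here is a minimal sketch that mirrors only
the length bookkeeping of the first loop and checks what is left afterwards.
The helper name remaining_after_main_loop is made up for illustration and is
not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original code): returns the value of
   len that is left once the 8-bytes-per-iteration loop has finished. */
static size_t remaining_after_main_loop(size_t len)
{
    while (len >= 8)
        len -= 8;
    /* The loop above exits only when !(len >= 8), so len < 8 holds here
       and a following while (len--) loop runs at most 7 times. */
    assert(len < 8);
    return len;
}
```

For every len the function is documented to receive (0 through 63), the
remainder equals len % 8.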
This is a simple example but we've run into numerous instances of this:
essentially, an optimized (and manually vectorized) loop that has a small tail
portion. Trying to vectorize these tail loops does not improve perf and causes
noticeable code bloat.
It's possible to work around this using manual loop optimizer pragmas but this
is rather noisy in the source code, especially in code bases that also need to
work with other compilers (and thus require #ifdefs around Clang-specific
pragmas).
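For reference, a minimal sketch of what that workaround can look like on the
tail loop alone. The helper name delta_tail and the assert on the precondition
are assumptions added for illustration; the loop body is the one from the
example above, and the two pragmas are Clang's documented loop hints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper holding only the tail loop from the example.
   Precondition taken from the report: len < 8 on entry. */
static void delta_tail(uint8_t *dest, const uint8_t *src, ptrdiff_t offs, size_t len)
{
    assert(len < 8);
#ifdef __clang__
    /* Clang-specific loop hints; other compilers never see these lines. */
#pragma clang loop vectorize(disable)
#pragma clang loop unroll(disable)
#endif
    while (len--)
    {
        *dest = *src++ + dest[offs];
        dest++;
    }
}
```

This is the noise the report refers to: five extra lines of preprocessor
scaffolding for each tail loop, repeated wherever the pattern occurs.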