[llvm-bugs] [Bug 44547] New: Inefficient codegen for remainder loop when vectorizing by factor of 2 (possibly more)
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Jan 14 09:13:45 PST 2020
https://bugs.llvm.org/show_bug.cgi?id=44547
Bug ID: 44547
Summary: Inefficient codegen for remainder loop when
vectorizing by factor of 2 (possibly more)
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedbugs at nondot.org
Reporter: d.maljutin at yandex.ru
CC: llvm-bugs at lists.llvm.org
See motivating example: https://godbolt.org/z/vSfTT9
void test(const int16_t* __restrict a, const int16_t* __restrict b,
          int16_t* __restrict c, uint32_t n) {
#pragma nounroll
#pragma clang loop vectorize_width(2) interleave_count(1)
for (int32_t i = 0; i < n; i++) {
*c++ = *a++ + *b++;
}
}
One would imagine that the compiler would essentially turn this into
{
if (n & 1) *c++ = *a++ + *b++;
for (int32_t i = 0; i < (n >> 1); i++) {
...
}
}
But it generates something like this instead:
{
for (int32_t i = 0; i < (n >> 1); i++) {
...
}
if (n & 1)
  for (int32_t i = 0; i < phi(1, (n & 1)); i++) *c++ = *a++ + *b++;
}
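For comparison, the two shapes above can be written out by hand in plain C. This is only a sketch: the function names are hypothetical, the "vector" body is spelled as two scalar adds per iteration, and the phi-bounded remainder loop is approximated with an ordinary variable bound. It is not the vectorizer's actual output.

```c
#include <stdint.h>

/* Shape described as generated: main loop first, then a guarded
 * remainder *loop* whose trip count can only ever be 1. */
void add_remainder_loop(const int16_t *restrict a, const int16_t *restrict b,
                        int16_t *restrict c, uint32_t n) {
    uint32_t pairs = n >> 1;            /* iterations of the width-2 body */
    for (uint32_t i = 0; i < pairs; i++) {
        c[0] = (int16_t)(a[0] + b[0]);  /* stands in for one 2-lane vector add */
        c[1] = (int16_t)(a[1] + b[1]);
        a += 2; b += 2; c += 2;
    }
    uint32_t rem = n & 1;               /* 0 or 1 leftover elements */
    if (rem) {
        for (uint32_t i = 0; i < rem; i++)
            *c++ = (int16_t)(*a++ + *b++);
    }
}

/* Shape the report asks for: the single leftover element is handled
 * as straight-line code, so no remainder loop exists at all. */
void add_remainder_straight(const int16_t *restrict a, const int16_t *restrict b,
                            int16_t *restrict c, uint32_t n) {
    if (n & 1)
        *c++ = (int16_t)(*a++ + *b++);
    uint32_t pairs = n >> 1;
    for (uint32_t i = 0; i < pairs; i++) {
        c[0] = (int16_t)(a[0] + b[0]);
        c[1] = (int16_t)(a[1] + b[1]);
        a += 2; b += 2; c += 2;
    }
}
```

Both functions write the same results for any n; they differ only in whether the leftover element goes through loop control.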
The loop vectorizer seems to always generate the remainder "block" as a loop,
even when it has a known constant trip count (in this case it's 1!).
However, since this trip count is hidden behind an "if" (or a switch condition
after some optimizations), this "remainder loop with trip count == 1" is never
actually unrolled (because SCEV fails to compute its trip count). Sometimes
running additional GVN and IPSCCP passes helps, but this is not optimal.
I don't see why the remainder has to be a loop in the first place (when the
trip count is known).
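The same idea may extend to wider factors, where the remainder count is bounded by the vectorization width. As a hand-written sketch only (the function name and the choice of width 4 are assumptions for illustration, not something the vectorizer emits), the leftover 0 to 3 elements can be dispatched through a switch with fallthrough instead of a loop:

```c
#include <stdint.h>

/* Width-4 main loop followed by a loop-free remainder. */
void add_vf4_switch_tail(const int16_t *restrict a, const int16_t *restrict b,
                         int16_t *restrict c, uint32_t n) {
    uint32_t groups = n >> 2;           /* iterations of the width-4 body */
    for (uint32_t i = 0; i < groups; i++) {
        for (int k = 0; k < 4; k++)     /* stands in for one 4-lane vector add */
            c[k] = (int16_t)(a[k] + b[k]);
        a += 4; b += 4; c += 4;
    }
    /* n & 3 is the number of leftover elements (0..3); each case
     * handles one element and falls through to the next. */
    switch (n & 3) {
    case 3: c[2] = (int16_t)(a[2] + b[2]); /* fall through */
    case 2: c[1] = (int16_t)(a[1] + b[1]); /* fall through */
    case 1: c[0] = (int16_t)(a[0] + b[0]); /* fall through */
    default: break;
    }
}
```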