[llvm-bugs] [Bug 45980] New: Case where loop idiom recognition causes 2x slowdown
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon May 18 11:08:45 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=45980
Bug ID: 45980
Summary: Case where loop idiom recognition causes 2x slowdown
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedbugs at nondot.org
Reporter: andrew.b.adams at gmail.com
CC: llvm-bugs at lists.llvm.org
Created attachment 23504
--> https://bugs.llvm.org/attachment.cgi?id=23504&action=edit
repro
At -O3, loop idiom recognition converts loops with a very short but dynamic
trip count (either 1 or 2 in the example below) into memcpy calls, which run
about 2x slower than the original loop.
#include <immintrin.h>  // for __m256

void bad(__m256 *__restrict a, __m256 *__restrict b, bool condition) {
    for (int j = 0; j < 1000; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

void good(__m256 *__restrict a, __m256 *__restrict b, bool condition) {
    for (int j = 0; j < 1000; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            asm volatile(""); // To prevent loop idiom recognition
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}
This is particularly bad with AVX-512 when there's math going on in the outer
loop, because the call to memcpy also requires a vzeroupper, which spills
everything to the stack.
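For anyone who wants to check the behaviour without the attachment, here is a
rough sketch of a comparison harness. It is not the attached repro. Vec8 is a
hypothetical 32-byte stand-in for __m256 so the code builds without AVX flags,
and copy_plain, copy_barrier, outputs_match and time_ms are names made up for
this sketch. Whether the plain variant actually becomes a memcpy call depends
on the compiler version and flags, so no timings are claimed here.

```cpp
#include <chrono>
#include <cstring>

// Hypothetical stand-in for __m256: 32 bytes with 32-byte alignment.
struct alignas(32) Vec8 { float v[8]; };

constexpr int kRows = 1000;  // outer trip count used in the report

// Same shape as bad(): the inner loop is a candidate for idiom recognition.
void copy_plain(Vec8 *__restrict a, const Vec8 *__restrict b, bool condition) {
    for (int j = 0; j < kRows; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

// Same shape as good(): the empty asm statement keeps the loop a loop.
void copy_barrier(Vec8 *__restrict a, const Vec8 *__restrict b, bool condition) {
    for (int j = 0; j < kRows; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            asm volatile("");
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

// Runs both variants on identical input and checks the outputs are identical.
bool outputs_match(bool condition) {
    Vec8 src[kRows * 2] = {};
    Vec8 out_plain[kRows * 2] = {};
    Vec8 out_barrier[kRows * 2] = {};
    for (int k = 0; k < kRows * 2; k++) {
        for (int l = 0; l < 8; l++) {
            src[k].v[l] = static_cast<float>(k * 8 + l);
        }
    }
    copy_plain(out_plain, src, condition);
    copy_barrier(out_barrier, src, condition);
    return std::memcmp(out_plain, out_barrier, sizeof(out_plain)) == 0;
}

// Wall-clock time in milliseconds for `reps` calls of one variant.
double time_ms(void (*copy)(Vec8 *, const Vec8 *, bool), bool condition, int reps) {
    Vec8 src[kRows * 2] = {};
    Vec8 dst[kRows * 2] = {};
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        copy(dst, src, condition);
        // Compiler barrier so the stores into dst are not optimized away.
        asm volatile("" : : "r"(dst) : "memory");
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To use it, call outputs_match(true) and outputs_match(false) from main to
confirm both variants copy the same elements, then compare
time_ms(copy_plain, cond, reps) against time_ms(copy_barrier, cond, reps) with
cond read at run time (for example from argc) so the trip count stays dynamic.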
As an aside, it would be nice if PipelineTuningOptions, which currently allows
turning off vectorization, unrolling, and a variety of other loop
optimizations, also allowed turning off loop idiom recognition. The transform
is not appropriate for every front-end language, and there doesn't seem to be
a clean way to turn it off from the API.