[llvm-bugs] [Bug 49380] New: Add pragma for loop aligning
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Mar 1 07:49:38 PST 2021
https://bugs.llvm.org/show_bug.cgi?id=49380
Bug ID: 49380
Summary: Add pragma for loop aligning
Product: clang
Version: unspecified
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: C
Assignee: unassignedclangbugs at nondot.org
Reporter: ibogosavljevic at gmail.com
CC: blitzrakete at gmail.com, dgregor at apple.com,
erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
richard-llvm at metafoo.co.uk
On modern x86 CPUs, the performance of a loop can vary by up to 50% depending
on whether its instructions are properly aligned. Aligning the loop start on a
32-byte boundary can help get the best performance out of my critical loop, but
there is no guarantee that Clang will align loops this way.
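To make "aligned on a 32-byte boundary" concrete, here is a minimal sketch of the arithmetic involved. The helper names (`is_aligned`, `padding_to_align`, `kLoopAlign`) and the sample addresses are hypothetical and only for illustration; they are not part of Clang or of this request.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.
constexpr std::uint64_t kLoopAlign = 32;  // boundary asked for in this report

// True if addr sits on an align-byte boundary (align must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t align = kLoopAlign) {
    return (addr & (align - 1)) == 0;
}

// Number of padding bytes to insert before addr so that it lands on the
// next align-byte boundary (0 if it is already aligned).
constexpr std::uint64_t padding_to_align(std::uint64_t addr,
                                         std::uint64_t align = kLoopAlign) {
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For example, a loop head at the made-up address 0x401034 would need 12 bytes of padding to start at 0x401040, while one at 0x401020 needs none.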
Workarounds: Putting assembler nops or assembler align directives in front of
the loop doesn't help, because the compiler can emit a header of setup
instructions between that point and the actual loop in the generated assembly.
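The workaround described above might look roughly like the sketch below. It assumes a GCC-compatible compiler (GNU-style inline asm) and an assembler that accepts `.p2align`. The function name `sum_with_align_directive` is made up for illustration.

```cpp
#include <cstddef>

// Sketch of the workaround: an alignment directive placed right before the loop.
int sum_with_align_directive(const int* a, std::size_t len) {
    int sum = 0;
#if defined(__GNUC__)
    // Pads the instruction stream to a 32-byte (2^5) boundary at this point.
    asm volatile(".p2align 5");
#endif
    // The compiler is free to emit setup instructions (a loop header) after
    // the directive, so the first instruction of the loop body itself is not
    // guaranteed to land on the 32-byte boundary.
    for (std::size_t i = 0; i < len; ++i) {
        sum += a[i];
    }
    return sum;
}
```

The function still computes the right result. The point is only that the directive aligns the spot where it appears in the source, not necessarily the branch target of the generated loop.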
This is a well-known issue, e.g.:
https://stackoverflow.com/questions/45298870/why-does-loop-alignment-on-32-byte-make-code-faster
Repro: I can provide a working example where the same code has different speeds
depending on the loop alignment.
Intel's compiler already has a similar pragma:
https://software.intel.com/content/www/us/en/develop/articles/intelr-compiler-170-new-feature-code-alignment-for-loops.html
Example of the proposed pragma:
    for (int i = 0; i < len; i++) {
        int min = a[i];
        int min_index = i;
        #pragma clang loop code_align(32)
        for (int j = i+1; j < len; j++) {
            if (a[j] < min) {
                min = a[j];
                min_index = j;
            }
        }
        std::swap(a[i], a[min_index]);
    }