[llvm-bugs] [Bug 42900] New: loop unrolling fails to properly optimize the remainder
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Aug 6 07:36:52 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=42900
Bug ID: 42900
Summary: loop unrolling fails to properly optimize the remainder
Product: clang
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: LLVM Codegen
Assignee: unassignedclangbugs at nondot.org
Reporter: mschwar42 at gmail.com
CC: llvm-bugs at lists.llvm.org, neeilans at live.com, richard-llvm at metafoo.co.uk
When telling clang to unroll a loop using #pragma unroll, one would expect
the remainder (i.e. the n % unroll_factor leftover iterations) to be handled
outside of the unrolled loop. Instead, clang keeps the trip-count check
(i < n) in each chunk of the unrolled loop.
#include <stddef.h>

int cmp(const char *s1, const char *s2, size_t n){
    unsigned char c1 = '\0';
    unsigned char c2 = '\0';
    #pragma unroll 4
    for(size_t i = 0; i < n; i++){
        c1 = (unsigned char) *s1++;
        c2 = (unsigned char) *s2++;
        if (c1 == '\0' || c1 != c2) return c1 - c2;
    }
    return c1 - c2;
}
clang produces chunks like this, where the trailing cmp/je pair is the
per-element trip-count check:
movzx r11d, byte ptr [rdi + rax]
movzx ecx, byte ptr [rsi + rax]
test r11b, r11b
je .LBB0_10
cmp r11b, cl
jne .LBB0_10
cmp r10, rax
je .LBB0_6
instead of this (gcc output, with no trip-count check inside the chunk):
movzx eax, BYTE PTR [rdi+rcx]
movzx r8d, BYTE PTR [rsi+rcx]
test al, al
je .L32
cmp al, r8b
jne .L32
Comparison between gcc and clang:
https://godbolt.org/z/ZTADbh
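For illustration, the shape of code the report expects can be written by hand. This is only a sketch of the idea, not what either compiler actually emits: the name cmp_unrolled, the CMP_STEP macro and the main_end variable are made up for this example. The trip count is split once up front, so the 4-wide body performs one bounds check per four elements, and the leftover n % 4 elements are handled in a separate loop. The data-dependent early exits (NUL byte or mismatch) still have to stay in every step.

```c
#include <stddef.h>

/* One comparison step: same body as the loop in the report, but without
   any i < n check. Hypothetical helper macro, used only in this sketch. */
#define CMP_STEP()                                      \
    do {                                                \
        c1 = (unsigned char) s1[i];                     \
        c2 = (unsigned char) s2[i];                     \
        if (c1 == '\0' || c1 != c2) return c1 - c2;     \
        i++;                                            \
    } while (0)

int cmp_unrolled(const char *s1, const char *s2, size_t n)
{
    unsigned char c1 = '\0';
    unsigned char c2 = '\0';
    size_t i = 0;
    size_t main_end = n - (n % 4);  /* largest multiple of 4 that is <= n */

    /* Main loop: four steps per iteration, one bounds check per chunk. */
    while (i < main_end) {
        CMP_STEP();
        CMP_STEP();
        CMP_STEP();
        CMP_STEP();
    }

    /* Remainder loop: at most n % 4 steps, checked one element at a time. */
    while (i < n) {
        CMP_STEP();
    }

    return c1 - c2;
}
```

The result should match cmp() from the report for any n, since the steps run in the same order and only the placement of the bounds check changes.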