[llvm-bugs] [Bug 42900] New: loop unrolling fails to properly optimize the remainder

Tue Aug 6 07:36:52 PDT 2019

https://bugs.llvm.org/show_bug.cgi?id=42900

            Bug ID: 42900
           Summary: loop unrolling fails to properly optimize the
                    remainder
           Product: clang
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: mschwar42 at gmail.com
                CC: llvm-bugs at lists.llvm.org, neeilans at live.com,
                    richard-llvm at metafoo.co.uk

When clang is told to unroll a loop with #pragma unroll, one would expect the
remainder (i.e. the N % unroll_factor leftover iterations) to be handled outside
of the unrolled loop. Instead, clang keeps the loop-bound check in each chunk of
the unrolled loop.

#include <stddef.h>  /* size_t */

int cmp(const char *s1, const char *s2, size_t n) {
  unsigned char c1 = '\0';
  unsigned char c2 = '\0';
  #pragma unroll 4
  for (size_t i = 0; i < n; i++) {
    c1 = (unsigned char) *s1++;
    c2 = (unsigned char) *s2++;
    /* stop at the end of s1 or at the first mismatch */
    if (c1 == '\0' || c1 != c2) return c1 - c2;
  }
  return c1 - c2;
}

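With clang, this produces chunks like the following, where the trailing cmp/je
pair is the loop-bound check repeated in every chunk: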
        movzx   r11d, byte ptr [rdi + rax]
        movzx   ecx, byte ptr [rsi + rax]
        test    r11b, r11b
        je      .LBB0_10
        cmp     r11b, cl
        jne     .LBB0_10
        cmp     r10, rax
        je      .LBB0_6

instead of this, as in the gcc output linked below, where the chunk carries no
bound check:

        movzx   eax, BYTE PTR [rdi+rcx]
        movzx   r8d, BYTE PTR [rsi+rcx]
        test    al, al
        je      .L32
        cmp     al, r8b
        jne     .L32

Comparison between gcc and clang:
https://godbolt.org/z/ZTADbh
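
For illustration, the expected transformation can be written out by hand. This
is only a sketch of the shape described above, not what either compiler
literally generates: the name cmp_unrolled and the variable main_end are
hypothetical, and the unroll factor of 4 is taken from the pragma in the
reproducer. The bound is tested once per group of four elements, and the
n % 4 leftover elements are handled in a separate loop afterwards.

```c
#include <stddef.h>

/* Hand-unrolled sketch of cmp(); cmp_unrolled and main_end are
 * illustrative names, not taken from the bug report. */
int cmp_unrolled(const char *s1, const char *s2, size_t n) {
  unsigned char c1 = '\0';
  unsigned char c2 = '\0';
  size_t i = 0;
  /* Largest multiple of 4 that is <= n; avoids overflow of i + 4. */
  size_t main_end = n - (n % 4);

  /* Main loop: four elements per iteration, one bound check per chunk. */
  for (; i < main_end; i += 4) {
    c1 = (unsigned char) s1[i];
    c2 = (unsigned char) s2[i];
    if (c1 == '\0' || c1 != c2) return c1 - c2;
    c1 = (unsigned char) s1[i + 1];
    c2 = (unsigned char) s2[i + 1];
    if (c1 == '\0' || c1 != c2) return c1 - c2;
    c1 = (unsigned char) s1[i + 2];
    c2 = (unsigned char) s2[i + 2];
    if (c1 == '\0' || c1 != c2) return c1 - c2;
    c1 = (unsigned char) s1[i + 3];
    c2 = (unsigned char) s2[i + 3];
    if (c1 == '\0' || c1 != c2) return c1 - c2;
  }

  /* Remainder loop: the n % 4 leftover elements, checked one at a time. */
  for (; i < n; i++) {
    c1 = (unsigned char) s1[i];
    c2 = (unsigned char) s2[i];
    if (c1 == '\0' || c1 != c2) return c1 - c2;
  }
  return c1 - c2;
}
```

The early returns still have to stay inside each chunk, since they depend on
the data; only the i < n test is hoisted out of the unrolled body.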
