[llvm-bugs] [Bug 50171] New: Missed optimization to remove unnecessary branch from loop entry

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Apr 29 09:20:59 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=50171

            Bug ID: 50171
           Summary: Missed optimization to remove unnecessary branch from
                    loop entry
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: scovich at gmail.com
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

The following toy example:

  void Loop(int len) {
      int i = 0;
      const int kUnrollFactor = 8;
      for (int num_calls = 0; i <= len - kUnrollFactor; ) {
          if (num_calls + kUnrollFactor > 100) {
              extern void Foo(); Foo();
              num_calls = 0;
          }

          for (int j = 0; j < kUnrollFactor; j++, i++, num_calls++) {
              extern void Bar(int); Bar(i);
          }
      }
  }
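
To see the call pattern this loop produces, here is a sketch (not from the report) that replaces the extern Foo()/Bar() declarations with hypothetical stubs that record each call into a trace; g_trace and TraceLoop() are illustrative names. With kUnrollFactor = 8, the first Foo() call happens only after 12 chunks (96 calls to Bar()), so the check before the first chunk, where num_calls is 0, can never fire.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the report's extern Foo()/Bar(int).
// Each call is recorded: -1 marks Foo(), any other value is Bar()'s argument.
static std::vector<int> g_trace;
void Foo() { g_trace.push_back(-1); }
void Bar(int i) { g_trace.push_back(i); }

// Same control flow as Loop() in the report.
void Loop(int len) {
  int i = 0;
  const int kUnrollFactor = 8;
  for (int num_calls = 0; i <= len - kUnrollFactor; ) {
    if (num_calls + kUnrollFactor > 100) {  // false on entry: num_calls == 0
      Foo();
      num_calls = 0;
    }
    for (int j = 0; j < kUnrollFactor; j++, i++, num_calls++) {
      Bar(i);
    }
  }
}

// Runs Loop(len) on a fresh trace and returns the recorded calls.
std::vector<int> TraceLoop(int len) {
  g_trace.clear();
  Loop(len);
  return g_trace;
}
```

For example, TraceLoop(200) runs 25 chunks, giving 200 Bar() calls and 2 Foo() calls (one before the 13th chunk and one before the 25th).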

Compiles to the following x86-64 assembly code with clang-9:

  Loop(int):
          ... prologue ...
        mov     r14d, edi
        add     r14d, -8
        js      .LBB0_5
        xor     r15d, r15d         <=== num_calls = 0
        xor     ebx, ebx
        cmp     r15d, 93           <=== num_calls still zero here
        jge     .LBB0_3            <=== branch can NEVER be taken
  .LBB0_4:
          ... unrolled loop body with 8 calls to Bar() ...
        add     r15d, 8
        add     ebx, 8
        cmp     ebp, r14d
        jge     .LBB0_5
        cmp     r15d, 93
        jl      .LBB0_4
  .LBB0_3:
        call    Foo()
        xor     r15d, r15d
        jmp     .LBB0_4
  .LBB0_5:
          ... epilogue ...
        ret
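
The constant 93 in this listing comes straight from the source condition: with kUnrollFactor = 8, num_calls + 8 > 100 holds exactly when num_calls >= 93 (signed overflow is undefined, so the compiler may assume num_calls + 8 does not wrap). A small compile-time check of that reasoning, as a sketch in which kUnrollFactor = 8 is substituted by hand and NeedsFoo/NeedsFooAsEmitted/FormsAgree are hypothetical helper names:

```cpp
#include <cassert>

// The check from the source, with kUnrollFactor = 8 substituted.
constexpr bool NeedsFoo(int num_calls) { return num_calls + 8 > 100; }

// The form clang emitted: cmp r15d, 93 / jge.
constexpr bool NeedsFooAsEmitted(int num_calls) { return num_calls >= 93; }

// True if both forms agree for every num_calls in [0, limit] (C++14 constexpr loop).
constexpr bool FormsAgree(int limit) {
  for (int n = 0; n <= limit; ++n)
    if (NeedsFoo(n) != NeedsFooAsEmitted(n)) return false;
  return true;
}

static_assert(FormsAgree(200), "num_calls + 8 > 100 is num_calls >= 93");
static_assert(!NeedsFoo(0), "the check on loop entry (num_calls == 0) is always false");
```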

Ideally, the compiler should elide the provably redundant cmp+jge pair on loop
entry, keeping only the xor:

        xor     r15d, r15d
        cmp     r15d, 93           <=== redundant, can be removed
        jge     .LBB0_3            <=== redundant, can be removed
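
The shape being asked for can be sketched at source level. The hand-rotated LoopRotated() below (a hypothetical name, not from the report) runs the first chunk without testing num_calls and moves the Foo() check between chunks, which is valid only because num_calls is known to be 0 on entry. Foo()/Bar() are again hypothetical recording stubs so both versions can be compared call for call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recording stubs standing in for the report's extern Foo()/Bar(int).
static std::vector<int> g_trace;
void Foo() { g_trace.push_back(-1); }
void Bar(int i) { g_trace.push_back(i); }

constexpr int kUnrollFactor = 8;

// Original shape: the Foo() check runs before every chunk, including the
// first one, where num_calls is known to be 0.
void Loop(int len) {
  int i = 0;
  for (int num_calls = 0; i <= len - kUnrollFactor; ) {
    if (num_calls + kUnrollFactor > 100) {
      Foo();
      num_calls = 0;
    }
    for (int j = 0; j < kUnrollFactor; j++, i++, num_calls++) Bar(i);
  }
}

// Rotated shape: enter the chunk directly, then test the exit condition
// and the Foo() condition between chunks. The entry check is gone.
void LoopRotated(int len) {
  int i = 0;
  if (i > len - kUnrollFactor) return;  // zero-trip guard
  int num_calls = 0;
  for (;;) {
    for (int j = 0; j < kUnrollFactor; j++, i++, num_calls++) Bar(i);
    if (i > len - kUnrollFactor) return;  // loop exit test
    if (num_calls + kUnrollFactor > 100) {
      Foo();
      num_calls = 0;
    }
  }
}

// Runs fn(len) on a fresh trace and returns the recorded calls.
std::vector<int> Trace(void (*fn)(int), int len) {
  g_trace.clear();
  fn(len);
  return g_trace;
}
```

Trace(Loop, len) and Trace(LoopRotated, len) should match for any len; in the listings this corresponds to branching from the entry directly into the loop body (.LBB0_4) instead of into the check.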


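With clang-12, the result is arguably worse: the entry path now jumps straight
into the check at .LBB0_2, which is shared with every later iteration, and this
extra branching masks the missed opportunity altogether: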

  Loop(int):
          ... prologue ...
        cmp     edi, 8
        jge     .LBB0_1
  .LBB0_5:
          ... epilogue ...
        ret
  .LBB0_1:
          ... loop initialization ...
        xor     r15d, r15d          <=== num_calls = 0
        xor     ebx, ebx
        jmp     .LBB0_2
  .LBB0_4:
          ... unrolled loop body with 8 calls to Bar() ...
        add     r15d, 8
        add     ebx, 8
        cmp     ebp, r14d
        jge     .LBB0_5
  .LBB0_2:
        cmp     r15d, 93            <=== num_calls = 0 the first time
        jl      .LBB0_4             <=== branch always taken the first time
        call    Foo()
        xor     r15d, r15d
        jmp     .LBB0_4
