[LLVMbugs] [Bug 22369] New: LLVM generating inefficient code for simple loops

Wed Jan 28 03:50:05 PST 2015

http://llvm.org/bugs/show_bug.cgi?id=22369

            Bug ID: 22369
           Summary: LLVM generating inefficient code for simple loops
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: djasper at google.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Consider the function:

uint8 *f(uint32 value, uint8 *target) {
  while (value >= 0x80) {
    *target = static_cast<uint8>(value | 0x80);
    value >>= 7;
    ++target;
  }
  *target = static_cast<uint8>(value);
  return target + 1;
}
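
The snippet above uses `uint8` and `uint32` without showing their definitions. The following is a minimal self-contained sketch for reproducing the issue, assuming these are the usual fixed-width unsigned types from `<cstdint>`. The two alias declarations are an assumption and are not part of the report.

```cpp
#include <cstdint>

// Assumed aliases: the report does not show how uint8 and uint32 are defined.
using uint8 = std::uint8_t;
using uint32 = std::uint32_t;

// Same function as in the report: writes `value` in 7-bit groups, setting the
// high bit on every byte except the last, and returns one past the last byte.
uint8 *f(uint32 value, uint8 *target) {
  while (value >= 0x80) {
    *target = static_cast<uint8>(value | 0x80);
    value >>= 7;
    ++target;
  }
  *target = static_cast<uint8>(value);
  return target + 1;
}
```

A 32-bit input needs at most five output bytes, so a five-byte buffer is enough for any call.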

For this function, "clang -O2 -S" generates:

        cmpl    $128, %edi
        jb      .LBB0_1
        .align  16, 0x90
.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        movl    %edi, %eax
        orl     $128, %eax
        movb    %al, (%rsi)
        movl    %edi, %eax
        shrl    $7, %eax
        incq    %rsi
        cmpl    $16383, %edi            # imm = 0x3FFF
        movl    %eax, %edi
        ja      .LBB0_2
        jmp     .LBB0_3
.LBB0_1:
        movl    %edi, %eax
.LBB0_3:                                # %while.end
        movb    %al, (%rsi)
        incq    %rsi
        movq    %rsi, %rax
        retq

This seems quite inefficient: the loop body copies between %edi and %eax three
times per iteration, and several of these moves are unnecessary. GCC instead
translates this to:

        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L3:
        movl    %edi, %eax
        addq    $1, %rsi
        shrl    $7, %edi
        orl     $-128, %eax
        movb    %al, -1(%rsi)
.L4:
        cmpl    $127, %edi
        ja      .L3
        movb    %dil, (%rsi)
        leaq    1(%rsi), %rax
        ret

The LLVM IR that clang generates here is already very similar to the final
output (with instcombine folding the shift into the compare). If I prevent this
folding by adding "if (!Shr->hasOneUse()) return nullptr;" in
InstCombiner::FoldICmpShrCst()
(lib/Transforms/InstCombine/InstCombineCompares.cpp), then LLVM's code gets
somewhat better, but I suspect that LLVM should really be able to lower the IR
better.
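
To illustrate the fold mentioned above, here is a rough C++ sketch of what it amounts to at the source level. It is not the actual IR, and the function names are hypothetical. Once the loop is rotated, the exit test is applied to the shifted value. Folding the shift into the compare turns it into a test on the unshifted value against 0x3FFF, which is the 16383 immediate in the clang output. Both the old and the new value are then live at the same point.

```cpp
#include <cstdint>

// Exit test of the rotated loop as written: shift first, then compare.
inline bool continue_after_shift(std::uint32_t value) {
  return (value >> 7) >= 0x80u;
}

// The same test with the shift folded into the compare:
// (value >> 7) >= 0x80  <=>  value >= 0x4000  <=>  value > 0x3FFF.
inline bool continue_folded(std::uint32_t value) {
  return value > 0x3FFFu;
}

// Hypothetical source-level picture of the rotated loop after the fold.
std::uint8_t *encode_rotated(std::uint32_t value, std::uint8_t *target) {
  if (value >= 0x80u) {
    bool again;
    do {
      *target = static_cast<std::uint8_t>(value | 0x80u);
      ++target;
      again = continue_folded(value);  // compare uses the unshifted value
      value >>= 7;                     // the shifted value is needed as well
    } while (again);
  }
  *target = static_cast<std::uint8_t>(value);
  return target + 1;
}
```

Keeping the compare on the shifted value, as in `continue_after_shift`, means only one copy of `value` is live across the loop back edge, which appears to be why blocking the fold improves the generated code.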
