<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - LLVM generating inefficient code for simple loops"

   href="http://llvm.org/bugs/show_bug.cgi?id=22369">22369</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>LLVM generating inefficient code for simple loops

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Loop Optimizer

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>djasper@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Consider the function:

uint8 *f(uint32 value, uint8 *target) {
  while (value >= 0x80) {
    *target = static_cast&lt;uint8&gt;(value | 0x80);
    value >>= 7;
    ++target;
  }
  *target = static_cast&lt;uint8&gt;(value);
  return target + 1;
}
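
For reference, f writes value as a base-128 varint: seven payload bits per byte, lowest group first, with the high bit (0x80) set on every byte except the last. The sketch below makes f self-contained so it can be compiled and tried. It assumes uint8 and uint32 are 8- and 32-bit unsigned types (the report does not define them) and spells the cast in functional style, which is equivalent here to the static_cast above.

```cpp
// ASSUMPTION: uint8/uint32 stand in for the report's 8- and 32-bit unsigned types.
typedef unsigned char uint8;
typedef unsigned int uint32;
static_assert(sizeof(uint32) == 4, "uint32 is assumed to be 32 bits wide");

// Writes `value` as a base-128 varint: 7 payload bits per byte, lowest group
// first, with the high bit (0x80) set on every byte except the last one.
// Returns a pointer one past the last byte written.
uint8 *f(uint32 value, uint8 *target) {
  while (value >= 0x80) {
    *target = uint8(value | 0x80);  // keeps the low 8 bits, continuation bit set
    value >>= 7;                    // move on to the next 7-bit group
    ++target;
  }
  *target = uint8(value);           // final group, high bit clear
  return target + 1;
}
```

For example, f(300, buf) should store 0xAC 0x02 and return buf + 2, while f(0xFFFFFFFF, buf) uses all five bytes.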

For this function, "clang -O2 -S" generates:

        cmpl    $128, %edi
        jb      .LBB0_1
        .align  16, 0x90
.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        movl    %edi, %eax
        orl     $128, %eax
        movb    %al, (%rsi)
        movl    %edi, %eax
        shrl    $7, %eax
        incq    %rsi
        cmpl    $16383, %edi            # imm = 0x3FFF
        movl    %eax, %edi
        ja      .LBB0_2
        jmp     .LBB0_3
.LBB0_1:
        movl    %edi, %eax
.LBB0_3:                                # %while.end
        movb    %al, (%rsi)
        incq    %rsi
        movq    %rsi, %rax
        retq

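This seems quite inefficient, with several unnecessary moves: each iteration
executes three movl instructions (%edi is copied into %eax twice, and the
shifted value is copied back into %edi). GCC instead translates this to the
following, with a single movl in the loop: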

        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L3:
        movl    %edi, %eax
        addq    $1, %rsi
        shrl    $7, %edi
        orl     $-128, %eax
        movb    %al, -1(%rsi)
.L4:
        cmpl    $127, %edi
        ja      .L3
        movb    %dil, (%rsi)
        leaq    1(%rsi), %rax
        ret

The LLVM IR that clang emits here is already very close to the final output:
instcombine has folded the shift into the compare, so the loop condition tests
the unshifted value against 16383 (0x3FFF), as the cmpl in clang's loop does.
If I prevent this folding by adding "if (!Shr->hasOneUse()) return nullptr;" to
InstCombiner::FoldICmpShrCst()
(lib/Transforms/InstCombine/InstCombineCompares.cpp), LLVM's code gets somewhat
better, but I suspect that LLVM should really be able to lower this IR more
efficiently in the first place.</pre>
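        <pre>The folding described above rests on a simple identity. For an unsigned value, value >> 7 equals floor(value / 128), so (value >> 7) >= 0x80 holds exactly when value >= 0x80 * 128 = 16384, i.e. when value > 16383 (0x3FFF); that is the cmpl $16383 in clang's loop. The catch is that the folded test reads the value from before the shift, so at the compare both the old and the shifted value are live, which seems consistent with the extra movl copies. The C++ sketch below writes both loop shapes at source level. f_source and f_folded are hypothetical names, uint8/uint32 are assumed to be 8- and 32-bit unsigned types, and f_folded only mimics the shape of the optimized loop; it is not clang's actual IR.

```cpp
// ASSUMPTION: uint8/uint32 stand in for the report's 8- and 32-bit unsigned types.
typedef unsigned char uint8;
typedef unsigned int uint32;
static_assert(sizeof(uint32) == 4, "uint32 is assumed to be 32 bits wide");

// Hypothetical name: the loop as written in the report. The exit test reads
// the value that the previous iteration already shifted.
uint8 *f_source(uint32 value, uint8 *target) {
  while (value >= 0x80) {
    *target = uint8(value | 0x80);
    value >>= 7;
    ++target;
  }
  *target = uint8(value);
  return target + 1;
}

// Hypothetical name: the same loop after the shift is folded into the compare,
// rotated like clang's output (guard, then a bottom-tested loop). The exit test
// uses the pre-shift value, so `old` and the shifted `value` are both live there.
uint8 *f_folded(uint32 value, uint8 *target) {
  if (value >= 0x80) {
    uint32 old;
    do {
      old = value;
      *target = uint8(old | 0x80);
      value = old >> 7;
      ++target;
    } while (old > 0x3FFF);  // folded form of (old >> 7) >= 0x80
  }
  *target = uint8(value);
  return target + 1;
}

// The identity behind the fold, checked at compile time at the boundary.
constexpr bool folded_matches(uint32 v) {
  return ((v >> 7) >= 0x80) == (v > 0x3FFF);
}
static_assert(folded_matches(0x3FFF) && folded_matches(0x4000), "fold boundary");
```

Both functions should write identical bytes for every input; they differ only in which value the exit test reads.</pre>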

        </div>

      </p>


    </body>

</html>