<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - LLVM generates terrible x86 code for trivial, fully unrolled loops"
   href="https://llvm.org/bugs/show_bug.cgi?id=28090">28090</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>LLVM generates terrible x86 code for trivial, fully unrolled loops
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>chandlerc@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Consider this code:

----
struct V {
  static constexpr int length = 32;
  unsigned short data[32];
};

int reduce(V &v) {
  int sum = 0;
  for (int i = 0; i < v.length; ++i) {
    sum += static_cast<int>(v.data[i]);
  }
  return sum;
}
----

If the length weren't a constant, LLVM would do a delightful job of vectorizing
the reduction loop. But because the trip count happens to be a constant, we
fully unroll the loop and generate this mess:

----
% ./bin/clang++ -std=c++1z -c -S -o - -O2 -march=haswell x.cpp
        .text
        .file   "x.cpp"
        .globl  _Z6reduceR1V
        .p2align        4, 0x90
        .type   _Z6reduceR1V,@function
_Z6reduceR1V:                           # @_Z6reduceR1V
        .cfi_startproc
# BB#0:                                 # %entry
        movzwl  (%rdi), %eax
        movzwl  2(%rdi), %ecx
        addl    %eax, %ecx
        movzwl  4(%rdi), %eax
        addl    %ecx, %eax
        # ... the same movzwl/addl pair repeats OVER AND OVER AGAIN, with
        # minor variations in registers ...
        movzwl  60(%rdi), %edx
        addl    %ecx, %edx
        movzwl  62(%rdi), %eax
        addl    %edx, %eax
        retq
----

Ow. This hurts code size as well. =/ I figure we need reduction support in the
SLP vectorizer or some such?</pre>
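<pre>For comparison, here is a rough sketch of the two shapes discussed above. It
is an illustration only, not taken from the report: reduce_runtime and
reduce_lanes are hypothetical names, and the choice of 8 lanes is an
assumption (eight 32-bit partial sums, as would fit a 256-bit register). No
codegen for these variants is shown or claimed here; the report only states
that the non-constant trip count case vectorizes well.

```cpp
struct V {
  static constexpr int length = 32;
  unsigned short data[32];
};

// The loop from the report: the trip count is the constant 32.
int reduce(V &v) {
  int sum = 0;
  for (int i = 0; i < v.length; ++i) {
    sum += static_cast<int>(v.data[i]);
  }
  return sum;
}

// Hypothetical variant: the trip count n is only known at run time.
// This is the "length isn't a constant" case the report refers to.
int reduce_runtime(const unsigned short *data, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += static_cast<int>(data[i]);
  }
  return sum;
}

// Hypothetical hand-written stand-in for a vectorized reduction:
// keep several independent partial sums (lanes) and combine them at the end.
int reduce_lanes(const V &v) {
  constexpr int lanes = 8;  // assumption: 8 x 32-bit lanes
  static_assert(V::length % lanes == 0, "length must be a multiple of lanes");
  int partial[lanes] = {};
  for (int i = 0; i < V::length; i += lanes) {
    for (int l = 0; l < lanes; ++l) {
      partial[l] += static_cast<int>(v.data[i + l]);
    }
  }
  int sum = 0;
  for (int l = 0; l < lanes; ++l) {
    sum += partial[l];
  }
  return sum;
}
```

All three functions compute the same sum for the same input; they differ only
in how the additions are grouped, which is what gives a vectorizer room to
work on independent lanes instead of one long dependency chain.</pre>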
        </div>
      </p>
    </body>
</html>