<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Inefficient unrolling in vectorization pass when VF==1"
   href="https://llvm.org/bugs/show_bug.cgi?id=23217">23217</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Inefficient unrolling in vectorization pass when VF==1
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wmi@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=14199" name="attach_14199" title="bad.s">attachment 14199</a> <a href="attachment.cgi?id=14199&action=edit" title="bad.s">[details]</a></span>
bad.s

While analyzing an internal benchmark, we found that the unrolling performed by
the loop vectorization pass when VF==1 is inefficient. The simple test case 1.c
below shows the problem:

1.c:
int a[1000], N;

void foo() {
  long i;
  for (i = 0; i < N; i++) {
    a[i*7] = 3;
  }
}

~/workarea/llvm-r234389/build/bin/clang -O2 -S 1.c

In the loop vectorization pass, VF=1 and UF=2 are computed for the above loop.
Because VF==1, no vectorization is done, but the loop is still unrolled by a
factor of two, and a remainder loop is generated.
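
As a rough source-level picture of that result (a hand-written sketch, not
compiler output; the function name and the parameter n, which stands in for
the global N, are assumptions made for illustration):

```c
/* Sketch only: approximates the loop shape after VF=1, UF=2. */
int a[1000];

void foo_interleaved_by_2(long n) {
  long i = 0;
  /* Main loop: two original iterations per trip. */
  for (; i + 1 < n; i += 2) {
    a[i * 7] = 3;
    a[(i + 1) * 7] = 3;
  }
  /* Remainder loop: handles the leftover iteration when n is odd. */
  for (; i < n; i++) {
    a[i * 7] = 3;
  }
}
```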

In the loop unroll pass, the already-unrolled loop body is unrolled again by a
factor of two, and the remainder loop is unrolled by a factor of four. Two
extra loop prologues and a number of additional checks are generated. See the
attached bad.s.
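
The compounded structure can be pictured at source level roughly as follows.
This is a hand-written sketch assuming each unrolled loop gets a prologue that
peels the leftover trips; the names and the parameter n (standing in for N)
are hypothetical, and the real control flow in bad.s may differ:

```c
/* Sketch only: both loops from the vectorizer unrolled a second time. */
int a[1000];

void foo_after_both_passes(long n) {
  long i = 0;
  long pairs = n > 0 ? n / 2 : 0; /* trips of the by-2 loop */

  /* Prologue 1: peel one pair when the pair count is odd. */
  if (pairs % 2 != 0) {
    a[i * 7] = 3;
    a[(i + 1) * 7] = 3;
    i += 2;
    pairs -= 1;
  }
  /* By-2 body unrolled again by 2: four original iterations per trip. */
  for (; pairs > 0; pairs -= 2, i += 4) {
    a[i * 7] = 3;
    a[(i + 1) * 7] = 3;
    a[(i + 2) * 7] = 3;
    a[(i + 3) * 7] = 3;
  }

  /* Remainder loop, unrolled by 4 with its own prologue (prologue 2). */
  long rest = n > i ? n - i : 0;
  long peel = rest % 4;
  for (; peel > 0; peel--, i++) {
    a[i * 7] = 3;
  }
  for (; i < n; i += 4) {
    a[i * 7] = 3;
    a[(i + 1) * 7] = 3;
    a[(i + 2) * 7] = 3;
    a[(i + 3) * 7] = 3;
  }
}
```

In this sketch the remainder has at most one iteration left, so its by-4 body
never runs, yet the code for it and for its prologue is still emitted.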

If we disable the unrolling in the loop vectorization pass when VF==1, the loop
unroll pass unrolls the above loop by a factor of four all at once and
generates much less extra code, such as prologues and overflow checks. See the
attached good.s.
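
For comparison, a single unroll by four needs only one prologue. Again this is
a hand-written sketch, not the contents of good.s; the function name and the
parameter n (standing in for N) are assumptions:

```c
/* Sketch only: one unroll by 4 with a single peeling prologue. */
int a[1000];

void foo_unrolled_by_4(long n) {
  long i = 0;
  long peel = n > 0 ? n % 4 : 0;

  /* Single prologue: run the n % 4 leftover iterations first. */
  for (; peel > 0; peel--, i++) {
    a[i * 7] = 3;
  }
  /* Unrolled body: four original iterations per trip. */
  for (; i < n; i += 4) {
    a[i * 7] = 3;
    a[(i + 1) * 7] = 3;
    a[(i + 2) * 7] = 3;
    a[(i + 3) * 7] = 3;
  }
}
```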

We experimentally disabled the unrolling in the loop vectorization pass and saw
the internal benchmark improve by 5% on Sandy Bridge and 9% on Westmere.

Google ref b/19469562</pre>
        </div>
      </p>
    </body>
</html>