<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Possible regression of vectorization of Linpack on ARM"
   href="http://llvm.org/bugs/show_bug.cgi?id=15247">15247</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Possible regression of vectorization of Linpack on ARM
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>renato.golin@linaro.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=10004" name="attach_10004" title="result of the runs with clang and GCC">attachment 10004</a></span>
result of the runs with clang and GCC

Running Linpack from (<a href="http://www.netlib.org/benchmark/linpackc.new">http://www.netlib.org/benchmark/linpackc.new</a>) with clang,
with and without vectorization, I get a 40% performance boost on Intel but a 3%
slowdown on ARM.

Looking at the generated code, the intermixing of ARM and NEON instructions
looks particularly bad, so it seems some shuffles are not being lowered
correctly or are being generated in a pattern that the back-end instruction
combiner doesn't recognize.

Nadav said he got a 40% boost on ARM too, so this could be a regression, though
not necessarily in the loop vectorizer, since the BB vectorizer uses the same
cost model and is known to overuse shuffles.
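
For reference, a minimal sketch of the kind of loop the vectorizer should be
picking up here, assuming the hot kernel is the daxpy-style routine in
linpack.c (y += a*x). This reduced version is illustrative only; it is not
copied from the benchmark source, and the name daxpy_sketch is hypothetical.

```c
/* Hypothetical reduced kernel: dy[i] += da * dx[i] over n elements.
 * Unit stride, no loop-carried dependence; the caller is assumed to
 * pass non-overlapping arrays. */
void daxpy_sketch(int n, double da, const double *dx, double *dy)
{
    int i;

    for (i = 0; i != n; ++i)
        dy[i] += da * dx[i];
}
```

Compiling a file like this with -S, with and without -fno-vectorize, should
make it easier to compare the ARM and Intel output for a single loop.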

Attached is the CSV with the results of the runs with clang and GCC. The
compilation command lines are below:

GCC:
$ gcc -O3 linpack.c

Clang Vect:
$ clang -O3 linpack.c

Clang No-vect:
$ clang -O3 -fno-vectorize linpack.c</pre>
        </div>
      </p>
    </body>
</html>