<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Non-vectorised code slower on ARM"

   href="https://llvm.org/bugs/show_bug.cgi?id=26881">26881</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Non-vectorised code slower on ARM

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>3.8

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>Other

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: ARM

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>tulipawn@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>I checked how the code from issue #26837 performs on ARM. Here are the results

on a Cortex-A5:

running 2 tests

without NEON:

test folds1 ... bench:       3,107 ns/iter (+/- 59)

test folds2 ... bench:       2,490 ns/iter (+/- 34)

with NEON:

test folds1 ... bench:       1,293 ns/iter (+/- 25)

test folds2 ... bench:       2,493 ns/iter (+/- 28)

Judging from those x86 results, the two versions should perform the same when

vector instructions are unavailable, yet the first ARM result is inverted: without

NEON, folds1 is about 25% slower than folds2 (3,107 vs 2,490 ns/iter). This probably shouldn't happen.

Flags used:

 -C target-cpu=cortex-a5 -C target-feature=+vfp4,-neon

 -C llvm-args=-force-target-max-vector-interleave=4</pre>

        </div>

      </p>


    </body>

</html>