<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link bz_status_NEW"
   title="NEW - llvm-mca for cortex-a57 gets thrown off by SIMD loads with dependencies (negative latency?)"
   href="https://bugs.llvm.org/show_bug.cgi?id=49499">49499</a>

          </td>

        </tr>


        <tr>

          <th>Summary</th>

          <td>llvm-mca for cortex-a57 gets thrown off by SIMD loads with dependencies (negative latency?)

          </td>

        </tr>


        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>


        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>


        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>


        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>


        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>


        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>


        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>


        <tr>

          <th>Component</th>

          <td>Backend: AArch64

          </td>

        </tr>


        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>


        <tr>

          <th>Reporter</th>

          <td>martin@martin.st

          </td>

        </tr>


        <tr>

          <th>CC</th>

          <td>andrea.dibiagio@gmail.com, arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Given the following instruction series, llvm-mca seems to calculate a sensible

result:


$ cat test.S

    add  v0.16b,  v1.16b, v2.16b

    add  v1.16b,  v3.16b, v0.16b

    add  v2.16b,  v3.16b, v1.16b

$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test.S

Iterations:        100

Instructions:      300

Total Cycles:      903
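
As a rough cross-check of why this total looks sensible, here is a minimal
sketch of an idealized dependency-chain model. It is not how llvm-mca works
internally; it assumes unlimited issue width and a 3-cycle latency for the
vector add (ADD_LATENCY is an assumption, not a value read from the
Cortex-A57 scheduling model), and chain_cycles is a hypothetical helper.

```python
# Idealized dataflow model: only register dependencies limit progress.
# ADD_LATENCY = 3 is an assumed latency for the vector add, not a value
# taken from the LLVM scheduling model.
ADD_LATENCY = 3

# (destination, sources) for the three adds in test.S
BLOCK = [
    ("v0", ("v1", "v2")),
    ("v1", ("v3", "v0")),
    ("v2", ("v3", "v1")),
]


def chain_cycles(block, iterations, latency):
    """Return the cycle at which the last result of the repeated block is ready."""
    ready = {}  # register name -> cycle at which its value becomes available
    finish = 0
    for _ in range(iterations):
        for dest, sources in block:
            # An instruction starts once all of its source registers are ready.
            start = max(ready.get(src, 0) for src in sources)
            ready[dest] = start + latency
            finish = max(finish, ready[dest])
    return finish


print(chain_cycles(BLOCK, 100, ADD_LATENCY))  # prints 900
```

Under these assumptions each iteration extends the v0, v1, v2 chain by 9
cycles, giving 900 cycles for 100 iterations, close to the 903 reported above.
The remaining difference is presumably pipeline overhead that this sketch does
not model.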


However, if the series is preceded by a SIMD load into the registers that are

used, the total cycle count drops even though an instruction was added:


$ cat test2.S

    ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0]

    add  v0.16b,  v1.16b, v2.16b

    add  v1.16b,  v3.16b, v0.16b

    add  v2.16b,  v3.16b, v1.16b

$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test2.S

Iterations:        100

Instructions:      400

Total Cycles:      416


Suddenly the total cycle count has dropped by more than half, as if the load

had negative latency. If the load targets a different set of registers, it

doesn't affect the calculation in the same way:


$ cat test3.S

    ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]

    add  v0.16b,  v1.16b, v2.16b

    add  v1.16b,  v3.16b, v0.16b

    add  v2.16b,  v3.16b, v1.16b

$ llvm-mca --mtriple=aarch64-linux-gnu --mcpu=cortex-a57 test3.S

Iterations:        100

Instructions:      400

Total Cycles:      904


This doesn't seem to happen for the Cortex-A55 target, though.</pre>

        </div>

      </p>



    </body>

</html>