<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - C++ NEON intrinsics code using arrays of NEON variables is compiled to inefficient code"

   href="https://bugs.llvm.org/show_bug.cgi?id=34945">34945</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>C++ NEON intrinsics code using arrays of NEON variables is compiled to inefficient code

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: AArch64

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>jacob.benoit.1@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>echristo@gmail.com, jan.wassenberg@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=19274" name="attach_19274" title="Testcase">attachment 19274</a> <a href="attachment.cgi?id=19274&amp;action=edit" title="Testcase">[details]</a></span>

Testcase

At least with the LLVM 5.0 toolchain in Android NDK r15c (and with every other

recent NDK LLVM I've tried), when compiling for AArch64, C++ NEON intrinsics

code that uses arrays of NEON variables, like

```

#include <arm_neon.h>

int32x4_t foo[4];

// This for loop is unrolled by the compiler.

// Manually unrolling it does not make a difference.

for (int i = 0; i < 4; i++) do_something(foo[i]);

```

is slow; rewriting this code to declare separate variables instead of an array

makes it much faster, e.g.

```

#include <arm_neon.h>

int32x4_t foo0, foo1, foo2, foo3;

// Now we have no choice but to manually unroll this code,

// as we don't have our 4 variables nicely tucked into an array.

do_something(foo0);

do_something(foo1);

do_something(foo2);

do_something(foo3);

```

I learned that trick from Jan Wassenberg (CC'd). It seems very surprising that

this would make any difference at all.

Attaching a self-contained testcase. It's not a minimal testcase, but it makes

it possible to quantify the impact of this bug on concrete production code

(<a href="https://github.com/google/gemmlowp/blob/master/standalone/neon-gemm-kernel-benchmark.cc">https://github.com/google/gemmlowp/blob/master/standalone/neon-gemm-kernel-benchmark.cc</a>),

and it should be trivial to extract a minimal testcase looking like the above

snippets from it, or write one from scratch.
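
For illustration, a from-scratch reduction could look like the sketch below.
It is not the attached testcase and has not been benchmarked. The function
names, the 4x4 accumulation shape, and the scalar stand-ins used when NEON is
unavailable are all assumptions made for this sketch; on AArch64 the real
intrinsics from arm_neon.h are used.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
// Assumption: scalar stand-ins so the sketch also builds on non-ARM hosts.
struct int32x4_t { std::int32_t lane[4]; };
static inline int32x4_t vdupq_n_s32(std::int32_t v) { return {{v, v, v, v}}; }
static inline int32x4_t vld1q_s32(const std::int32_t* p) {
  return {{p[0], p[1], p[2], p[3]}};
}
static inline void vst1q_s32(std::int32_t* p, int32x4_t v) {
  for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
// Returns acc + a * b, lane by lane, with b broadcast to all lanes.
static inline int32x4_t vmlaq_n_s32(int32x4_t acc, int32x4_t a, std::int32_t b) {
  for (int i = 0; i < 4; i++) acc.lane[i] += a.lane[i] * b;
  return acc;
}
#endif

// Variant 1: the four accumulators live in an array of NEON variables.
// out[4 * i + j] = sum over d of lhs[4 * d + j] * rhs[4 * d + i].
void accumulate_with_array(const std::int32_t* lhs, const std::int32_t* rhs,
                           std::size_t depth, std::int32_t* out) {
  int32x4_t acc[4];
  for (int i = 0; i < 4; i++) acc[i] = vdupq_n_s32(0);
  for (std::size_t d = 0; d < depth; d++) {
    const int32x4_t l = vld1q_s32(lhs + 4 * d);
    for (int i = 0; i < 4; i++) {
      acc[i] = vmlaq_n_s32(acc[i], l, rhs[4 * d + i]);
    }
  }
  for (int i = 0; i < 4; i++) vst1q_s32(out + 4 * i, acc[i]);
}

// Variant 2: same computation, with four separately named accumulators
// and the inner loop unrolled by hand.
void accumulate_with_separate_variables(const std::int32_t* lhs,
                                        const std::int32_t* rhs,
                                        std::size_t depth, std::int32_t* out) {
  int32x4_t acc0 = vdupq_n_s32(0);
  int32x4_t acc1 = vdupq_n_s32(0);
  int32x4_t acc2 = vdupq_n_s32(0);
  int32x4_t acc3 = vdupq_n_s32(0);
  for (std::size_t d = 0; d < depth; d++) {
    const int32x4_t l = vld1q_s32(lhs + 4 * d);
    acc0 = vmlaq_n_s32(acc0, l, rhs[4 * d + 0]);
    acc1 = vmlaq_n_s32(acc1, l, rhs[4 * d + 1]);
    acc2 = vmlaq_n_s32(acc2, l, rhs[4 * d + 2]);
    acc3 = vmlaq_n_s32(acc3, l, rhs[4 * d + 3]);
  }
  vst1q_s32(out + 0, acc0);
  vst1q_s32(out + 4, acc1);
  vst1q_s32(out + 8, acc2);
  vst1q_s32(out + 12, acc3);
}
```

Both functions take depth * 4 input values per operand and write 16 results,
so they should produce identical output. Compiling the file with -O3 -S and
comparing the assembly generated for the two functions may show whether the
array variant picks up extra loads and stores; whether this small case
reproduces the slowdown is not confirmed here.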

Example compilation command line:

aarch64-linux-android-clang++ -fPIE -static --std=c++11 -O3 simd-testcase.cc -o /tmp/x

Example outputs:

Pixel2 big cores, ARM Cortex-A73:

```

gemm_kernel_intrinsics_naive_using_arrays_of_neon_variables      14 Gop/s

gemm_kernel_intrinsics_fast_using_separate_neon_variables        21.8 Gop/s

gemm_kernel_inline_asm                                           26.8 Gop/s

```

Pixel2 little cores, ARM Cortex-A53:

```

gemm_kernel_intrinsics_naive_using_arrays_of_neon_variables      5.27 Gop/s

gemm_kernel_intrinsics_fast_using_separate_neon_variables        10.3 Gop/s

gemm_kernel_inline_asm                                           11.6 Gop/s

```</pre>

        </div>

      </p>


    </body>

</html>