[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

Fri Apr 16 02:47:23 PDT 2021

DavidSpickett added a comment.

Both clang and GCC have their issues when it comes to matching the ACLE, so I wouldn't take the header guard as fact. It could be that we never implemented the A32 path for these functions, that the document was in flux when they were added, or that no one ever tried this on A32.

I think you could implement `vadd_p8` on A32 with:

  veor.u8 d0, d0, d1

I think we only show the A64 versions in the documentation. It's possible that what I've got above is a SIMD instruction but not an "Advanced SIMD" instruction, and that somehow doesn't count?

(caveat: I've mostly been making sure the function prototypes match the ACLE, not actually using these to do real work)

If I bodge the header to have vadd_p8 on Arm I get:

  $ cat /tmp/test.c
  #include <arm_neon.h>

  poly8x8_t test_vadd_p8(poly8x8_t a, poly8x8_t b) {
      return vadd_p8 (a, b);
  }
  $ ./bin/clang -target arm-arm-none-eabi -mcpu=cortex-a57 -S -o - /tmp/test.c -O3
  <...>
  test_vadd_p8:
          .fnstart
          vmov    d16, r0, r1
          vmov    d17, r2, r3
          veor    d16, d17, d16
          vmov    r0, r1, d16
          bx      lr

Which seems to confirm that a single `veor` is enough on A32, but I don't know why the intrinsic is put behind the `__aarch64__` guard.
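
For reference, the reason a plain `veor` is sufficient: adding polynomials over GF(2) is carry-less, so each lane of the result is just the XOR of the input lanes. A minimal scalar sketch of that, assuming nothing beyond standard C (the `poly8x8_model_t` type and `vadd_p8_model` function are hypothetical names for illustration, not part of `arm_neon.h`):

```c
#include <stdint.h>

/* Hypothetical scalar model of poly8x8_t: eight 8-bit polynomial lanes.
   This is an illustration only, not the arm_neon.h type. */
typedef struct {
    uint8_t lane[8];
} poly8x8_model_t;

/* Hypothetical model of vadd_p8: polynomial addition over GF(2).
   There are no carries between coefficients, so each lane is a
   bitwise XOR of the corresponding input lanes. */
static poly8x8_model_t vadd_p8_model(poly8x8_model_t a, poly8x8_model_t b) {
    poly8x8_model_t r;
    for (int i = 0; i < 8; ++i) {
        r.lane[i] = (uint8_t)(a.lane[i] ^ b.lane[i]);
    }
    return r;
}
```

In this model, adding a value to itself gives all zeros in every lane, which is what you'd expect from an XOR and matches the `veor` in the generated code.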

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100499