[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

Fri Apr 16 04:05:55 PDT 2021

DavidSpickett added a comment.

Ok, it's behind the aarch64 guard because there's a giant #ifdef around it that I didn't see. If you move it in `arm_neon.td` down to outside that #ifdef you get the definitions. Next problem is that the `poly128_t` type isn't implemented for AArch32. Not sure why but for example in the bf16 tests:

  clang/test/CodeGen/arm-bf16-reinterpret-intrinsics.c:
  // TODO: poly128_t not implemented on aarch32
  // CHCK-LABEL: @test_vreinterpretq_p128_bf16

If you split the def in two, one for the non Q, one for the Q parts that can work and you get the non poly128_t definitions for Arm and then I can compile my test program without modifications to the header.

  --- a/clang/include/clang/Basic/arm_neon.td
  +++ b/clang/include/clang/Basic/arm_neon.td
  @@ -1160,7 +1160,8 @@ def SM4E : SInst<"vsm4e", "...", "QUi">;
   def SM4EKEY : SInst<"vsm4ekey", "...", "QUi">;
   }

  -def VADDP   : WInst<"vadd", "...", "PcPsPlQPcQPsQPlQPk">;
  +// TODO: poly128_t isn't implemented for AArch32
  +def VADDP   : WInst<"vadd", "...", "QPcQPsQPlQPk">;

   ////////////////////////////////////////////////////////////////////////////////
   // Float -> Int conversions with explicit rounding mode
  @@ -1630,6 +1631,9 @@ def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "1QI", "ScSsSiSlSfSdSUcSUsSUiSUlSPcS
   }
   }

  +// Everything that doesn't use poly128_t
  +def VADDP_Q   : WInst<"vadd", "...", "PcPsPl">;
  +
   // ARMv8.2-A FP16 vector intrinsics for A32/A64.
   let ArchGuard = "defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)" in {

However I ran into issues trying to get the relevant bits of aarc64-poly-add.c to compile for Arm and don't have time time to pursue it myself. If you can get those to run (or make an arm specific test file, some others do already) then all you'd need to do is remove the `vaddq_p128`  in `CGBuiltin.cpp` to prevent anyone thinking it is implemented for AArch32. (no one would be able to hit it without a modified header anyway)

My conclusion being that this group of intrinsics should be for A32 and A64 as the ACLE says. However we can't do all of them on A32 without poly128_t.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499