[PATCH] D154508: [TLI][AArch64] Add mappings to vectorized functions from ArmPL

Fri Jul 7 06:57:31 PDT 2023

paulwalker-arm added inline comments.

================
Comment at: llvm/include/llvm/Analysis/VecFuncs.def:907-910
+TLI_DEFINE_VECFUNC("sinpi", "armpl_vsinpiq_f64", FIXED(2), NOMASK)
+TLI_DEFINE_VECFUNC("sinpif", "armpl_vsinpiq_f32", FIXED(4), NOMASK)
+TLI_DEFINE_VECFUNC("sinpi", "armpl_svsinpi_f64_x",  SCALABLE(2), MASKED)
+TLI_DEFINE_VECFUNC("sinpif", "armpl_svsinpi_f32_x", SCALABLE(4), MASKED)
----------------
I guess this is ok but please be aware that `sinpi` is very new and even my reasonably up to date Fedora install doesn't have it.

================
Comment at: llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll:2-3
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S -vector-library=ArmPL -replace-with-veclib < %s | FileCheck %s  --check-prefixes=NEON
+; RUN: opt -S -mattr=+sve -vector-library=ArmPL -replace-with-veclib < %s | FileCheck %s  --check-prefixes=SVE
+
----------------
It doesn't make sense to split the testing like this because the scalable vector tests are essentially bogus  when SVE is not available and the fixed length vector tests should produce the same result when SVE is enabled.

I think there should either be a single RUN line, or two test files that separate the fixed and scalable tests.  Personally I'd opt for the former given there's nothing in the TLI interface that uses target features when making decisions

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/armpl-calls-with-ptr.ll:11-12
+
+declare double @sincos(double, ptr, ptr)
+declare float @sincosf(float, ptr, ptr)
+
----------------
The `sincos` functions do not return a value.  It looks like you're calling them correctly so just the declarations are wrong.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/armpl-calls-with-ptr.ll:16-17
+; CHECK-LABEL: @linear_sincos_f64(
+; NEON-NOT: [[TMP5:%.*]] = call <2 x double> @armpl_vsincosq_f64
+; SVE-NOT: [[TMP5:%.*]] = call <vscale x 2 x double> @armpl_svsincos_f64_x
+; CHECK: ret void
----------------
This is not the signature of ArmPL's `sincos` functions.  They all return `void` and thus these `NOT` lines will not match even when vectorisation is possible.  In general it's best to keep `NOT` lines as minimal as possible so perhaps just `NOT: @armpl_` or even `NOT: vector.body`?

With that said, the `ilogb` and `ldexp` issues I raise below show the dangerous side of including mappings the compiler doesn't yet know how to handle. For this reason I would much rather remove all the unsupported ones and only introduce them once the compiler is ready to use them.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/armpl-calls.ll:1852-1853
+; CHECK-LABEL: @ilogb_f64(
+; NEON: [[TMP5:%.*]] = call <2 x i32> @armpl_vilogbq_f64(<2 x double> [[TMP4:%.*]])
+; SVE: [[TMP5:%.*]] = call <vscale x 2 x i32> @armpl_svilogb_f64_x(<vscale x 2 x double> [[TMP4:%.*]], <vscale x 2 x i1> {{.*}})
+; CHECK: ret void
----------------
This is not the interface ArmPL exposes for the vector variants of `ilogb`.  I think this is likely a bug in ArmPL whereby the function's return type incorrectly mirrors the operand type. It's possible this doesn't matter for SVE but for NEON the code generation for the f64 case will certainly be wrong.  I think it's safest to remove the mappings until we're sure ArmPL is ready.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/armpl-calls.ll:1903-1905
+; NEON: [[TMP5:%.*]] = call <2 x double> @armpl_vldexpq_f64(<2 x double> [[TMP4:%.*]], <2 x i32> [[TMP4:%.*]])
+; SVE: [[TMP5:%.*]] = call <vscale x 2 x double> @armpl_svldexp_f64_x(<vscale x 2 x double> [[TMP4:%.*]], <vscale x 2 x i32> [[TMP4:%.*]], <vscale x 2 x i1> {{.*}})
+; CHECK: ret void
----------------
This is not the interface ArmPL implements for vector `ldexp` functions.  ArmPL requires the integer operand to be a scalar. I think the mappings should be removed until either ArmPL matches the interface currently expected or LoopVectorize gains the smarts to support scalar operands.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154508/new/

https://reviews.llvm.org/D154508