[llvm] [SLPVectorizer] Support SLPVectorizer cases of tan across all backends (PR #95517)

Alexey Bataev via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 27 05:44:11 PDT 2024


alexey-bataev wrote:

> > Need to be sure that the cost of the vector version is high enough for the targets that do not support it; otherwise, they may suffer a perf drop. We also still need to add tests, especially codegen tests (if they don't exist already), to be sure that the targets can lower the vector versions correctly.
> 
> I'm not familiar with a way to check the performance impact across all backends. Could you give some guidance on how I could figure that out to answer this question?
> 
> I'm not familiar enough with most of these backends to know which ones need tests. Trig functions only seem to be tested on `RISCV`, `AArch64`, and `x86`. I'll do a deep dive below on the state of things, and then maybe you can tell me which backends you would like to see tests for.
> 
> As for which backends support vectorization, I'm going to assume we can limit the set to what exists in `llvm/test/Transforms/SLPVectorizer`. That would be `AArch64/`, `AMDGPU/`, `ARM/`, `NVPTX/`, `PowerPC/`, `RISCV/`, `SystemZ/`, `VE/`, `WebAssembly/`, `X86/`, and `XCore/`.
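>
> To make that concrete, here is a rough sketch of the kind of per-target SLP test I could add (function name and CHECK lines are illustrative; whether the calls actually get vectorized on a given target is exactly the open question):
>
> ```llvm
> ; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
>
> ; Four adjacent scalar llvm.tan calls that SLP could turn into a single
> ; call to @llvm.tan.v4f32 if the target's cost model allows it.
> define void @tan_4xf32(ptr %a) {
> ; CHECK-LABEL: @tan_4xf32(
> ; CHECK:         call fast <4 x float> @llvm.tan.v4f32(
>   %p1 = getelementptr inbounds float, ptr %a, i64 1
>   %p2 = getelementptr inbounds float, ptr %a, i64 2
>   %p3 = getelementptr inbounds float, ptr %a, i64 3
>   %l0 = load float, ptr %a, align 4
>   %l1 = load float, ptr %p1, align 4
>   %l2 = load float, ptr %p2, align 4
>   %l3 = load float, ptr %p3, align 4
>   %t0 = call fast float @llvm.tan.f32(float %l0)
>   %t1 = call fast float @llvm.tan.f32(float %l1)
>   %t2 = call fast float @llvm.tan.f32(float %l2)
>   %t3 = call fast float @llvm.tan.f32(float %l3)
>   store float %t0, ptr %a, align 4
>   store float %t1, ptr %p1, align 4
>   store float %t2, ptr %p2, align 4
>   store float %t3, ptr %p3, align 4
>   ret void
> }
>
> declare float @llvm.tan.f32(float)
> ```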
> 
> I think a partial list can be figured out from here:
> 
> https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/TargetLibraryInfo.h#L124-L134
> 
> * Accelerate framework (Apple?, x86?, aarch64?)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L51
> * DARWIN_LIBSYSTEM (macOS? x86, aarch64)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L97
> * Libm (x86/x86_64)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L154-L155
> * IBM MASSV (PowerPC)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L264
> * SVML (x86/x86_64)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L328-L330
> * SLEEF
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L689
> * SLEEF Scalable
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L837
> * All SLEEF variants are AArch64-only:
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/lib/Analysis/TargetLibraryInfo.cpp#L1295-L1303
> * ARMPL
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L1095
> * AMD Libm (x86/x86_64)
>   https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L1279-L1281
> 
> So that leads me to believe there is vectorization support on `x86`, `x86_64`, `arm`, `aarch64`, and `PowerPC` (a quick way to sanity-check one of those mappings is sketched below).
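>
> For example, assuming the SVML entries linked above really do cover `tan` (that's my reading of VecFuncs.def, so treat the `__svml_tan2` name as an assumption), something like the following should show whether the mapping is picked up; if it is not, the calls should simply stay scalar:
>
> ```llvm
> ; RUN: opt -passes=slp-vectorizer -vector-library=SVML -S \
> ; RUN:   -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
>
> ; Two adjacent scalar tan libcalls that could be replaced by a single
> ; call to the 2 x double SVML variant.
> define void @tan_2xf64(ptr %a) {
> ; CHECK-LABEL: @tan_2xf64(
> ; CHECK:         call fast <2 x double> @__svml_tan2(
>   %p1 = getelementptr inbounds double, ptr %a, i64 1
>   %l0 = load double, ptr %a, align 8
>   %l1 = load double, ptr %p1, align 8
>   %t0 = call fast double @tan(double %l0)
>   %t1 = call fast double @tan(double %l1)
>   store double %t0, ptr %a, align 8
>   store double %t1, ptr %p1, align 8
>   ret void
> }
>
> declare double @tan(double)
> ```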
> 
> There might also be some RISCV support, based on what I found for `sinf`:
> 
> https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll#L356
> 
> So if we subtract what we know supports vectorization from the full list of test directories, we are left with `AMDGPU`, `NVPTX`, `SystemZ`, `VE`, `WebAssembly`, and `XCore`.
> 
> * PTX has an `f16x2` form of tan:
>   https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-tan
> * AMDGPU: I can't find any explicit support even for scalar tan; however, HIP/ROCm supports tan operations on the device, so maybe I'm not looking in the right places:
>   https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html
>   https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___math_float.html#ga0a27f2dd7ba6f1aa7c088f6e66b5e6b3
> * SystemZ does not have vector support for any trig operations,
>   which causes this bug in vectorized cosine: [LLVM floating-point math intrinsics fail on s390x-unknown-linux-gnu rust-lang/packed_simd#14](https://github.com/rust-lang/packed_simd/issues/14)
>   It does have vectorization support for other ISD operations on a per-subtarget basis:
>   https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp#L559
> * XCore: I can't find an ISA reference, but there does appear to be scalar tan support: https://github.com/xmos/lib_xcore_math/blob/ca3161ff6f65f240bb3d022673e4b04b82ec63b7/doc/programming_guide/src/reference/scalar/csv/scalar_fixed_point_ops.csv#L5
> * WebAssembly looks to handle this by scalarizing (see the codegen sketch after this list):
>   https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/test/CodeGen/WebAssembly/simd-unsupported.ll#L380-L386
> * VE: I can't find a published document on the vector engine ISA, but the backend only handles the scalar cases for sin/cos:
>   https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/lib/Target/VE/VEISelLowering.cpp#L247-L255
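>
> To go with the per-backend notes above, the codegen test I have in mind per target is roughly the following (WebAssembly is just one example; the CHECK lines are what I'd expect if the vector call gets scalarized into libcalls rather than failing to select):
>
> ```llvm
> ; RUN: llc -mtriple=wasm32-unknown-unknown -mattr=+simd128 < %s | FileCheck %s
>
> ; There is no SIMD tan instruction, so the vector intrinsic should be
> ; unrolled into scalar tanf libcalls.
> ; CHECK-LABEL: tan_v4f32:
> ; CHECK: tanf
> define <4 x float> @tan_v4f32(<4 x float> %x) {
>   %r = call <4 x float> @llvm.tan.v4f32(<4 x float> %x)
>   ret <4 x float> %r
> }
>
> declare <4 x float> @llvm.tan.v4f32(<4 x float>)
> ```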
> 
> So one thing of note I discovered is that sin/cos are SLP-vectorized despite the fact that some of these backends do not support it. That makes me wonder whether sin/cos vectorization should be removed, or whether it is OK for tan to be vectorized even when it lacks support across all backends.
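>
> For the cost question, one thing I could do per target is run the cost-model printer; something like the sketch below (triple and expected numbers illustrative) would show whether targets without native support already report a high enough cost for the vector form to keep SLP from picking it:
>
> ```llvm
> ; RUN: opt -passes="print<cost-model>" -disable-output 2>&1 \
> ; RUN:   -mtriple=s390x-unknown-linux-gnu < %s | FileCheck %s
>
> ; The cost reported for the vector call is what the SLP profitability
> ; decision ultimately hinges on.
> ; CHECK: Cost Model: {{.*}} call <4 x float> @llvm.tan.v4f32
> define <4 x float> @cost_tan_v4f32(<4 x float> %x) {
>   %r = call <4 x float> @llvm.tan.v4f32(<4 x float> %x)
>   ret <4 x float> %r
> }
>
> declare <4 x float> @llvm.tan.v4f32(<4 x float>)
> ```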

I had a patch for this issue some time ago (https://reviews.llvm.org/D154738, see @RKSimon's response); we still need support for this, and such nodes should not be vectorized. OK, let's keep it as is for now.

https://github.com/llvm/llvm-project/pull/95517

