[llvm] [SLPVectorizer] Support SLPVectorizer cases of tan across all backends (PR #95517)

Wed Jun 26 14:29:55 PDT 2024

farzonl wrote:

> Need to be sure that the cost of the vector version is high enough for the targets that do not support it. Otherwise, they may suffer from the perf drop. And still need to add the test, especially codegen (if still not) to be sure that the targets can lower the vector versions correctly.

I'm not familiar with a way to check for perf impact across all backends. Could you give some guidance on how I could figure that out to answer this question?

I'm not familiar enough with most of these backends to know which ones need tests. Trig functions seem to only be tested on RISCV, AArch64, and x86. I'll do a deep dive below on the state of things, and then maybe you can tell me which backends you would like to see tests for.

As for which backends support vectorization, I'm going to assume we can limit the backends to those that have a directory in `llvm/test/Transforms/SLPVectorizer`.
That would be:
`AArch64/  AMDGPU/  ARM/  NVPTX/  PowerPC/  RISCV/  SystemZ/  VE/  WebAssembly/  X86/  XCore/`

I think a partial list of the backends with vector math library support can be figured out from here:
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/TargetLibraryInfo.h#L124-L134

- Accelerate framework (Apple?, x86?, aarch64?)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L51
- DARWIN_LIBSYSTEM (macOS? x86, aarch64)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L97
- Libm (x86\x86_64)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L154-L155
- IBM MASSV (PowerPC)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L264
- SVML (x86\X86_64)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L328-L330
- SLEEF
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L689
- SLEEF Scalable
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L837
- All SLEEF is aarch64 only:
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/lib/Analysis/TargetLibraryInfo.cpp#L1295-L1303
- ARMPL 
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L1095
- AMD Libm (x86\x86_64)
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/include/llvm/Analysis/VecFuncs.def#L1279-L1281

So that leads me to believe there is vectorization support on `x86`, `x86_64`, `arm`, `aarch64`, and `PowerPC`.

There also might be some RISCV support, based on what I found for `sinf`:
https://github.com/llvm/llvm-project/blob/62d5393c6fed4029996e14ba4ee30eceb143a017/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll#L356

So if we subtract what we know supports vectorization from the full list of tests, we are left with `AMDGPU`, `NVPTX`, `SystemZ`, `VE`, `WebAssembly`, and `XCore`.
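
The subtraction above can be written out as a quick set difference. This is only a sketch of my reasoning, not something taken from the LLVM sources: folding `x86` and `x86_64` into the single `X86` test directory and counting `RISCV` as supported (based on the `sinf` test) are my assumptions, and the variable names are made up for illustration.

```python
# Directory names under llvm/test/Transforms/SLPVectorizer, as listed above.
slp_test_dirs = {
    "AArch64", "AMDGPU", "ARM", "NVPTX", "PowerPC", "RISCV",
    "SystemZ", "VE", "WebAssembly", "X86", "XCore",
}

# Backends that appear to have vector math support, per the VecFuncs.def
# survey plus the RISCV sinf test. Assumption: x86 and x86_64 both map to
# the "X86" test directory.
known_vector_support = {"X86", "ARM", "AArch64", "PowerPC", "RISCV"}

# Whatever is left over still needs to be investigated individually.
needs_investigation = sorted(slp_test_dirs - known_vector_support)
print(needs_investigation)
```

The remaining entries are the backends looked at one by one below.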

- PTX has an `f16x2` type:
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-tan

- AMDGPU: I can't find any explicit support, even for scalar tan. However, HIP\ROCm supports on-device tan operations, so maybe I'm not looking in the right places:
https://rocm.docs.amd.com/projects/HIP/en/latest/reference/kernel_language.html
https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___math_float.html#ga0a27f2dd7ba6f1aa7c088f6e66b5e6b3

- SystemZ does not have vectorization for any trig operations, which causes this bug in vectorized cosine: https://github.com/rust-lang/packed_simd/issues/14
It does have vectorization support for other ISD operations on a per-subtarget basis:
https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp#L559

- XCore: I can't find an ISA reference, but there does appear to be scalar tan support: https://github.com/xmos/lib_xcore_math/blob/ca3161ff6f65f240bb3d022673e4b04b82ec63b7/doc/programming_guide/src/reference/scalar/csv/scalar_fixed_point_ops.csv#L5

- WebAssembly appears to handle this by scalarizing:
https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/test/CodeGen/WebAssembly/simd-unsupported.ll#L380-L386

- VE: I can't find a published document on the Vector Engine ISA. The lowering code here only handles the scalar cases for sin\cos:
https://github.com/llvm/llvm-project/blob/a54704de0d019760c80517b97bd1df636076a059/llvm/lib/Target/VE/VEISelLowering.cpp#L247-L255

So one thing of note I discovered is that sin\cos are SLP-vectorized despite the fact that some of these backends do not support them. That makes me wonder whether sin\cos vectorization should be removed, or whether it is OK for tan to be vectorized even though it lacks support across all backends.

https://github.com/llvm/llvm-project/pull/95517