[PATCH] D155987: AMDGPU: Move placement of RemoveIncompatibleFunctions

Fri Aug 18 02:30:38 PDT 2023

jchlanda added a comment.

@arsenm would you be so kind as to explain the reasoning behind reshuffling the order of the passes?

In oneAPI, math functions are handled through libclc <https://github.com/intel/llvm/tree/sycl/libclc/amdgcn-amdhsa/libspirv>, which implements the SPIR-V interface.

Take, as an example, a call to `tanh` on a `float4`: oneAPI generates a call to `__spirv_ocl_tanh(float vector[4])`, which under the hood, for the AMD backend, is implemented as a sequence of scalar calls to `__ocml_tanh_f32` (provided by AMD's device lib implementation <https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/doc/OCML.md>).
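To make the shape of that wrapper concrete, here is a rough host-side sketch in plain C++. It is not the libclc source: `ocml_tanh_f32_standin`, `spirv_ocl_tanh_v4` and `check_lanes` are hypothetical names, and `std::tanh` merely stands in for the device library's `__ocml_tanh_f32` so that the example runs off-device.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in for the scalar device-lib function (assumption: std::tanh is
// used only so this sketch can run on the host).
inline float ocml_tanh_f32_standin(float x) { return std::tanh(x); }

using float4 = std::array<float, 4>;

// Hypothetical model of the float4 wrapper: one scalar call per lane.
inline float4 spirv_ocl_tanh_v4(const float4 &v) {
  float4 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = ocml_tanh_f32_standin(v[i]);
  return out;
}

// Usage check: every lane matches the scalar call on the same input.
inline void check_lanes() {
  const float4 in{-1.0f, 0.0f, 0.5f, 2.0f};
  const float4 out = spirv_ocl_tanh_v4(in);
  for (std::size_t i = 0; i < in.size(); ++i)
    assert(out[i] == ocml_tanh_f32_standin(in[i]));
}
```

Once the wrapper is inlined into its caller, only the four scalar calls remain, which is why the vector function itself never needed to survive to code generation.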

Because `AMDGPURemoveIncompatibleFunctions` now runs early, and because `__spirv_ocl_tanh(float vector[4])` requires (among others) target support for `FeatureDot3Inst` (which gfx1030, the GPU in question, does not provide), the function is earmarked for deletion and we end up with modules containing calls to deleted functions:

  %call3.i.i.i = tail call fastcc noundef <4 x float> null(<4 x float> noundef %agg.tmp.sroa.0.0.copyload.i)

All the libclc functions are marked always-inline, so this used to work well: the always-inliner pass <https://github.com/intel/llvm/blob/sycl/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L996> would replace the problematic vector function calls with exploded scalar calls to `__ocml_...` before the incompatible-functions pass ever saw them.

Would it be possible to have the incompatible functions pass run after the inliners?
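The ordering problem can be reproduced with a small toy model. Everything below is hypothetical and deliberately simplified: `Func`, `Module`, `removeIncompatible`, `runAlwaysInliner` and `makeExample` are not LLVM types or passes, a call is just a callee name, and an empty name models the `null` callee from the IR above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these types are LLVM's.
struct Func {
  std::vector<std::string> calls;  // callee names; "" models a null callee
  std::set<std::string> needs;     // target features the function requires
  bool alwaysInline = false;
};
using Module = std::map<std::string, Func>;

// Models the removal step: drop functions that need an unsupported feature
// and null out every call that still points at them.
inline void removeIncompatible(Module &m, const std::set<std::string> &target) {
  std::set<std::string> dead;
  for (const auto &[name, f] : m)
    for (const auto &feat : f.needs)
      if (!target.count(feat)) dead.insert(name);
  for (const auto &name : dead) m.erase(name);
  for (auto &[name, f] : m)
    for (auto &callee : f.calls)
      if (dead.count(callee)) callee.clear();
}

// Models always-inlining: a call to an always-inline function is replaced
// by that function's own calls (one level is enough for this example).
inline void runAlwaysInliner(Module &m) {
  for (auto &[name, f] : m) {
    std::vector<std::string> expanded;
    for (const auto &callee : f.calls) {
      auto it = m.find(callee);
      if (it != m.end() && it->second.alwaysInline)
        expanded.insert(expanded.end(), it->second.calls.begin(),
                        it->second.calls.end());
      else
        expanded.push_back(callee);
    }
    f.calls = expanded;
  }
}

// kernel -> vector wrapper (always-inline, needs the feature) -> 4 scalar calls.
inline Module makeExample() {
  Module m;
  m["kernel"].calls.push_back("__spirv_ocl_tanh_v4");
  Func &w = m["__spirv_ocl_tanh_v4"];
  w.calls = std::vector<std::string>(4, "__ocml_tanh_f32");
  w.needs.insert("FeatureDot3Inst");
  w.alwaysInline = true;
  m["__ocml_tanh_f32"];  // scalar function with no feature requirements
  return m;
}
```

With an empty feature set for the target, running `removeIncompatible` first leaves `kernel` with a single null callee, while running `runAlwaysInliner` first leaves it with four calls to `__ocml_tanh_f32` and lets the wrapper be removed harmlessly afterwards.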

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155987/new/

https://reviews.llvm.org/D155987