[PATCH] D155987: AMDGPU: Move placement of RemoveIncompatibleFunctions

Fri Aug 18 07:02:31 PDT 2023

arsenm added a comment.

In D155987#4598269 <https://reviews.llvm.org/D155987#4598269>, @jchlanda wrote:

> @arsenm would you be so kind and explain the reasoning behind reshuffling the order of the passes.

If the function is unsupportable and going to be deleted, there's no point in running any of the other passes on the function. I was trying to keep all the module passes together

> In oneAPI math functions are handled through libclc <https://github.com/intel/llvm/tree/sycl/libclc/amdgcn-amdhsa/libspirv> which implements spir-v interface.
>
> If, as an example, we focus on a call to `tanh` taking `float4`, oneAPI will generate a call to `__spirv_ocl_tanh(float vector[4])`, which under the hood for AMD backend is implemented as a sequence of scalar calls to `__ocml_tanh_f32` (provided by AMD's device lib implementation <https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/doc/OCML.md>).
>
> Because `AMDGPURemoveIncompatibleFunctions` is now run early on, and as `__spirv_ocl_tanh(float vector[4])` requires (among others) target support of `FeatureDot3Inst` (which is not provided on the GPU in question gfx1030), it is being earmarked for deletion and we end up with modules containing calls to deleted funcs:

But __ocml_tanh_f32 doesn't use a dot intrinsic? Not sure how or where you would be seeing that. ocml is currently free of subtarget feature dependence

>   %call3.i.i.i = tail call fastcc noundef <4 x float> null(<4 x float> noundef %agg.tmp.sroa.0.0.copyload.i)
>
> As all the libclc functions are marked with always inline, it used to work well, as we would always replace the problematic vector function calls with exploded scalar versions of `__ocml_...`, this was handled for us by always inliner pass <https://github.com/intel/llvm/blob/sycl/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L996>.

libclc should stop doing using always_inline (and we should stop running the always inliner in the backend (I'm moving towards deleting it in D152414 <https://reviews.llvm.org/D152414>). Function calls should work well now and we should act like a normal target, we don't need these old hacks to support old gaps in codegen support.

> Would it be possible to have the incompatible functions pass run after the inliners?

The inliner should not be able to fix your code. If this is deleting the function, then it shouldn't have been an inlining candidate in the first place. Something is failing to consider the incompatible feature.

This pass is truly a disgusting and unmaintainable hack I want to eventually remove. The way every other target expects this to be handled is to not allow unsupportable code to taint the module in the first place. This situation would never work in a world where machine linking happens, and you're relying on a compiler implementation detail to produce a working program. A better implementation would be to conditionally load specialized versions of functions when the target is known, not pretend we have one generic implementation that the compiler fixes up. I've been working towards this recently, and have purged all of the subtarget dependence from ocml so I'm not sure why you're running into this. There are still many problematic subtarget dependencies in ockl functions though

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155987/new/

https://reviews.llvm.org/D155987