[llvm] AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (PR #74197)

Tue Dec 12 20:37:01 PST 2023

arsenm wrote:

> I have checked our current llvm.sqrt.f32 lowering. To get the same result as native_sqrt would give either a call needs to have afn attribute, or fpmath metadata has to be attached to the call requesting 2ulp or lower accuracy.

It also depends on the denormal mode. All of the combinations of llvm.sqrt.f32 lowering cases are already handled.
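
For reference, a minimal sketch of the call forms being discussed. The function names are made up for illustration and this is not copied from the existing tests; it only shows where the afn flag, the !fpmath metadata, and the f32 denormal mode attribute sit in the IR, since those are the inputs the lowering depends on.

```llvm
declare float @llvm.sqrt.f32(float)

; afn on the call permits an approximate sqrt.
define float @sqrt_afn(float %x) {
  %r = call afn float @llvm.sqrt.f32(float %x)
  ret float %r
}

; !fpmath relaxes the required accuracy, here to 2 ULP.
define float @sqrt_fpmath(float %x) {
  %r = call float @llvm.sqrt.f32(float %x), !fpmath !0
  ret float %r
}

; Same call, but in a function whose f32 denormal mode is preserve-sign
; (denormals flushed), which can select a different lowering.
define float @sqrt_fpmath_daz(float %x) #0 {
  %r = call float @llvm.sqrt.f32(float %x), !fpmath !0
  ret float %r
}

attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" }

!0 = !{float 2.0}
```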

> 
> On the other hand current folding is done if either 'fast' flag is set on the call or "unsafe-fp-math" attribute is set on a caller function. 

The current folding is useless. All it does is replace sqrt with native_sqrt calls, and only if you use the off-by-default amdgpu-use-native flag. It's basically dead code; nothing uses the flag.

> So the question is: will conditions from the first list be satisfied if any one of the conditions from the second list is met?
>  I.e. does it have a potential for regression?

No. AMDGPULibCalls was never as refined as it should be. unsafe-fp-math is deprecated-ish and doesn't matter, it's functionally an alias for an uncertain set of the other attributes and 

> 
> For instance I do not see checks for 'call fast float @llvm.sqrt.f32' in the fsqrt.f32.ll.

fast is just a union of all the flags, so it's excess. There's no real point in testing the effect of excess flags; it would just be a much less refined variant of the tests that are already there.
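
As a rough illustration (hypothetical function names, not taken from fsqrt.f32.ll), the two calls below carry the same set of fast-math flags, because fast is shorthand for specifying all of them at once. A test on the first would exercise afn plus six flags that, per the point above, add nothing beyond what the afn test already checks.

```llvm
declare float @llvm.sqrt.f32(float)

; "fast" spelled as the shorthand.
define float @sqrt_fast(float %x) {
  %r = call fast float @llvm.sqrt.f32(float %x)
  ret float %r
}

; The same flag set written out individually.
define float @sqrt_all_flags(float %x) {
  %r = call reassoc nnan ninf nsz arcp contract afn float @llvm.sqrt.f32(float %x)
  ret float %r
}
```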

https://github.com/llvm/llvm-project/pull/74197