[llvm] AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (PR #74197)

Tue Dec 12 14:17:01 PST 2023

rampitec wrote:

I have checked our current llvm.sqrt.f32 lowering. To get the same result as native_sqrt would give, either the call needs to have the afn flag, or fpmath metadata requesting 2 ulp or lower accuracy has to be attached to the call.

On the other hand, the current folding is done if either the 'fast' flag is set on the call or the "unsafe-fp-math" attribute is set on the caller function. So the question is: will a condition from the first list be satisfied whenever any one of the conditions from the second list is met? I.e., does this have a potential for regression?
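
To make the question concrete, here is a minimal IR sketch of the four call forms involved. The function names are illustrative only and are not taken from any existing test, and the 2.0 in the fpmath node is just one value that would satisfy the "2 ulp" condition mentioned above.

```llvm
declare float @llvm.sqrt.f32(float)

; First list (lowering): afn flag on the call.
define float @sqrt_afn(float %x) {
  %r = call afn float @llvm.sqrt.f32(float %x)
  ret float %r
}

; First list (lowering): !fpmath metadata on the call.
define float @sqrt_fpmath(float %x) {
  %r = call float @llvm.sqrt.f32(float %x), !fpmath !0
  ret float %r
}

; Second list (folding): 'fast' flag on the call.
; Per the LangRef, 'fast' is shorthand for all fast-math flags, afn included.
define float @sqrt_fast(float %x) {
  %r = call fast float @llvm.sqrt.f32(float %x)
  ret float %r
}

; Second list (folding): "unsafe-fp-math" on the caller, no flags on the call.
define float @sqrt_unsafe_caller(float %x) #0 {
  %r = call float @llvm.sqrt.f32(float %x)
  ret float %r
}

attributes #0 = { "unsafe-fp-math"="true" }

!0 = !{float 2.0}
```

The last function is the case I am least sure about: the call itself carries neither afn nor fpmath, so whether it gets the native_sqrt-equivalent lowering depends on whether the backend also consults the function attribute.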

For instance, I do not see checks for 'call fast float @llvm.sqrt.f32' in fsqrt.f32.ll.

https://github.com/llvm/llvm-project/pull/74197