[llvm] [AMDGPU] Use native sqrt when flushing denorm is allowed (PR #114173)

Wed Oct 30 07:02:36 PDT 2024

ruiling wrote:

> Pretty sure this is wrong and won't pass OpenCL conformance with FTZ on. The sqrt instruction is still 1ulp, not 0.5 ulp. We should already account for denormal mode and !fpmath metadata in AMDGPUCodeGenPrepare

Thanks for the feedback, @arsenm. I had not taken a close look at the real issue before. I have now looked at the code in AMDGPUCodeGenPrepare, and it turns out we did not set the fast-math flag properly for this case. But it sounds surprising that OpenCL conformance needs a correctly rounded sqrt. I think it requires 3 ulp for sqrt?

https://github.com/llvm/llvm-project/pull/114173