[llvm] AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (PR #74197)

Stanislav Mekhanoshin via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 12 14:55:14 PST 2023


rampitec wrote:

With `fast` on the call site it gives the expected results. It would be nice to add this test to fsqrt.f32.ll:
```
define float @v_sqrt_f32_fast(float %x) {
  %result = call fast float @llvm.sqrt.f32(float %x)
  ret float %result
}

declare float @llvm.sqrt.f32(float)
```
A test with the unsafe-fp-math attribute is already present there (v_sqrt_f32__unsafe_attr), but it needs a fix. This version gives the expected results, though:
```
define float @v_sqrt_f32__unsafe_attr(float %x) #4 {
  %result = call float @llvm.sqrt.f32(float %x)
  ret float %result
}

declare float @llvm.sqrt.f32(float)

attributes #4 = { "unsafe-fp-math"="true" }
```
The difference is that in fsqrt.f32.ll the call also carries the nsz fast-math flag:
```
define float @v_sqrt_f32__unsafe_attr(float %x) #4 {
; GCN-LABEL: v_sqrt_f32__unsafe_attr:
; GCN:       ; %bb.0:
; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT:    v_sqrt_f32_e32 v0, v0
; GCN-NEXT:    s_setpc_b64 s[30:31]
  %result = call nsz float @llvm.sqrt.f32(float %x)
  ret float %result
}
```

So I suggest adding the first test (with `fast`) and removing `nsz` from the call in the second. The rest seems OK.
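
For reference, a rough sketch of how the two tests might look together after these changes. The CHECK lines are deliberately left out, on the assumption that they would be regenerated (for example with update_llc_test_checks.py) rather than written by hand, and the attribute group number #4 is simply carried over from the snippet above:

```llvm
; Suggested new test: fast-math flags set directly on the call site.
define float @v_sqrt_f32_fast(float %x) {
  %result = call fast float @llvm.sqrt.f32(float %x)
  ret float %result
}

; Existing test with nsz dropped from the call, so that only the
; function-level "unsafe-fp-math" attribute is being exercised.
define float @v_sqrt_f32__unsafe_attr(float %x) #4 {
  %result = call float @llvm.sqrt.f32(float %x)
  ret float %result
}

declare float @llvm.sqrt.f32(float)

attributes #4 = { "unsafe-fp-math"="true" }
```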

https://github.com/llvm/llvm-project/pull/74197