[PATCH] D114765: [X86][FP16] Only generate approximate rsqrt when Reciprocal is true for half type

Tue Nov 30 07:47:30 PST 2021

spatel added a comment.

In D114765#3160396 <https://reviews.llvm.org/D114765#3160396>, @craig.topper wrote:

> In D114765#3160387 <https://reviews.llvm.org/D114765#3160387>, @pengfei wrote:
>
>> In D114765#3160256 <https://reviews.llvm.org/D114765#3160256>, @craig.topper wrote:
>>
>>> In D114765#3160245 <https://reviews.llvm.org/D114765#3160245>, @pengfei wrote:
>>>
>>>> In D114765#3160232 <https://reviews.llvm.org/D114765#3160232>, @craig.topper wrote:
>>>>
>>>>> What's the correctness issue? It's fast math so the answer isn't required to be exact.
>>>>
>>>> See https://godbolt.org/z/P4Y89hsj4
>>>> I guess we need at least `RefinementSteps = 1` for correntness when `Reciprocal == 0`.
>>>
>>> That seems like a more general bug. The same issue happens if you force the estimate steps for floating point https://godbolt.org/z/vK8eab8zP
>>
>> I see. But I'm not sure if it is bug or by design. Maybe need an assertion?
>
> I think this is a weird interface quirk. The caller interprets returning RefinementSteps = 0 to mean that all needed code has been created and nothing should be done. Theoretically a target could have its own way of handling it without the final FMUL the target independent code inserts. So I think X86 should either insert the final FMUL itself or not do the reciprocal approximation for non-reciprocal if RefinementSteps is 0.  NVPTXTargetLowering::getSqrtEstimate does the latter.

That looks like a long-standing bug for x86. I don't remember if that was a design flaw originally or if the code evolved to handle more patterns, and we just missed that problem.
Better to add the tests first, add the code to fix the bug, then add the code to bypass for f16?

================
Comment at: llvm/test/CodeGen/X86/avx512fp16vl-intrinsics.ll:972

+define <8 x half> @test_sqrt_ph_128_fast2(<8 x half> %a0, <8 x half> %a1) {
+; CHECK-LABEL: test_sqrt_ph_128_fast2:
----------------
Does it make sense to add a scalar test too?

  define half @sqrt_half_fast(half %a0, half %a1) {
    %1 = call fast half @llvm.sqrt.f16(half %a0)
    ret half %1
  }

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114765/new/

https://reviews.llvm.org/D114765