[PATCH] D85709: [InstSimplify] Implement Instruction simplification for X/sqrt(X) to sqrt(X).

Thu Aug 13 08:38:05 PDT 2020

spatel added a comment.

In D85709#2215786 <https://reviews.llvm.org/D85709#2215786>, @venkataramanan.kumar.llvm wrote:

> In D85709#2215718 <https://reviews.llvm.org/D85709#2215718>, @spatel wrote:
>
>> In D85709#2214967 <https://reviews.llvm.org/D85709#2214967>, @venkataramanan.kumar.llvm wrote:
>>
>>> In D85709#2211204 <https://reviews.llvm.org/D85709#2211204>, @spatel wrote:
>>>
>>>> After looking at the codegen, I'm not sure if we can do this transform in IR with the expected performance in codegen because the transform loses information:
>>>> https://godbolt.org/z/7b84rG
>>>>
>>>> The codegen for the case of "sqrt(x)" has to account for a 0.0 input. Ie, we filter out a 0.0 (or potentially denorm) input to avoid the NAN answer that we would get from "0.0 / 0.0". But the codegen for the case of "x/sqrt(x)" does not have to do that - NAN is the correct answer for a 0.0 input, so the code has implicitly signaled to us that 0.0 is not a valid input when compiled with -ffast-math (we can ignore possible NANs).
>>>>
>>>> It might help to see the motivating code that produces the x/sqrt(x) pattern to see if there's something else we should be doing there.
>>>
>>> Current AMD "x86_64" targets don't have the reciprocal sqrt instruction for the double precision types. 
>>> so x/sqrt(x) ends up with "vsqrtsd" followed by "vdivsd". This transform is basically to improve the efficiency.
>>
>> Ah, I see. I think we should handle that in generic DAGCombiner then. There, we can make the target- and CPU-specific trade-offs necessary to get the (presumably) ideal asm code. I don't know how we would recover the missing div-by-0 info that I mentioned here.
>> Let me know if you want to try that patch. If not, I can take a shot at it.
>
> Sure I will work on the DAGCombiner patch .

Added some test coverage here:
rGdd1a900575ff <https://reviews.llvm.org/rGdd1a900575ff727d773132111b5f4091cd53d8e6>

Feel free to adjust as needed. If I'm seeing correctly, it should be a similar small code patch as this one, just adapted to SDAG nodes/flags.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85709/new/

https://reviews.llvm.org/D85709