[PATCH] D85709: [InstSimplify] Implement Instruction simplification for X/sqrt(X) to sqrt(X).

Thu Aug 13 07:17:52 PDT 2020

spatel added a comment.

In D85709#2214967 <https://reviews.llvm.org/D85709#2214967>, @venkataramanan.kumar.llvm wrote:

> In D85709#2211204 <https://reviews.llvm.org/D85709#2211204>, @spatel wrote:
>
>> After looking at the codegen, I'm not sure if we can do this transform in IR with the expected performance in codegen because the transform loses information:
>> https://godbolt.org/z/7b84rG
>>
>> The codegen for the case of "sqrt(x)" has to account for a 0.0 input. Ie, we filter out a 0.0 (or potentially denorm) input to avoid the NAN answer that we would get from "0.0 / 0.0". But the codegen for the case of "x/sqrt(x)" does not have to do that - NAN is the correct answer for a 0.0 input, so the code has implicitly signaled to us that 0.0 is not a valid input when compiled with -ffast-math (we can ignore possible NANs).
>>
>> It might help to see the motivating code that produces the x/sqrt(x) pattern to see if there's something else we should be doing there.
>
> Current AMD "x86_64" targets don't have the reciprocal sqrt instruction for the double precision types. 
> so x/sqrt(x) ends up with "vsqrtsd" followed by "vdivsd". This transform is basically to improve the efficiency.

Ah, I see. I think we should handle that in generic DAGCombiner then. There, we can make the target- and CPU-specific trade-offs necessary to get the (presumably) ideal asm code. I don't know how we would recover the missing div-by-0 info that I mentioned here.
Let me know if you want to try that patch. If not, I can take a shot at it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85709/new/

https://reviews.llvm.org/D85709