[PATCH] D85709: [InstSimplify] Implement Instruction simplification for X/sqrt(X) to sqrt(X).

Tue Aug 11 07:46:59 PDT 2020

venkataramanan.kumar.llvm added a comment.

In D85709#2210042 <https://reviews.llvm.org/D85709#2210042>, @cameron.mcinally wrote:

> In D85709#2210034 <https://reviews.llvm.org/D85709#2210034>, @lebedev.ri wrote:
>
>> In D85709#2209979 <https://reviews.llvm.org/D85709#2209979>, @cameron.mcinally wrote:
>>
>>> I'm fairly sure this transform is a performance loss. For a target like Skylake Server, a SQRT(x) can take up to 20 cycles. But a RSQRT(x) is about 6 cycles and a MUL(y) is 4 cycles. We'd be better off with a X*RSQRT(X).
>>
>> That is up to backends to decide. InstSimplify/InstCombine (and a few others) are canonicalization, target-independent passes.
>> A single `sqrt(x)` is more canonical IR than `x/sqrt(x)`, because it's less instructions and `x` has less uses.
>
> I agree with that. It should be canonicalized. It would also be good to make sure that the backends have lowering code in place before introducing a 2x performance hit.

As of now for -march=skylake -Ofast I get.

https://godbolt.org/z/jqWzPq

--Snip--
foo: # @foo

  vsqrtsd xmm1, xmm0, xmm0
  vdivsd xmm0, xmm0, xmm1
  ret

---Snip--

Backend can lower  the SQRT(X) back to X * RSQRT(X)  in a separate patch?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85709/new/

https://reviews.llvm.org/D85709