[PATCH] D86726: [InstCombine]: Transform 1.0/sqrt(X) * X to X/sqrt(X) and X * 1.0/sqrt(X) to X/sqrt(X)

Mon Aug 31 08:41:05 PDT 2020

spatel added a comment.

Note that we may still have backend gaps in sqrt codegen, so we will need to watch out for regressions. I tried to squash those better with:
rG716e35a0cf53 <https://reviews.llvm.org/rG716e35a0cf53e85a5fddc7ff86b79a751b1b2040>
rG1c9a09f42e5e <https://reviews.llvm.org/rG1c9a09f42e5ed66cba04700f9272ff53ea3cca86>

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp:553-558
+    if (match(Op0, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op1 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
+    if (match(Op1, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op0 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
----------------
We should predicate this fold using the same FMF that the backend is using. Currently at least, we are requiring "nsz", so that should be checked here.

================
Comment at: llvm/test/Transforms/InstCombine/fmul-sqrt.ll:110-112
   %sqrt = call fast double @llvm.sqrt.f64(double %x)
   %rsqrt = fdiv fast double 1.0, %sqrt
   %res = fmul reassoc double %rsqrt, %x
----------------
This test could provide better coverage for the logic as implemented. The only FMF requirement is the "reassoc" (and likely "nsz") on the fmul, and that's typical of our FP folds so far. 

If we want to make that more conservative by requiring FMF on the fdiv or fsqrt too, this test would then offer a potential counter-example.

There's some discussion about the semantics/subtlety of FMF in:
https://llvm.org/PR46326

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86726/new/

https://reviews.llvm.org/D86726