[llvm] 626c373 - [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)

Wed Sep 2 05:25:42 PDT 2020

Author: Venkataramanan Kumar
Date: 2020-09-02T08:23:48-04:00
New Revision: 626c3738cdfa49527097fccdf89e22949138ade7

URL: https://github.com/llvm/llvm-project/commit/626c3738cdfa49527097fccdf89e22949138ade7
DIFF: https://github.com/llvm/llvm-project/commit/626c3738cdfa49527097fccdf89e22949138ade7.diff

LOG: [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)

These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726

Added: 
    

Modified: 
    llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
    llvm/test/Transforms/InstCombine/fmul-sqrt.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index 26db91cc5112..99f19d9663b7 100644

--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -544,6 +544,21 @@ Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
       return replaceInstUsesWith(I, Sqrt);
     }
 
+    // The following transforms are done irrespective of the number of uses
+    // for the expression "1.0/sqrt(X)".
+    //  1) 1.0/sqrt(X) * X -> X/sqrt(X)
+    //  2) X * 1.0/sqrt(X) -> X/sqrt(X)
+    // We always expect the backend to reduce X/sqrt(X) to sqrt(X), if it
+    // has the necessary (reassoc) fast-math-flags.
+    if (I.hasNoSignedZeros() &&
+        match(Op0, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op1 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
+    if (I.hasNoSignedZeros() &&
+        match(Op1, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op0 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
+
     // Like the similar transform in instsimplify, this requires 'nsz' because
     // sqrt(-0.0) = -0.0, and -0.0 * -0.0 does not simplify to -0.0.
     if (I.hasNoNaNs() && I.hasNoSignedZeros() && Op0 == Op1 &&

diff  --git a/llvm/test/Transforms/InstCombine/fmul-sqrt.ll b/llvm/test/Transforms/InstCombine/fmul-sqrt.ll
index de030bb59c56..e77a828729e1 100644
--- a/llvm/test/Transforms/InstCombine/fmul-sqrt.ll
+++ b/llvm/test/Transforms/InstCombine/fmul-sqrt.ll
@@ -103,7 +103,7 @@ define double @rsqrt_x_reassociate_extra_use(double %x, double * %p) {
 ; CHECK-LABEL: @rsqrt_x_reassociate_extra_use(
 ; CHECK-NEXT:    [[SQRT:%.*]] = call double @llvm.sqrt.f64(double [[X:%.*]])
 ; CHECK-NEXT:    [[RSQRT:%.*]] = fdiv double 1.000000e+00, [[SQRT]]
-; CHECK-NEXT:    [[RES:%.*]] = fmul reassoc nsz double [[RSQRT]], [[X]]
+; CHECK-NEXT:    [[RES:%.*]] = fdiv reassoc nsz double [[X:%.*]], [[SQRT]]
 ; CHECK-NEXT:    store double [[RSQRT]], double* [[P:%.*]], align 8
 ; CHECK-NEXT:    ret double [[RES]]
 ;
@@ -119,7 +119,7 @@ define <2 x float> @x_add_y_rsqrt_reassociate_extra_use(<2 x float> %x, <2 x flo
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd fast <2 x float> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    [[SQRT:%.*]] = call fast <2 x float> @llvm.sqrt.v2f32(<2 x float> [[ADD]])
 ; CHECK-NEXT:    [[RSQRT:%.*]] = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, [[SQRT]]
-; CHECK-NEXT:    [[RES:%.*]] = fmul fast <2 x float> [[ADD]], [[RSQRT]]
+; CHECK-NEXT:    [[RES:%.*]] = fdiv fast <2 x float> [[ADD]], [[SQRT]]
 ; CHECK-NEXT:    store <2 x float> [[RSQRT]], <2 x float>* [[P:%.*]], align 8
 ; CHECK-NEXT:    ret <2 x float> [[RES]]
 ;