[llvm] r183802 - [mips] Add an IR transformation pass that optimizes calls to sqrt.

Wed Jun 12 17:49:33 PDT 2013

When I run my test program with a negative input, the latter approach
(checking whether the input is negative) is about 10% faster.
The former approach (checking whether the result is NaN) is faster when
the input is positive, but the difference is smaller (3-4%).
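
To make the comparison concrete, here is a minimal C++ sketch of the
control flow each approach produces. It is not code from the patch:
nativeSqrt, libSqrt and the Counts struct are hypothetical stand-ins that
only record which path runs.

```cpp
#include <cmath>

// Hypothetical counters, used only to show which paths execute.
struct Counts {
  int native = 0;
  int libcall = 0;
};

// Stand-in for the native sqrt instruction.
static double nativeSqrt(double x, Counts &c) {
  ++c.native;
  return std::sqrt(x);
}

// Stand-in for the call to the library sqrt.
static double libSqrt(double x, Counts &c) {
  ++c.libcall;
  return std::sqrt(x);
}

// Former approach: run the native instruction first and fall back to the
// library call only if the result is NaN.
static double sqrtCheckNaN(double x, Counts &c) {
  double v = nativeSqrt(x, c);
  if (v == v) // ordered self-compare: false only for NaN
    return v;
  return libSqrt(x, c);
}

// Latter approach: test the input first, then take exactly one path.
static double sqrtCheckNeg(double x, Counts &c) {
  if (x >= 0.0) // false for negative inputs and for NaN
    return nativeSqrt(x, c);
  return libSqrt(x, c);
}
```

In this sketch a negative input makes the NaN-check variant run the native
instruction and then the library call, while the negative-check variant
runs only the library call. That may account for part of the difference
on negative inputs.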

I created a patch which moves this optimization to lib/CodeGen. You can try
optimizing sqrt for other targets with these commands:

$ llc test-sqrt2.ll -o - -enable-math-optimization
  (this checks whether the result is NaN)
$ llc test-sqrt2.ll -o - -enable-math-optimization -check-neg-fp
  (this checks whether the input is negative)
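
For reference, the two checks reduce to ordered floating-point
comparisons: the NaN check compares the native result with itself, and the
negative check compares the input against 0.0. The sketch below is plain
C++ with hypothetical helper names, assuming IEEE-754 doubles. It shows
which inputs each variant would send to the library call.

```cpp
#include <cmath>
#include <limits>

// Mirrors the NaN check on the native result: an ordered equality of a
// value with itself is false only when the value is NaN.
static bool resultIsNotNaN(double v) { return v == v; }

// Mirrors the negative check on the input: an ordered >= against 0.0 is
// false for negative values and also for NaN.
static bool inputIsNonNegative(double x) { return x >= 0.0; }

// Hypothetical helper: true when the check-nan variant would fall back to
// the library call for input x.
static bool checkNaNTakesLibcall(double x) {
  return !resultIsNotNaN(std::sqrt(x));
}

// Hypothetical helper: true when the check-neg variant would take the
// library-call path for input x.
static bool checkNegTakesLibcall(double x) { return !inputIsNonNegative(x); }
```

Under these assumptions both variants pick the library call for the same
inputs (negative values and NaN), and both keep -0.0 and +infinity on the
native path.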

Note that in order to emit a native sqrt instruction, the target needs an
fsqrt pattern.
I tried this pass on x86, but didn't see a large difference between the two
approaches.

On Tue, Jun 11, 2013 at 10:27 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> >
> > I ran a simple experiment to compare the two approaches. I don't have
> > the exact numbers yet, but the approach in this patch is slightly
> > faster than the approach that checks whether the input is negative,
> > when the input is always non-negative. The difference is not
> > significant, but is still measurable. One thing I noticed studying
> > the generated code is that mips needs extra instructions (either a
> > load or copy from register $0) and an extra register to load
> > constant 0.0, which might explain the difference.
> >
> >
> > When the input is negative, the latter approach was faster as
> > expected.
>
> Just a little faster, or ~2x faster? I think that this makes a big
> difference. As a user, I've been bitten by this kind of transformation
> before, and find it quite annoying when applications exhibit huge slowdowns
> when NaNs start being generated.
>
> Thanks again,
> Hal
>
> >
> > I'll continue looking into this.
> >
> > On Tue, Jun 11, 2013 at 4:13 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> >
> >
> > ----- Original Message -----
> > >
> > >
> > > I was simply copying what gcc was doing. As you've pointed out, you
> > > can check if the input is negative first. The difference is that
> > > you'll have to emit one more branch (not sure how much difference
> > > this will make to performance):
> >
> > I think this is worth testing. I'd guess that, assuming you arrange
> > the branches so that it defaults to calling the native instruction,
> > the cost of the extra branch will be small, and you won't have the >2x
> > runtime increase on negative inputs.
> >
> > -Hal
> >
> >
> >
> > >
> > >
> > > if (input is negative)
> > >   libcall
> > > else
> > >   native instruction
> > >
> > > (transformation in this patch):
> > >
> > > native_instruction
> > > if (result is nan)
> > >   libcall
> > >
> > >
> > > I made this pass mips-specific because I was planning to commit
> > > another patch which does a mips-specific optimization of calls to
> > > other math library functions (ceil and trunc), but there isn't
> > > any reason the transformation in this patch can't be target
> > > independent.
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Jun 11, 2013 at 3:45 PM, Hal Finkel < hfinkel at anl.gov >
> > > wrote:
> > >
> > >
> > >
> > > ----- Original Message -----
> > > > Author: ahatanak
> > > > Date: Tue Jun 11 17:21:44 2013
> > > > New Revision: 183802
> > > >
> > > > URL: http://llvm.org/viewvc/llvm-project?rev=183802&view=rev
> > > > Log:
> > > > [mips] Add an IR transformation pass that optimizes calls to
> > > > sqrt.
> > > >
> > > > The pass emits a call to sqrt that has the attribute "readnone".
> > > > This call will be converted to an ISD::FSQRT node during DAG
> > > > construction, which will turn into a mips native sqrt instruction.
> > >
> > > This seems almost completely target independent. Is there a reason
> > > to make this MIPS-specific? Also, is it really better to test the
> > > result for NaN as opposed to checking whether the input is
> > > negative?
> > >
> > > -Hal
> > >
> > >
> > >
> > > >
> > > >
> > > > Added:
> > > > llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp
> > > > llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll
> > > > Modified:
> > > > llvm/trunk/lib/Target/Mips/Mips.h
> > > > llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp
> > > >
> > > > Modified: llvm/trunk/lib/Target/Mips/Mips.h
> > > > URL:
> > > >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/Mips.h?rev=183802&r1=183801&r2=183802&view=diff
> > > >
> ==============================================================================
> > > > --- llvm/trunk/lib/Target/Mips/Mips.h (original)
> > > > +++ llvm/trunk/lib/Target/Mips/Mips.h Tue Jun 11 17:21:44 2013
> > > > @@ -28,7 +28,7 @@ namespace llvm {
> > > > FunctionPass *createMipsJITCodeEmitterPass(MipsTargetMachine &TM,
> > > > JITCodeEmitter &JCE);
> > > > FunctionPass *createMipsConstantIslandPass(MipsTargetMachine
> > > > &tm);
> > > > -
> > > > + FunctionPass *createMipsOptimizeMathLibCalls(MipsTargetMachine
> > > > &TM);
> > > > } // end namespace llvm;
> > > >
> > > > #endif
> > > >
> > > > Added: llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp
> > > > URL:
> > > >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp?rev=183802&view=auto
> > > >
> ==============================================================================
> > > > --- llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp
> > > > (added)
> > > > +++ llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp Tue
> > > > Jun
> > > > 11 17:21:44 2013
> > > > @@ -0,0 +1,175 @@
> > > > +//===---- MipsOptimizeMathLibCalls.cpp - Optimize math lib
> > > > calls.
> > > > ----===//
> > > > +//
> > > > +// The LLVM Compiler Infrastructure
> > > > +//
> > > > +// This file is distributed under the University of Illinois
> > > > Open
> > > > Source
> > > > +// License. See LICENSE.TXT for details.
> > > > +//
> > > >
> +//===----------------------------------------------------------------------===//
> > > > +//
> > > > +// This pass does an IR transformation which enables the backend
> > > > to
> > > > emit native
> > > > +// math instructions.
> > > > +//
> > > >
> +//===----------------------------------------------------------------------===//
> > > > +
> > > > +#include "MipsTargetMachine.h"
> > > > +#include "llvm/IR/IRBuilder.h"
> > > > +#include "llvm/IR/Intrinsics.h"
> > > > +#include "llvm/Pass.h"
> > > > +#include "llvm/Support/CommandLine.h"
> > > > +#include "llvm/Target/TargetLibraryInfo.h"
> > > > +#include "llvm/Transforms/Utils/BasicBlockUtils.h"
> > > > +
> > > > +using namespace llvm;
> > > > +
> > > > +static cl::opt<bool>
> > > > DisableOpt("disable-mips-math-optimization",
> > > > + cl::init(false),
> > > > + cl::desc("MIPS: Disable math lib
> > > > call "
> > > > + "optimization."),
> > > > cl::Hidden);
> > > > +
> > > > +namespace {
> > > > + class MipsOptimizeMathLibCalls : public FunctionPass {
> > > > + public:
> > > > + static char ID;
> > > > +
> > > > + MipsOptimizeMathLibCalls(MipsTargetMachine &TM_) :
> > > > + FunctionPass(ID), TM(TM_) {}
> > > > +
> > > > + virtual const char *getPassName() const {
> > > > + return "MIPS: Optimize calls to math library functions.";
> > > > + }
> > > > +
> > > > + virtual void getAnalysisUsage(AnalysisUsage &AU) const;
> > > > +
> > > > + virtual bool runOnFunction(Function &F);
> > > > +
> > > > + private:
> > > > + /// Optimize calls to sqrt.
> > > > + bool optimizeSQRT(CallInst *Call, Function *CalledFunc,
> > > > + BasicBlock &CurrBB,
> > > > + Function::iterator &BB);
> > > > +
> > > > + const TargetMachine &TM;
> > > > + };
> > > > +
> > > > + char MipsOptimizeMathLibCalls::ID = 0;
> > > > +}
> > > > +
> > > > +FunctionPass
> > > > *llvm::createMipsOptimizeMathLibCalls(MipsTargetMachine
> > > > &TM) {
> > > > + return new MipsOptimizeMathLibCalls(TM);
> > > > +}
> > > > +
> > > > +void MipsOptimizeMathLibCalls::getAnalysisUsage(AnalysisUsage
> > > > &AU)
> > > > const {
> > > > + AU.addRequired<TargetLibraryInfo>();
> > > > + FunctionPass::getAnalysisUsage(AU);
> > > > +}
> > > > +
> > > > +bool MipsOptimizeMathLibCalls::runOnFunction(Function &F) {
> > > > + if (DisableOpt)
> > > > + return false;
> > > > +
> > > > + const MipsSubtarget &Subtarget =
> > > > TM.getSubtarget<MipsSubtarget>();
> > > > +
> > > > + if (Subtarget.inMips16Mode())
> > > > + return false;
> > > > +
> > > > + bool Changed = false;
> > > > + Function::iterator CurrBB;
> > > > + const TargetLibraryInfo *LibInfo =
> > > > &getAnalysis<TargetLibraryInfo>();
> > > > +
> > > > + for (Function::iterator BB = F.begin(), BE = F.end(); BB !=
> > > > BE;)
> > > > {
> > > > + CurrBB = BB++;
> > > > +
> > > > + for (BasicBlock::iterator II = CurrBB->begin(), IE =
> > > > CurrBB->end();
> > > > + II != IE; ++II) {
> > > > + CallInst *Call = dyn_cast<CallInst>(&*II);
> > > > + Function *CalledFunc;
> > > > +
> > > > + if (!Call || !(CalledFunc = Call->getCalledFunction()))
> > > > + continue;
> > > > +
> > > > + LibFunc::Func LibFunc;
> > > > + Attribute A = CalledFunc->getAttributes()
> > > > + .getAttribute(AttributeSet::FunctionIndex,
> > > > "use-soft-float");
> > > > +
> > > > + // Skip if function has "use-soft-float" attribute.
> > > > + if ((A.isStringAttribute() && (A.getValueAsString() ==
> > > > "true")) ||
> > > > + TM.Options.UseSoftFloat)
> > > > + continue;
> > > > +
> > > > + // Skip if function either has local linkage or is not a known
> > > > library
> > > > + // function.
> > > > + if (CalledFunc->hasLocalLinkage() || !CalledFunc->hasName() ||
> > > > + !LibInfo->getLibFunc(CalledFunc->getName(), LibFunc))
> > > > + continue;
> > > > +
> > > > + switch (LibFunc) {
> > > > + case LibFunc::sqrtf:
> > > > + case LibFunc::sqrt:
> > > > + if (optimizeSQRT(Call, CalledFunc, *CurrBB, BB))
> > > > + break;
> > > > + continue;
> > > > + default:
> > > > + continue;
> > > > + }
> > > > +
> > > > + Changed = true;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + return Changed;
> > > > +}
> > > > +
> > > > +bool MipsOptimizeMathLibCalls::optimizeSQRT(CallInst *Call,
> > > > + Function *CalledFunc,
> > > > + BasicBlock &CurrBB,
> > > > + Function::iterator &BB)
> > > > {
> > > > + // There is no need to change the IR, since backend will emit
> > > > sqrt
> > > > + // instruction if the call has already been marked read-only.
> > > > + if (Call->onlyReadsMemory())
> > > > + return false;
> > > > +
> > > > + // Do the following transformation:
> > > > + //
> > > > + // (before)
> > > > + // dst = sqrt(src)
> > > > + //
> > > > + // (after)
> > > > + // v0 = sqrt_noreadmem(src) # native sqrt instruction.
> > > > + // if (v0 is a NaN)
> > > > + // v1 = sqrt(src) # library call.
> > > > + // dst = phi(v0, v1)
> > > > + //
> > > > +
> > > > + // Move all instructions following Call to newly created block
> > > > JoinBB.
> > > > + // Create phi and replace all uses.
> > > > + BasicBlock *JoinBB = llvm::SplitBlock(&CurrBB,
> > > > Call->getNextNode(), this);
> > > > + IRBuilder<> Builder(JoinBB, JoinBB->begin());
> > > > + PHINode *Phi = Builder.CreatePHI(Call->getType(), 2);
> > > > + Call->replaceAllUsesWith(Phi);
> > > > +
> > > > + // Create basic block LibCallBB and insert a call to library
> > > > function sqrt.
> > > > + BasicBlock *LibCallBB = BasicBlock::Create(CurrBB.getContext(),
> > > > "call.sqrt",
> > > > + CurrBB.getParent(),
> > > > JoinBB);
> > > > + Builder.SetInsertPoint(LibCallBB);
> > > > + Instruction *LibCall = Call->clone();
> > > > + Builder.Insert(LibCall);
> > > > + Builder.CreateBr(JoinBB);
> > > > +
> > > > + // Add attribute "readnone" so that backend can use a native
> > > > sqrt
> > > > instruction
> > > > + // for this call. Insert a FP compare instruction and a
> > > > conditional branch
> > > > + // at the end of CurrBB.
> > > > + Call->addAttribute(AttributeSet::FunctionIndex,
> > > > Attribute::ReadNone);
> > > > + CurrBB.getTerminator()->eraseFromParent();
> > > > + Builder.SetInsertPoint(&CurrBB);
> > > > + Value *FCmp = Builder.CreateFCmpOEQ(Call, Call);
> > > > + Builder.CreateCondBr(FCmp, JoinBB, LibCallBB);
> > > > +
> > > > + // Add phi operands.
> > > > + Phi->addIncoming(Call, &CurrBB);
> > > > + Phi->addIncoming(LibCall, LibCallBB);
> > > > +
> > > > + BB = JoinBB;
> > > > + return true;
> > > > +}
> > > >
> > > > Modified: llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp
> > > > URL:
> > > >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp?rev=183802&r1=183801&r2=183802&view=diff
> > > >
> ==============================================================================
> > > > --- llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp (original)
> > > > +++ llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp Tue Jun 11
> > > > 17:21:44 2013
> > > > @@ -160,6 +160,7 @@ void MipsPassConfig::addIRPasses() {
> > > > addPass(createMipsOs16(getMipsTargetMachine()));
> > > > if (getMipsSubtarget().inMips16HardFloat())
> > > > addPass(createMips16HardFloat(getMipsTargetMachine()));
> > > > +
> > > > addPass(createMipsOptimizeMathLibCalls(getMipsTargetMachine()));
> > > > }
> > > > // Install an instruction selector pass using
> > > > // the ISelDag to gen Mips code.
> > > >
> > > > Added: llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll
> > > > URL:
> > > >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll?rev=183802&view=auto
> > > >
> ==============================================================================
> > > > --- llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll (added)
> > > > +++ llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll Tue Jun 11
> > > > 17:21:44 2013
> > > > @@ -0,0 +1,32 @@
> > > > +; RUN: llc -march=mipsel < %s | FileCheck %s -check-prefix=32
> > > > +; RUN: llc -march=mips64el -mcpu=mips64 < %s | FileCheck %s
> > > > -check-prefix=64
> > > > +
> > > > +; 32: test_sqrtf_float_:
> > > > +; 32: sqrt.s $f[[R0:[0-9]+]], $f{{[0-9]+}}
> > > > +; 32: c.un.s $f[[R0]], $f[[R0]]
> > > > +; 64: test_sqrtf_float_:
> > > > +; 64: sqrt.s $f[[R0:[0-9]+]], $f{{[0-9]+}}
> > > > +; 64: c.un.s $f[[R0]], $f[[R0]]
> > > > +
> > > > +define float @test_sqrtf_float_(float %a) {
> > > > +entry:
> > > > + %call = tail call float @sqrtf(float %a)
> > > > + ret float %call
> > > > +}
> > > > +
> > > > +declare float @sqrtf(float)
> > > > +
> > > > +; 32: test_sqrt_double_:
> > > > +; 32: sqrt.d $f[[R0:[0-9]+]], $f{{[0-9]+}}
> > > > +; 32: c.un.d $f[[R0]], $f[[R0]]
> > > > +; 64: test_sqrt_double_:
> > > > +; 64: sqrt.d $f[[R0:[0-9]+]], $f{{[0-9]+}}
> > > > +; 64: c.un.d $f[[R0]], $f[[R0]]
> > > > +
> > > > +define double @test_sqrt_double_(double %a) {
> > > > +entry:
> > > > + %call = tail call double @sqrt(double %a)
> > > > + ret double %call
> > > > +}
> > > > +
> > > > +declare double @sqrt(double)
> > > >
> > > >
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > >
> > >
> > >
> >
> >
>
-------------- next part --------------
commit a24963811d8446d08b98715cd6e1447621e32a4a
Author: Akira Hatanaka <ahatanaka at mips.com>
Date:   Wed Jun 12 17:21:19 2013 -0700

    optimize sqrt

diff --git a/lib/CodeGen/OptimizeMathLibCalls.cpp b/lib/CodeGen/OptimizeMathLibCalls.cpp
new file mode 100644
index 0000000..7b56505
--- /dev/null
+++ b/lib/CodeGen/OptimizeMathLibCalls.cpp
@@ -0,0 +1,240 @@
+//===---- OptimizeMathLibCalls.cpp - Optimize math lib calls.      ----===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass does an IR transformation which enables the backend to emit native
+// math instructions.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/Target/TargetMachine.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Target/TargetLibraryInfo.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+static cl::opt<bool> EnableOpt("enable-math-optimization",
+                                cl::init(false),
+                                cl::desc("Enable math lib call "
+                                         "optimization."), cl::Hidden);
+
+static cl::opt<bool> CheckNeg("check-neg-fp",
+                              cl::init(false),
+                              cl::desc("Check if input is negative."),
+                              cl::Hidden);
+
+namespace {
+  class OptimizeMathLibCalls : public FunctionPass {
+  public:
+    static char ID;
+
+    OptimizeMathLibCalls(TargetMachine &TM_) : FunctionPass(ID), TM(TM_) {}
+
+    virtual void getAnalysisUsage(AnalysisUsage &AU) const;
+
+    virtual bool runOnFunction(Function &F);
+
+  private:
+    /// Optimize calls to sqrt.
+    bool optimizeSQRT(CallInst *Call, Function *CalledFunc,
+                      BasicBlock &CurrBB,
+                      Function::iterator &BB);
+
+    bool optimizeSQRT_CheckNeg(CallInst *Call, BasicBlock &CurrBB,
+                               Function::iterator &BB);
+
+    const TargetMachine &TM;
+  };
+  char OptimizeMathLibCalls::ID = 0;
+}
+
+FunctionPass *llvm::createOptimizeMathLibCalls(TargetMachine &TM) {
+  return new OptimizeMathLibCalls(TM);
+}
+
+void OptimizeMathLibCalls::getAnalysisUsage(AnalysisUsage &AU) const {
+  AU.addRequired<TargetLibraryInfo>();
+  FunctionPass::getAnalysisUsage(AU);
+}
+
+bool OptimizeMathLibCalls::runOnFunction(Function &F) {
+  if (!EnableOpt)
+    return false;
+
+  bool Changed = false;
+  Function::iterator CurrBB;
+  const TargetLibraryInfo *LibInfo = &getAnalysis<TargetLibraryInfo>();
+
+  for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE;) {
+    CurrBB = BB++;
+
+    for (BasicBlock::iterator II = CurrBB->begin(), IE = CurrBB->end();
+         II != IE; ++II) {
+      CallInst *Call = dyn_cast<CallInst>(&*II);
+      Function *CalledFunc;
+
+      if (!Call || !(CalledFunc = Call->getCalledFunction()))
+        continue;
+
+      LibFunc::Func LibFunc;
+      Intrinsic::ID ID;
+      Attribute A = CalledFunc->getAttributes()
+        .getAttribute(AttributeSet::FunctionIndex, "use-soft-float");
+
+      // Skip if function has "use-soft-float" attribute.
+      if ((A.isStringAttribute() && (A.getValueAsString() == "true")) ||
+          TM.Options.UseSoftFloat)
+        continue;
+
+      // Skip if function either has local linkage or is not a known library
+      // function.
+      if (CalledFunc->hasLocalLinkage() || !CalledFunc->hasName() ||
+          !LibInfo->getLibFunc(CalledFunc->getName(), LibFunc))
+        continue;
+
+      switch (LibFunc) {
+      case LibFunc::sqrtf:
+      case LibFunc::sqrt:
+        if (optimizeSQRT(Call, CalledFunc, *CurrBB, BB))
+          break;
+        continue;
+      default:
+        continue;
+      }
+
+      Changed = true;
+      break;
+    }
+  }
+
+  return Changed;
+}
+
+bool OptimizeMathLibCalls::optimizeSQRT_CheckNeg(CallInst *Call,
+                                                     BasicBlock &CurrBB,
+                                                     Function::iterator &BB) {
+  // Do the following transformation:
+  //
+  // (before)
+  // dst = sqrt(src)
+  //
+  // (after)
+  // if (src >= 0.0)
+  //   v0 = sqrt_noreadmem(src) # native sqrt instruction.
+  // else
+  //   v1 = sqrt(src)         # library call.
+  // dst = phi(v0, v1)
+  //
+
+  // Move all instructions following Call to newly created block JoinBB.
+  // Create phi and replace all uses.
+  LLVMContext &Ctx = CurrBB.getContext();
+  BasicBlock *JoinBB = llvm::SplitBlock(&CurrBB, Call->getNextNode(), this);
+  IRBuilder<> Builder(JoinBB, JoinBB->begin());
+  PHINode *Phi = Builder.CreatePHI(Call->getType(), 2);
+  Call->replaceAllUsesWith(Phi);
+
+  // Create basic block LibCallBB and insert a call to library function sqrt.
+  BasicBlock *LibCallBB = BasicBlock::Create(Ctx, "call.sqrt",
+                                             CurrBB.getParent(), JoinBB);
+  Builder.SetInsertPoint(LibCallBB);
+  Instruction *LibCall = Call->clone();
+  Builder.Insert(LibCall);
+  Builder.CreateBr(JoinBB);
+
+  // Create basic block NativeBB and insert Call.
+  BasicBlock *NativeBB = BasicBlock::Create(Ctx, "native.sqrt",
+                                            CurrBB.getParent(), JoinBB);
+  Builder.SetInsertPoint(NativeBB);
+  Call->removeFromParent();
+  Builder.Insert(Call);
+  Call->addAttribute(AttributeSet::FunctionIndex, Attribute::ReadNone);
+  Builder.CreateBr(JoinBB);
+
+  // Check if input >= 0.
+  Value *Arg = Call->getArgOperand(0);
+  Type *Ty = Call->getType();
+  APFloat Zero(APFloat::IEEEsingle, BitsToFloat(0));
+
+  if (Ty->isDoubleTy())
+    Zero = APFloat(APFloat::IEEEdouble, (double)Zero.convertToFloat());
+
+  CurrBB.getTerminator()->eraseFromParent();
+  Builder.SetInsertPoint(&CurrBB);
+  Value *FCmp = Builder.CreateFCmpOGE(Arg, ConstantFP::get(Ctx, Zero));
+  Builder.CreateCondBr(FCmp, NativeBB, LibCallBB);
+
+  // Add phi operands.
+  Phi->addIncoming(Call, NativeBB);
+  Phi->addIncoming(LibCall, LibCallBB);
+
+  BB = JoinBB;
+  return true;
+}
+
+bool OptimizeMathLibCalls::optimizeSQRT(CallInst *Call,
+                                            Function *CalledFunc,
+                                            BasicBlock &CurrBB,
+                                            Function::iterator &BB) {
+  // There is no need to change the IR, since backend will emit sqrt
+  // instruction if the call has already been marked read-only.
+  if (Call->onlyReadsMemory())
+    return false;
+
+  if (CheckNeg)
+    return optimizeSQRT_CheckNeg(Call, CurrBB, BB);
+
+  // Do the following transformation:
+  //
+  // (before)
+  // dst = sqrt(src)
+  //
+  // (after)
+  // v0 = sqrt_noreadmem(src) # native sqrt instruction.
+  // if (v0 is a NaN)
+  //   v1 = sqrt(src)         # library call.
+  // dst = phi(v0, v1)
+  //
+
+  // Move all instructions following Call to newly created block JoinBB.
+  // Create phi and replace all uses.
+  BasicBlock *JoinBB = llvm::SplitBlock(&CurrBB, Call->getNextNode(), this);
+  IRBuilder<> Builder(JoinBB, JoinBB->begin());
+  PHINode *Phi = Builder.CreatePHI(Call->getType(), 2);
+  Call->replaceAllUsesWith(Phi);
+
+  // Create basic block LibCallBB and insert a call to library function sqrt.
+  BasicBlock *LibCallBB = BasicBlock::Create(CurrBB.getContext(), "call.sqrt",
+                                             CurrBB.getParent(), JoinBB);
+  Builder.SetInsertPoint(LibCallBB);
+  Instruction *LibCall = Call->clone();
+  Builder.Insert(LibCall);
+  Builder.CreateBr(JoinBB);
+
+  // Add attribute "readnone" so that backend can use a native sqrt instruction
+  // for this call. Insert a FP compare instruction and a conditional branch
+  // at the end of CurrBB.
+  Call->addAttribute(AttributeSet::FunctionIndex, Attribute::ReadNone);
+  CurrBB.getTerminator()->eraseFromParent();
+  Builder.SetInsertPoint(&CurrBB);
+  Value *FCmp = Builder.CreateFCmpOEQ(Call, Call);
+  Builder.CreateCondBr(FCmp, JoinBB, LibCallBB);
+
+  // Add phi operands.
+  Phi->addIncoming(Call, &CurrBB);
+  Phi->addIncoming(LibCall, LibCallBB);
+
+  BB = JoinBB;
+  return true;
+}
+