[llvm] r183802 - [mips] Add an IR transformation pass that optimizes calls to sqrt.

Wed Jun 12 17:53:04 PDT 2013

One more patch attached.


On Wed, Jun 12, 2013 at 5:49 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:

> When I run my test program with a negative input, the latter approach is
> about 10% faster.
> The former approach is faster when the input is positive, but the
> difference is smaller (3-4%).
>
> I created a patch which moves this optimization to lib/CodeGen. You can
> try optimizing sqrt for other targets with these commands:
>
> $ llc test-sqrt2.ll -o - -enable-math-optimization (this is for checking NaN)
> $ llc test-sqrt2.ll -o - -enable-math-optimization -check-neg-fp (this is for checking negative inputs)
>
> Note that in order to emit a native sqrt instruction, the target needs an
> fsqrt pattern.
> I tried this pass on x86, but didn't see a large difference between the
> two approaches.
>
> On Tue, Jun 11, 2013 at 10:27 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> ----- Original Message -----
>> >
>> > I ran a simple experiment to compare the two approaches. I don't have
>> > the exact numbers yet, but the approach in this patch is slightly
>> > faster than the approach that checks whether the input is negative,
>> > when the input is always non-negative. The difference is not
>> > significant, but is still measurable. One thing I noticed studying
>> > the generated code is that mips needs extra instructions (either a
>> > load or copy from register $0) and an extra register to load
>> > constant 0.0, which might explain the difference.
>> >
>> >
>> > When the input is negative, the latter approach was faster as
>> > expected.
>>
>> Just a little faster, or ~2x faster? I think that this makes a big
>> difference. As a user, I've been bitten by this kind of transformation
>> before, and find it quite annoying when applications exhibit huge slowdowns
>> when NaNs start being generated.
>>
>> Thanks again,
>> Hal
>>
>> >
>> > I'll continue looking into this.
>> >
>> > On Tue, Jun 11, 2013 at 4:13 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >
>> > ----- Original Message -----
>> > >
>> > >
>> > > I was simply copying what gcc was doing. As you've pointed out, you
>> > > can check if the input is negative first. The difference is that
>> > > you'll have to emit one more branch (not sure how much difference
>> > > this will make to performance):
>> >
>> > I think this is worth testing. I'd guess that, assuming you arrange
>> > the branches so that it defaults to calling the native instruction,
>> > the cost of the extra branch will be small, and you won't have the >2x
>> > runtime increase on negative inputs.
>> >
>> > -Hal
>> >
>> > >
>> > > if (input is negative)
>> > >   libcall
>> > > else
>> > >   native instruction
>> > >
>> > > (transformation in this patch):
>> > >
>> > > native_instruction
>> > > if (result is nan)
>> > >   libcall
>> > >
>> > > I made this pass mips-specific because I was planning to commit
>> > > another patch which does a mips-specific optimization to calls to
>> > > other math library calls (ceil, trunc and trunc), but there isn't
>> > > any reason the transformation in this patch can't be target
>> > > independent.
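The two strategies compared above can be modeled in portable C++ to see where their behavior differs. This is a sketch under stated assumptions, not the pass itself: `native_sqrt` is a hypothetical stand-in for the hardware instruction (sqrt.s/sqrt.d on MIPS) that returns NaN for a negative argument without touching errno, `lib_sqrt` is a hypothetical wrapper around `std::sqrt` that counts how often the slow library path runs, and the `v0 == v0` test mirrors the `fcmp oeq` the pass emits.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical counter: how many times the slow library path was taken.
static int g_libcalls = 0;

// Hypothetical model of the native FPU instruction: returns a quiet NaN
// for negative input and never sets errno (std::sqrt reports no domain
// error for x >= 0, -0.0 or NaN).
static double native_sqrt(double x) {
  if (x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::sqrt(x);
}

// Hypothetical model of the library call, which may set errno = EDOM.
static double lib_sqrt(double x) {
  ++g_libcalls;
  return std::sqrt(x);
}

// Strategy in r183802: run the instruction, fall back if the result is NaN.
double sqrt_check_result(double x) {
  double v0 = native_sqrt(x);
  if (v0 == v0)  // ordered compare: false only when v0 is NaN
    return v0;
  return lib_sqrt(x);
}

// Alternative: test the input first; the native path is the common case.
double sqrt_check_input(double x) {
  if (x < 0.0)  // false for -0.0 and for NaN
    return lib_sqrt(x);
  return native_sqrt(x);
}
```

One difference the model makes visible: for a NaN argument the result check still takes the library path, while the input check returns the native result directly (sqrt of NaN is not a domain error). Compiling the model with -ffast-math could let the compiler fold `v0 == v0` to true, so it assumes IEEE semantics.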
>> > >
>> > > On Tue, Jun 11, 2013 at 3:45 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > >
>> > > ----- Original Message -----
>> > > > Author: ahatanak
>> > > > Date: Tue Jun 11 17:21:44 2013
>> > > > New Revision: 183802
>> > > >
>> > > > URL: http://llvm.org/viewvc/llvm-project?rev=183802&view=rev
>> > > > Log:
>> > > > [mips] Add an IR transformation pass that optimizes calls to sqrt.
>> > > >
>> > > > The pass emits a call to sqrt that has attribute "readnone". This call will be
>> > > > converted to an ISD::FSQRT node during DAG construction, which will turn into
>> > > > a mips native sqrt instruction.
>> > >
>> > > This seems almost completely target-independent. Is there a reason to
>> > > make this MIPS-specific? Also, is it really better to test the
>> > > result for NaN as opposed to checking whether the input is
>> > > negative?
>> > >
>> > > -Hal
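The pass keeps a library call on the slow path instead of always using the native instruction. The commit message does not say why; the most likely reason is errno: C's sqrt reports a domain error for arguments less than zero, and when `math_errhandling & MATH_ERRNO` is nonzero the library sets errno to EDOM, which a bare FPU instruction does not do. That would also explain why the pass leaves calls already marked read-only alone. A minimal C++ probe of that behavior on the host libm (both function names are hypothetical):

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// True if the C library promises to report math errors through errno.
bool libm_uses_errno() {
  return (math_errhandling & MATH_ERRNO) != 0;
}

// True if std::sqrt of a negative argument returns NaN and sets errno to
// EDOM. Reading the argument through a volatile keeps the compiler from
// constant-folding the call away.
bool libm_sqrt_sets_edom() {
  volatile double neg = -1.0;
  errno = 0;
  double r = std::sqrt(neg);
  return std::isnan(r) && errno == EDOM;
}
```

On glibc with default flags both checks are expected to hold. With -fno-math-errno, or on a libm that reports errors only through floating-point exceptions, `libm_uses_errno()` is false and there is arguably no errno side effect left for the fallback call to preserve.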
>> > >
>> > > > Added:
>> > > > llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp
>> > > > llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll
>> > > > Modified:
>> > > > llvm/trunk/lib/Target/Mips/Mips.h
>> > > > llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp
>> > > >
>> > > > Modified: llvm/trunk/lib/Target/Mips/Mips.h
>> > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/Mips.h?rev=183802&r1=183801&r2=183802&view=diff
>> > > > ==============================================================================
>> > > > --- llvm/trunk/lib/Target/Mips/Mips.h (original)
>> > > > +++ llvm/trunk/lib/Target/Mips/Mips.h Tue Jun 11 17:21:44 2013
>> > > > @@ -28,7 +28,7 @@ namespace llvm {
>> > > >    FunctionPass *createMipsJITCodeEmitterPass(MipsTargetMachine &TM,
>> > > >                                               JITCodeEmitter &JCE);
>> > > >    FunctionPass *createMipsConstantIslandPass(MipsTargetMachine &tm);
>> > > > -
>> > > > +  FunctionPass *createMipsOptimizeMathLibCalls(MipsTargetMachine &TM);
>> > > >  } // end namespace llvm;
>> > > >
>> > > >  #endif
>> > > >
>> > > > Added: llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp
>> > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp?rev=183802&view=auto
>> > > > ==============================================================================
>> > > > --- llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp (added)
>> > > > +++ llvm/trunk/lib/Target/Mips/MipsOptimizeMathLibCalls.cpp Tue Jun 11 17:21:44 2013
>> > > > @@ -0,0 +1,175 @@
>> > > > +//===---- MipsOptimizeMathLibCalls.cpp - Optimize math lib calls. ----===//
>> > > > +//
>> > > > +// The LLVM Compiler Infrastructure
>> > > > +//
>> > > > +// This file is distributed under the University of Illinois Open Source
>> > > > +// License. See LICENSE.TXT for details.
>> > > > +//
>> > > > +//===----------------------------------------------------------------------===//
>> > > > +//
>> > > > +// This pass does an IR transformation which enables the backend to emit native
>> > > > +// math instructions.
>> > > > +//
>> > > > +//===----------------------------------------------------------------------===//
>> > > > +
>> > > > +#include "MipsTargetMachine.h"
>> > > > +#include "llvm/IR/IRBuilder.h"
>> > > > +#include "llvm/IR/Intrinsics.h"
>> > > > +#include "llvm/Pass.h"
>> > > > +#include "llvm/Support/CommandLine.h"
>> > > > +#include "llvm/Target/TargetLibraryInfo.h"
>> > > > +#include "llvm/Transforms/Utils/BasicBlockUtils.h"
>> > > > +
>> > > > +using namespace llvm;
>> > > > +
>> > > > +static cl::opt<bool> DisableOpt("disable-mips-math-optimization",
>> > > > +                                cl::init(false),
>> > > > +                                cl::desc("MIPS: Disable math lib call "
>> > > > +                                         "optimization."), cl::Hidden);
>> > > > +
>> > > > +namespace {
>> > > > +  class MipsOptimizeMathLibCalls : public FunctionPass {
>> > > > +  public:
>> > > > +    static char ID;
>> > > > +
>> > > > +    MipsOptimizeMathLibCalls(MipsTargetMachine &TM_) :
>> > > > +      FunctionPass(ID), TM(TM_) {}
>> > > > +
>> > > > +    virtual const char *getPassName() const {
>> > > > +      return "MIPS: Optimize calls to math library functions.";
>> > > > +    }
>> > > > +
>> > > > +    virtual void getAnalysisUsage(AnalysisUsage &AU) const;
>> > > > +
>> > > > +    virtual bool runOnFunction(Function &F);
>> > > > +
>> > > > +  private:
>> > > > +    /// Optimize calls to sqrt.
>> > > > +    bool optimizeSQRT(CallInst *Call, Function *CalledFunc,
>> > > > +                      BasicBlock &CurrBB,
>> > > > +                      Function::iterator &BB);
>> > > > +
>> > > > +    const TargetMachine &TM;
>> > > > +  };
>> > > > +
>> > > > +  char MipsOptimizeMathLibCalls::ID = 0;
>> > > > +}
>> > > > +
>> > > > +FunctionPass *llvm::createMipsOptimizeMathLibCalls(MipsTargetMachine &TM) {
>> > > > +  return new MipsOptimizeMathLibCalls(TM);
>> > > > +}
>> > > > +
>> > > > +void MipsOptimizeMathLibCalls::getAnalysisUsage(AnalysisUsage &AU) const {
>> > > > +  AU.addRequired<TargetLibraryInfo>();
>> > > > +  FunctionPass::getAnalysisUsage(AU);
>> > > > +}
>> > > > +
>> > > > +bool MipsOptimizeMathLibCalls::runOnFunction(Function &F) {
>> > > > +  if (DisableOpt)
>> > > > +    return false;
>> > > > +
>> > > > +  const MipsSubtarget &Subtarget = TM.getSubtarget<MipsSubtarget>();
>> > > > +
>> > > > +  if (Subtarget.inMips16Mode())
>> > > > +    return false;
>> > > > +
>> > > > +  bool Changed = false;
>> > > > +  Function::iterator CurrBB;
>> > > > +  const TargetLibraryInfo *LibInfo = &getAnalysis<TargetLibraryInfo>();
>> > > > +
>> > > > +  for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE;) {
>> > > > +    CurrBB = BB++;
>> > > > +
>> > > > +    for (BasicBlock::iterator II = CurrBB->begin(), IE = CurrBB->end();
>> > > > +         II != IE; ++II) {
>> > > > +      CallInst *Call = dyn_cast<CallInst>(&*II);
>> > > > +      Function *CalledFunc;
>> > > > +
>> > > > +      if (!Call || !(CalledFunc = Call->getCalledFunction()))
>> > > > +        continue;
>> > > > +
>> > > > +      LibFunc::Func LibFunc;
>> > > > +      Attribute A = CalledFunc->getAttributes()
>> > > > +        .getAttribute(AttributeSet::FunctionIndex, "use-soft-float");
>> > > > +
>> > > > +      // Skip if function has "use-soft-float" attribute.
>> > > > +      if ((A.isStringAttribute() && (A.getValueAsString() == "true")) ||
>> > > > +          TM.Options.UseSoftFloat)
>> > > > +        continue;
>> > > > +
>> > > > +      // Skip if function either has local linkage or is not a known library
>> > > > +      // function.
>> > > > +      if (CalledFunc->hasLocalLinkage() || !CalledFunc->hasName() ||
>> > > > +          !LibInfo->getLibFunc(CalledFunc->getName(), LibFunc))
>> > > > +        continue;
>> > > > +
>> > > > +      switch (LibFunc) {
>> > > > +      case LibFunc::sqrtf:
>> > > > +      case LibFunc::sqrt:
>> > > > +        if (optimizeSQRT(Call, CalledFunc, *CurrBB, BB))
>> > > > +          break;
>> > > > +        continue;
>> > > > +      default:
>> > > > +        continue;
>> > > > +      }
>> > > > +
>> > > > +      Changed = true;
>> > > > +      break;
>> > > > +    }
>> > > > +  }
>> > > > +
>> > > > +  return Changed;
>> > > > +}
>> > > > +
>> > > > +bool MipsOptimizeMathLibCalls::optimizeSQRT(CallInst *Call,
>> > > > +                                            Function *CalledFunc,
>> > > > +                                            BasicBlock &CurrBB,
>> > > > +                                            Function::iterator &BB) {
>> > > > +  // There is no need to change the IR, since backend will emit sqrt
>> > > > +  // instruction if the call has already been marked read-only.
>> > > > +  if (Call->onlyReadsMemory())
>> > > > +    return false;
>> > > > +
>> > > > +  // Do the following transformation:
>> > > > +  //
>> > > > +  // (before)
>> > > > +  // dst = sqrt(src)
>> > > > +  //
>> > > > +  // (after)
>> > > > +  // v0 = sqrt_noreadmem(src) # native sqrt instruction.
>> > > > +  // if (v0 is a NaN)
>> > > > +  //   v1 = sqrt(src)         # library call.
>> > > > +  // dst = phi(v0, v1)
>> > > > +  //
>> > > > +
>> > > > +  // Move all instructions following Call to newly created block JoinBB.
>> > > > +  // Create phi and replace all uses.
>> > > > +  BasicBlock *JoinBB = llvm::SplitBlock(&CurrBB, Call->getNextNode(), this);
>> > > > +  IRBuilder<> Builder(JoinBB, JoinBB->begin());
>> > > > +  PHINode *Phi = Builder.CreatePHI(Call->getType(), 2);
>> > > > +  Call->replaceAllUsesWith(Phi);
>> > > > +
>> > > > +  // Create basic block LibCallBB and insert a call to library function sqrt.
>> > > > +  BasicBlock *LibCallBB = BasicBlock::Create(CurrBB.getContext(), "call.sqrt",
>> > > > +                                             CurrBB.getParent(), JoinBB);
>> > > > +  Builder.SetInsertPoint(LibCallBB);
>> > > > +  Instruction *LibCall = Call->clone();
>> > > > +  Builder.Insert(LibCall);
>> > > > +  Builder.CreateBr(JoinBB);
>> > > > +
>> > > > +  // Add attribute "readnone" so that backend can use a native sqrt instruction
>> > > > +  // for this call. Insert a FP compare instruction and a conditional branch
>> > > > +  // at the end of CurrBB.
>> > > > +  Call->addAttribute(AttributeSet::FunctionIndex, Attribute::ReadNone);
>> > > > +  CurrBB.getTerminator()->eraseFromParent();
>> > > > +  Builder.SetInsertPoint(&CurrBB);
>> > > > +  Value *FCmp = Builder.CreateFCmpOEQ(Call, Call);
>> > > > +  Builder.CreateCondBr(FCmp, JoinBB, LibCallBB);
>> > > > +
>> > > > +  // Add phi operands.
>> > > > +  Phi->addIncoming(Call, &CurrBB);
>> > > > +  Phi->addIncoming(LibCall, LibCallBB);
>> > > > +
>> > > > +  BB = JoinBB;
>> > > > +  return true;
>> > > > +}
>> > > >
>> > > > Modified: llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp
>> > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp?rev=183802&r1=183801&r2=183802&view=diff
>> > > > ==============================================================================
>> > > > --- llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp (original)
>> > > > +++ llvm/trunk/lib/Target/Mips/MipsTargetMachine.cpp Tue Jun 11 17:21:44 2013
>> > > > @@ -160,6 +160,7 @@ void MipsPassConfig::addIRPasses() {
>> > > >    addPass(createMipsOs16(getMipsTargetMachine()));
>> > > >    if (getMipsSubtarget().inMips16HardFloat())
>> > > >      addPass(createMips16HardFloat(getMipsTargetMachine()));
>> > > > +  addPass(createMipsOptimizeMathLibCalls(getMipsTargetMachine()));
>> > > >  }
>> > > >  // Install an instruction selector pass using
>> > > >  // the ISelDag to gen Mips code.
>> > > >
>> > > > Added: llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll
>> > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll?rev=183802&view=auto
>> > > > ==============================================================================
>> > > > --- llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll (added)
>> > > > +++ llvm/trunk/test/CodeGen/Mips/optimize-fp-math.ll Tue Jun 11 17:21:44 2013
>> > > > @@ -0,0 +1,32 @@
>> > > > +; RUN: llc -march=mipsel < %s | FileCheck %s -check-prefix=32
>> > > > +; RUN: llc -march=mips64el -mcpu=mips64 < %s | FileCheck %s -check-prefix=64
>> > > > +
>> > > > +; 32: test_sqrtf_float_:
>> > > > +; 32: sqrt.s $f[[R0:[0-9]+]], $f{{[0-9]+}}
>> > > > +; 32: c.un.s $f[[R0]], $f[[R0]]
>> > > > +; 64: test_sqrtf_float_:
>> > > > +; 64: sqrt.s $f[[R0:[0-9]+]], $f{{[0-9]+}}
>> > > > +; 64: c.un.s $f[[R0]], $f[[R0]]
>> > > > +
>> > > > +define float @test_sqrtf_float_(float %a) {
>> > > > +entry:
>> > > > + %call = tail call float @sqrtf(float %a)
>> > > > + ret float %call
>> > > > +}
>> > > > +
>> > > > +declare float @sqrtf(float)
>> > > > +
>> > > > +; 32: test_sqrt_double_:
>> > > > +; 32: sqrt.d $f[[R0:[0-9]+]], $f{{[0-9]+}}
>> > > > +; 32: c.un.d $f[[R0]], $f[[R0]]
>> > > > +; 64: test_sqrt_double_:
>> > > > +; 64: sqrt.d $f[[R0:[0-9]+]], $f{{[0-9]+}}
>> > > > +; 64: c.un.d $f[[R0]], $f[[R0]]
>> > > > +
>> > > > +define double @test_sqrt_double_(double %a) {
>> > > > +entry:
>> > > > + %call = tail call double @sqrt(double %a)
>> > > > + ret double %call
>> > > > +}
>> > > > +
>> > > > +declare double @sqrt(double)
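The pass builds `fcmp oeq %call, %call` and branches to JoinBB when it holds, yet the test checks for c.un.s / c.un.d. The two are complements: a value compared with itself is unordered exactly when it is NaN, so the backend can presumably select the unordered compare and branch to the library call when it is true. A small C++ model of that equivalence, with `std::isunordered` standing in for c.un (both helper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model of "fcmp oeq v, v": ordered and equal, so false only when v is NaN.
bool fcmp_oeq_self(double v) { return v == v; }

// Model of MIPS "c.un.d $f, $f": true when the operands are unordered,
// i.e. when at least one of them is NaN.
bool c_un_self(double v) { return std::isunordered(v, v); }
```

Infinities and signed zeros compare ordered with themselves, so only a NaN result reaches the library call under either formulation.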
>> > > >
>> > > >
>> > > > _______________________________________________
>> > > > llvm-commits mailing list
>> > > > llvm-commits at cs.uiuc.edu
>> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
commit fefb353cec8c2b77164fdf7e3a94964dda209e48
Author: Akira Hatanaka <ahatanaka at mips.com>
Date:   Wed Jun 12 17:51:52 2013 -0700

    optimize sqrt

diff --git a/include/llvm/CodeGen/Passes.h b/include/llvm/CodeGen/Passes.h
index 7ec90ae..d6b88dc 100644
--- a/include/llvm/CodeGen/Passes.h
+++ b/include/llvm/CodeGen/Passes.h
@@ -345,6 +345,8 @@ namespace llvm {
   createMachineFunctionPrinterPass(raw_ostream &OS,
                                    const std::string &Banner ="");
 
+  FunctionPass *createOptimizeMathLibCalls(TargetMachine &TM);
+
   /// MachineLoopInfo - This pass is a loop analysis pass.
   extern char &MachineLoopInfoID;
 
diff --git a/lib/CodeGen/Passes.cpp b/lib/CodeGen/Passes.cpp
index 1a6b62b..576cd32 100644
--- a/lib/CodeGen/Passes.cpp
+++ b/lib/CodeGen/Passes.cpp
@@ -383,6 +383,7 @@ void TargetPassConfig::addIRPasses() {
 
   // Make sure that no unreachable blocks are instruction selected.
   addPass(createUnreachableBlockEliminationPass());
+  addPass(llvm::createOptimizeMathLibCalls(*TM));
 }
 
 /// Turn exception handling constructs into something the code generators can