[llvm-commits] [llvm] r168931 - in /llvm/trunk: include/llvm/Target/TargetTransformImpl.h include/llvm/TargetTransformInfo.h lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86TargetMachine.h lib/Transforms/Scalar/LoopIdiomRecognize.cpp test/Transforms/LoopIdiom/popcnt.ll

Fri Dec 7 16:58:00 PST 2012

Hey,

This patch causes LLVM to segfault on certain loops that look almost like
popcount... I don't have a reduced test case but it should be very easy to
write one. However, there are pretty pervasive problems with this patch. I
think combined with the two bugs I spotted by inspection, the failure
Galina posted, and the numerous style problems, you should probably revert
until it's in better shape.

I've provided as detailed of comments as I could below to try to help you
improve this. The bugs are toward the end, and I'm afraid I ran out of
steam toward the end in terms of reviewing as well.

On Thu, Nov 29, 2012 at 11:38 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> Modified: llvm/trunk/include/llvm/Target/TargetTransformImpl.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetTransformImpl.h?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm/Target/TargetTransformImpl.h (original)
> +++ llvm/trunk/include/llvm/Target/TargetTransformImpl.h Thu Nov 29
> 13:38:54 2012
> @@ -26,7 +26,7 @@
>  /// ScalarTargetTransformInfo interface. Different targets can implement
>  /// this interface differently.
>  class ScalarTargetTransformImpl : public ScalarTargetTransformInfo {
> -private:
> +protected:
>    const TargetLowering *TLI;
>
>  public:
>
> Modified: llvm/trunk/include/llvm/TargetTransformInfo.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/TargetTransformInfo.h?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm/TargetTransformInfo.h (original)
> +++ llvm/trunk/include/llvm/TargetTransformInfo.h Thu Nov 29 13:38:54 2012
> @@ -75,6 +75,18 @@
>  /// LSR, and LowerInvoke use this interface.
>  class ScalarTargetTransformInfo {
>  public:
> +  /// PopcntHwSupport - Hardware support for population count. Compared
> to the
> +  /// SW implementation, HW support is supposed to significantly boost the
> +  /// performance when the population is dense, and it may or not may
> degrade
>

"it may or may not" rather than "it may or not may".

> +  /// performance if the population is sparse. A HW support is considered
> as
> +  /// "Fast" if it can outperform, or is on a par with, SW implementaion
> when
> +  /// the population is sparse; otherwise, it is considered as "Slow".
>

I wonder, why not express this in the interface? You could query for
fast/slow with an expected density?

Also, some of this comment might be more naturally structured on the actual
method rather than on the return type.

> +  enum PopcntHwSupport {
> +    None,
> +    Fast,
> +    Slow
> +  };
>

Please follow the coding standards for naming enum types and enumerators.

> +
>    virtual ~ScalarTargetTransformInfo() {}
>
>    /// isLegalAddImmediate - Return true if the specified immediate is
> legal
> @@ -122,6 +134,11 @@
>    virtual bool shouldBuildLookupTables() const {
>      return true;
>    }
> +
> +  /// getPopcntHwSupport - Return hardware support for population count.
>

Please follow the doxygen examples in the coding standards and don't
duplicate function names in comments.

> +  virtual PopcntHwSupport getPopcntHwSupport(unsigned IntTyWidthInBit)
> const {
>

Why accept the integer type width? does it matter other than it being a
legal integer width for the target?

> +    return None;
> +  }
>  };
>
>  /// VectorTargetTransformInfo - This interface is used by the vectorizers
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Nov 29 13:38:54 2012
> @@ -17670,6 +17670,17 @@
>    return -1;
>  }
>
> +ScalarTargetTransformInfo::PopcntHwSupport
> +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned TyWidth) const {
> +  assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");

This constraint seems strange. It certainly doesn't seem like it should be
an assert.

If i have an i24, i cant count the population of it by counting the
population of it zext'ed to 32 bits...

>

+  const X86Subtarget &ST =
> TLI->getTargetMachine().getSubtarget<X86Subtarget>();
> +
> +  // TODO: Currently the __builtin_popcount() implementation using SSE3
> +  //   instructions is inefficient. Once the problem is fixed, we should
> +  //   call ST.hasSSE3() instead of ST.hasSSE4().
>

__builtin_popcount isn't the issue here, this is in LLVM. What is the
actual issue? Why isn't "Slow' used here?

Also LLVM uses FIXME in comments more than TODO.

> +  return ST.hasSSE41() ? Fast : None;
> +}
> +
>  unsigned
>  X86VectorTargetTransformInfo::getArithmeticInstrCost(unsigned Opcode,
>                                                       Type *Ty) const {
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Thu Nov 29 13:38:54 2012
> @@ -933,6 +933,14 @@
>                               const TargetLibraryInfo *libInfo);
>    }
>
> +  class X86ScalarTargetTransformImpl : public ScalarTargetTransformImpl {
> +  public:
> +    explicit X86ScalarTargetTransformImpl(const TargetLowering *TL) :
> +      ScalarTargetTransformImpl(TL) {};
> +
> +    virtual PopcntHwSupport getPopcntHwSupport(unsigned TyWidth) const;
> +  };
> +
>    class X86VectorTargetTransformInfo : public VectorTargetTransformImpl {
>    public:
>      explicit X86VectorTargetTransformInfo(const TargetLowering *TL) :
>
> Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.h?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86TargetMachine.h (original)
> +++ llvm/trunk/lib/Target/X86/X86TargetMachine.h Thu Nov 29 13:38:54 2012
> @@ -118,7 +118,7 @@
>    X86SelectionDAGInfo TSInfo;
>    X86TargetLowering TLInfo;
>    X86JITInfo        JITInfo;
> -  ScalarTargetTransformImpl STTI;
> +  X86ScalarTargetTransformImpl STTI;
>    X86VectorTargetTransformInfo VTTI;
>  public:
>    X86_64TargetMachine(const Target &T, StringRef TT,
>
> Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=168931&r1=168930&r2=168931&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Thu Nov 29
> 13:38:54 2012
> @@ -56,6 +56,7 @@
>  #include "llvm/Support/raw_ostream.h"
>  #include "llvm/DataLayout.h"
>  #include "llvm/Target/TargetLibraryInfo.h"
> +#include "llvm/TargetTransformInfo.h"
>  #include "llvm/Transforms/Utils/Local.h"
>  using namespace llvm;
>
> @@ -63,16 +64,83 @@
>  STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
>
>  namespace {
> +
> +  class LoopIdiomRecognize;
> +
> +  /// This class defines some utility functions for loop idiom
> recognization.
>

Please don't use a class to organize a collection of static methods. That
is the role of a namespace.

And this doesn't need a namespace at all -- this is in a single .cc file.
Please just use static functions, that is is the overwhelmingly predominant
pattern in LLVM.

> +  class LIRUtil {
> +  public:
> +    /// Return true iff the block contains nothing but an uncondition
> branch
> +    /// (aka goto instruction).
> +    static bool isAlmostEmpty(BasicBlock *);
>

Please use names for parameters in LLVM.

Also, I don't think this needs a helper method. First, you *always* have a
terminator instruction, the verify assures this. Typically, you would then
test that &BB->begin() == BB->getTerminatorInst().

Or is the point of this function to test for an empty BB *and* an
unconditional branch? If so, why do you need to handle that -- SimplifyCFG
should have removed such blocks already?

> +
> +    static BranchInst *getBranch(BasicBlock *BB) {
> +      return dyn_cast<BranchInst>(BB->getTerminator());
> +    }
>

Abstracting this into a method doesn't seem to help readability at all. I
would rather just see the dyn_cast.

> +
> +    /// Return the condition of the branch terminating the given basic
> block.
> +    static Value *getBrCondtion(BasicBlock *);
>

What if it doesn't terminate in a branch? I'm not really sure what
abstraction you're trying to provide here.

+
> +    /// Derive the precondition block (i.e the block that guards the loop
> +    /// preheader) from the given preheader.
> +    static BasicBlock *getPrecondBb(BasicBlock *PreHead);
>

I don't understand this.. What is the invariant it exposes? What does it
test first? Why is this not a method on LoopInfo if this is generally
useful?

> +  };
> +
> +  /// This class is to recoginize idioms of population-count conducted in
> +  /// a noncountable loop. Currently it only recognizes this pattern:
> +  /// \code
> +  ///   while(x) {cnt++; ...; x &= x - 1; ...}
> +  /// \endcode
> +  class NclPopcountRecognize {
>

If NCL is an acronym you want to use, capitalize it as an acronym. I don't
think it is a terribly useful acronym. I would call this just
PopcountLoopRecognizer (note a noun, as this is a class), and document what
sets of loops it handles currently. If there is some reason we wouldn't
re-use this infrastructure for any future extensions, you should explain
that here.

Also, have you thought much about the design of these classes? Currently,
we have LoopIdiomRecognize which is a pass, and which has innate logic for
forming memset intrinsics out of loops. But we now also have a non-pass
helper class to recognize a specific kind of loop. How would you envision
this design scaling up to the next loop idiom?

I think that either the logic for recognizing popcount should be kept
largely in methods on LoopIdiomRecognize (with helper static functions), or
both it and memset recognition should be split out into nice patterned
classes so that it is clear how to extend things.

+    LoopIdiomRecognize &LIR;
> +    Loop *CurLoop;
> +    BasicBlock *PreCondBB;
>

If this is the only common state managed across the calls to various parts
of your algorithm, I am increasingly of the opinion that you don't need an
entire helper class here.

> +
> +    typedef IRBuilder<> IRBuilderTy;
>

Why typedef this? Most everywhere just uses IRBuilder<> directly.

> +
> +  public:
> +    explicit NclPopcountRecognize(LoopIdiomRecognize &TheLIR);
>

As mentioned in other code reviews, just name this 'LIR'.

Also, when defining a small helper class, you don't need to define all of
the methods out of line. Especially trivial ones should be defined inline.

> +    bool recognize();

+
> +  private:
> +    /// Take a glimpse of the loop to see if we need to go ahead
> recoginizing
> +    /// the idiom.
> +    bool preliminaryScreen();
> +
> +    /// Check if the given conditional branch is based on the comparison
> +    /// beween a variable and zero, and if the variable is non-zero, the
> +    /// control yeilds to the loop entry. If the branch matches the
> behavior,
> +    /// the variable involved in the comparion is returned. This function
> will
> +    /// be called to see if the precondition and postcondition of the loop
> +    /// are in desirable form.
> +    Value *matchCondition (BranchInst *Br, BasicBlock *NonZeroTarget)
> const;
> +
> +    /// Return true iff the idiom is detected in the loop. and 1) \p
> CntInst
> +    /// is set to the instruction counting the pupulation bit. 2) \p
> CntPhi
> +    /// is set to the corresponding phi node. 3) \p Var is set to the
> value
> +    /// whose population bits are being counted.
> +    bool detectIdiom
> +      (Instruction *&CntInst, PHINode *&CntPhi, Value *&Var) const;
>

Please never break a line after a function name in its declaration. This is
very counterintuitive style in general, and deviates from all of the LLVM
coding conventions.

I cannot stress this enough: it would make it *much* easier to read and
understand your patches if you would spend some time learning how LLVM code
is written, and trying to write code that matches the existing conventions
and style. If  we can help by extending the coding standards, please let us
know.

> +
> +    /// Insert ctpop intrinsic function and some obviously dead
> instructions.
> +    void transform (Instruction *CntInst, PHINode *CntPhi, Value *Var);
>

No space before the '(' in a function call. This is in the standards.

> +
> +    /// Create llvm.ctpop.* intrinsic function.
> +    CallInst *createPopcntIntrinsic(IRBuilderTy &IRB, Value *Val,
> DebugLoc DL);

You shouldn't need your own routine for this, IRBuilder should do it for
you. If it's helpers aren't sufficient, you should propose patches to
enhance them.

>
> +  };
> +
>    class LoopIdiomRecognize : public LoopPass {
>      Loop *CurLoop;
>      const DataLayout *TD;
>      DominatorTree *DT;
>      ScalarEvolution *SE;
>      TargetLibraryInfo *TLI;
> +    const ScalarTargetTransformInfo *STTI;
>    public:
>      static char ID;
>      explicit LoopIdiomRecognize() : LoopPass(ID) {
>        initializeLoopIdiomRecognizePass(*PassRegistry::getPassRegistry());
> +      TD = 0; DT = 0; SE = 0; TLI = 0; STTI = 0;
>

Why was this needed? Please always use the constructor's initializer list
if you're going to set default values. And please don't write multiple
assignment statements on a single line.

>      }
>
>      bool runOnLoop(Loop *L, LPPassManager &LPM);
> @@ -110,6 +178,36 @@
>        AU.addRequired<DominatorTree>();
>        AU.addRequired<TargetLibraryInfo>();
>      }
> +
> +    const DataLayout *getDataLayout() {
> +      return TD ? TD : TD=getAnalysisIfAvailable<DataLayout>();
>

No, don't query whether the analysis is available every time you go to get
it. Look at how other passes do this. In the runOnFunction, walk through
the class members that need to be initialized for that function and
initialize them once. Then use the members everywhere rather than these
accessors.

> +    }
> +
> +    DominatorTree *getDominatorTree() {
> +      return DT ? DT : (DT=&getAnalysis<DominatorTree>());
> +    }
> +
> +    ScalarEvolution *getScalarEvolution() {
> +      return SE ? SE : (SE = &getAnalysis<ScalarEvolution>());
> +    }
> +
> +    TargetLibraryInfo *getTargetLibraryInfo() {
> +      return TLI ? TLI : (TLI = &getAnalysis<TargetLibraryInfo>());
> +    }
> +
> +    const ScalarTargetTransformInfo *getScalarTargetTransformInfo() {
> +      if (!STTI) {
> +        TargetTransformInfo *TTI =
> getAnalysisIfAvailable<TargetTransformInfo>();
> +        if (TTI) STTI = TTI->getScalarTargetTransformInfo();
> +      }
> +      return STTI;
> +    }
> +
> +    Loop *getLoop() const { return CurLoop; }
>

If all of these accessors are only so that your helper class can reach in
and use the members of the LoopIdiomRecognize pass, that is a sign that
this is a bad design.

> +
> +  private:
> +    bool runOnNoncountableLoop();
> +    bool runOnCountableLoop();
>    };
>  }
>
> @@ -172,24 +270,390 @@
>        deleteDeadInstruction(I, SE, TLI);
>  }
>
> -bool LoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {
> -  CurLoop = L;
>
> +//===----------------------------------------------------------------------===//
> +//
> +//          Implementation of LIRUtil
> +//
>
> +//===----------------------------------------------------------------------===//
> +
> +// This fucntion will return true iff the given block contains nothing
> but goto.
> +// A typical usage of this function is to check if the preheader fucntion
> is
> +// "almost" empty such that generated intrinsic function can be moved
> across
> +// preheader and to be placed at the end of the preconditiona block
> without
> +// concerning of breaking data dependence.
> +bool LIRUtil::isAlmostEmpty(BasicBlock *BB) {
> +  if (BranchInst *Br = getBranch(BB)) {
> +    return Br->isUnconditional() && BB->size() == 1;
> +  }
> +  return false;
> +}
>
> -  // If the loop could not be converted to canonical form, it must have an
> -  // indirectbr in it, just give up.
> -  if (!L->getLoopPreheader())
> +Value *LIRUtil::getBrCondtion(BasicBlock *BB) {
> +  BranchInst *Br = getBranch(BB);
> +  return Br ? Br->getCondition() : 0;
> +}
> +
> +BasicBlock *LIRUtil::getPrecondBb(BasicBlock *PreHead) {
> +  if (BasicBlock *BB = PreHead->getSinglePredecessor()) {
> +    BranchInst *Br = getBranch(BB);
> +    return Br && Br->isConditional() ? BB : 0;
> +  }
> +  return 0;
> +}
> +
>
> +//===----------------------------------------------------------------------===//
> +//
> +//          Implementation of NclPopcountRecognize
> +//
>
> +//===----------------------------------------------------------------------===//
> +
> +NclPopcountRecognize::NclPopcountRecognize(LoopIdiomRecognize &TheLIR):
> +  LIR(TheLIR), CurLoop(TheLIR.getLoop()), PreCondBB(0) {
> +}
> +
> +bool NclPopcountRecognize::preliminaryScreen() {
> +  const ScalarTargetTransformInfo *STTI =
> LIR.getScalarTargetTransformInfo();
> +  if (STTI->getPopcntHwSupport(32) != ScalarTargetTransformInfo::Fast)
>

Why 32 here? This kind of magic number is worrisome.

>      return false;
>
> -  // Disable loop idiom recognition if the function's name is a common
> idiom.
> -  StringRef Name = L->getHeader()->getParent()->getName();
> -  if (Name == "memset" || Name == "memcpy")
> +  // Counting population are usually conducted by few arithmetic
> instrutions.
> +  // Such instructions can be easilly "absorbed" by vacant slots in a
> +  // non-compact loop. Therefore, recognizing popcount idiom only makes
> sense
> +  // in a compact loop.
> +
> +  // Give up if the loop has multiple blocks or multiple backedges.
> +  if (CurLoop->getNumBackEdges() != 1 || CurLoop->getNumBlocks() != 1)
>      return false;
>
> -  // The trip count of the loop must be analyzable.
> -  SE = &getAnalysis<ScalarEvolution>();
> -  if (!SE->hasLoopInvariantBackedgeTakenCount(L))
> +  BasicBlock *LoopBody = *(CurLoop->block_begin());
> +  if (LoopBody->size() >= 20) {
>

Why is 20 instructions too big? Shouldn't this be a ratio of instructions
related to computing popcount and those not related? What if these
instructions are all debug info intrinsics?

+    // The loop is too big, bail out.
> +    return false;
> +  }
> +
> +  // It should have a preheader containing nothing but a goto instruction.
>

We don't have goto instructions. Please try to use the IR terminology in
comments in LLVM.

Also you don't explain why this is important for your transformation.

+  BasicBlock *PreHead = CurLoop->getLoopPreheader();
> +  if (!PreHead || !LIRUtil::isAlmostEmpty(PreHead))
> +    return false;
> +
> +  // It should have a precondition block where the generated popcount
> instrinsic
> +  // function will be inserted.
>

What is the distinction between a precondition and a preheader? I really
don't understand the placement logic here. The loop preheader is a BB that
is *not* part of the loop, but dominates it (plus other constraints). IT
sounds like the perfect place to insert this intrinsic. Do you mean the
loop *header* here? If so, I think that what you are calling a
'precondition' is actually what LLVM's loop info calls a preheader.... But
maybe I'm misunderstanding something.

> +  PreCondBB = LIRUtil::getPrecondBb(PreHead);
> +  if (!PreCondBB)
> +    return false;
> +
> +  return true;
> +}
> +
> +Value *NclPopcountRecognize::matchCondition (BranchInst *Br,
> +                                             BasicBlock *LoopEntry) const
> {
> +  if (!Br || !Br->isConditional())
> +    return 0;
> +
> +  ICmpInst *Cond = dyn_cast<ICmpInst>(Br->getCondition());
> +  if (!Cond)
> +    return 0;
> +
> +  ConstantInt *CmpZero = dyn_cast<ConstantInt>(Cond->getOperand(1));
> +  if (!CmpZero || !CmpZero->isZero())
> +    return 0;
> +
> +  ICmpInst::Predicate Pred = Cond->getPredicate();
> +  if ((Pred == ICmpInst::ICMP_NE && Br->getSuccessor(0) == LoopEntry) ||
> +      (Pred == ICmpInst::ICMP_EQ && Br->getSuccessor(1) == LoopEntry))
> +    return Cond->getOperand(0);
> +
> +  return 0;
> +}
> +
> +bool NclPopcountRecognize::detectIdiom(Instruction *&CntInst,
> +                                       PHINode *&CntPhi,
> +                                       Value *&Var) const {
> +  // Following code tries to detect this idiom:
> +  //
> +  //    if (x0 != 0)
> +  //      goto loop-exit // the precondition of the loop
> +  //    cnt0 = init-val;
> +  //    do {
> +  //       x1 = phi (x0, x2);
> +  //       cnt1 = phi(cnt0, cnt2);
> +  //
> +  //       cnt2 = cnt1 + 1;
> +  //        ...
> +  //       x2 = x1 & (x1 - 1);
> +  //        ...
> +  //    } while(x != 0);
> +  //
> +  // loop-exit:
> +  //
> +
> +  // step 1: Check to see if the look-back branch match this pattern:
> +  //    "if (a!=0) goto loop-entry".
> +  BasicBlock *LoopEntry;
> +  Instruction *DefX2, *CountInst;
> +  Value *VarX1, *VarX0;
> +  PHINode *PhiX, *CountPhi;
> +
> +  DefX2 = CountInst = 0;
> +  VarX1 = VarX0 = 0;
> +  PhiX = CountPhi = 0;
> +  LoopEntry = *(CurLoop->block_begin());
> +
> +  // step 1: Check if the loop-back branch is in desirable form.
>

This is called the loop back-edge in the LoopInfo terminology. This is also
the second step one comment, the first one only comenst variable
declarations??? I'm confusied.

> +  {
>

Why do you have an extraneous scope here?

> +    if (Value *T = matchCondition (LIRUtil::getBranch(LoopEntry),
> LoopEntry))
> +      DefX2 = dyn_cast<Instruction>(T);
> +    else
> +      return false;
>

Instead of this, if the condition fails, and return false. Then set the
value.

> +  }
> +
> +  // step 2: detect instructions corresponding to "x2 = x1 & (x1 - 1)"
> +  {
> +    if (DefX2->getOpcode() != Instruction::And)
> +      return false;
> +
> +    BinaryOperator *SubOneOp;
> +
> +    if ((SubOneOp = dyn_cast<BinaryOperator>(DefX2->getOperand(0))))
> +      VarX1 = DefX2->getOperand(1);
> +    else {
> +      VarX1 = DefX2->getOperand(0);
> +      SubOneOp = dyn_cast<BinaryOperator>(DefX2->getOperand(1));
> +    }
> +    if (!SubOneOp)
> +      return false;

+
> +    Instruction *SubInst = cast<Instruction>(SubOneOp);
> +    ConstantInt *Dec = dyn_cast<ConstantInt>(SubInst->getOperand(1));
> +    if (!Dec ||
> +        !((SubInst->getOpcode() == Instruction::Sub && Dec->isOne()) ||
> +          (SubInst->getOpcode() == Instruction::Add &&
> Dec->isAllOnesValue()))) {
> +      return false;
> +    }
> +  }
>

Please replace all of this code with the use of PatternMatch. See
InstCombine or that header for details on how.

> +
> +  // step 3: Check the recurrence of variable X
> +  {
> +    PhiX = dyn_cast<PHINode>(VarX1);
> +    if (!PhiX ||
> +        (PhiX->getOperand(0) != DefX2 && PhiX->getOperand(1) != DefX2)) {
> +      return false;
> +    }
> +  }
>

Similar here, although PatternMatch may not (yet) support PHI nodes. If
not, it should be added.

> +
> +  // step 4: Find the instruction which count the population: cnt2 = cnt1
> + 1
> +  {
> +    CountInst = NULL;
> +    for (BasicBlock::iterator Iter = LoopEntry->getFirstNonPHI(),
> +           IterE = LoopEntry->end(); Iter != IterE; Iter++) {
> +      Instruction *Inst = Iter;
> +      if (Inst->getOpcode() != Instruction::Add)
> +        continue;
> +
> +      ConstantInt *Inc = dyn_cast<ConstantInt>(Inst->getOperand(1));
> +      if (!Inc || !Inc->isOne())
> +        continue;
> +
> +      PHINode *Phi = dyn_cast<PHINode>(Inst->getOperand(0));
> +      if (!Phi && Phi->getParent() != LoopEntry)
>

This && should be an || shouldn't it? Why isn't there a test case that
covers this? Please add one as wel as fixing.

> +        continue;
> +
> +      // Check if the result of the instruction is live of the loop.
> +      bool LiveOutLoop = false;
> +      for (Value::use_iterator I = Inst->use_begin(), E = Inst->use_end();
> +             I != E;  I++) {
> +        if ((cast<Instruction>(*I))->getParent() != LoopEntry) {
> +          LiveOutLoop = true; break;
>

Please don't place two statements on one line. *Especially* when one is a
control flow statement!

> +        }
> +      }
>

Always hoist predicate loops like this into a helper function so that you
can use 'return true'. in the loop. The coding standards talk about this.

> +
> +      if (LiveOutLoop) {
> +        CountInst = Inst;
> +        CountPhi = Phi;
> +        break;
> +      }
> +    }
>

And this hoist this loop into a helper function so you can return the
instructions found rather than having to test and set them here.

> +
> +    if (!CountInst)
> +      return false;
> +  }
> +
> +  // step 5: check if the precondition is in this form:
> +  //   "if (x != 0) goto loop-head ; else goto somewhere-we-don't-care;"
> +  {
> +    BranchInst *PreCondBr = LIRUtil::getBranch(PreCondBB);
> +    Value *T = matchCondition (PreCondBr, CurLoop->getLoopPreheader());
>

Never have a space before an open parenthesis of a function call either.
The conding standards discuss this.

> +    if (T != PhiX->getOperand(0) && T != PhiX->getOperand(1))
> +      return false;
> +
> +    CntInst = CountInst;
> +    CntPhi = CountPhi;
>

This is the third layer of storing an inner variable in an outer variable.
These all need to abstracted into function calls.

I know that there are three variables here, but output parameters and a
bool return indicating success / failure will be immensely cleaner.

> +    Var = T;
> +  }
> +
> +  return true;
> +}
> +
> +void NclPopcountRecognize::transform(Instruction *CntInst,
> +                                     PHINode *CntPhi, Value *Var) {
> +
> +  ScalarEvolution *SE = LIR.getScalarEvolution();
> +  TargetLibraryInfo *TLI = LIR.getTargetLibraryInfo();
> +  BasicBlock *PreHead = CurLoop->getLoopPreheader();
> +  BranchInst *PreCondBr = LIRUtil::getBranch(PreCondBB);
> +  const DebugLoc DL = CntInst->getDebugLoc();
> +
> +  // Assuming before transformation, the loop is following:
> +  //  if (x) // the precondition
> +  //     do { cnt++; x &= x - 1; } while(x);
> +
> +  // Step 1: Insert the ctpop instruction at the end of the precondition
> block
> +  IRBuilderTy Builder(PreCondBr);
> +  Value *PopCnt, *PopCntZext, *NewCount;
> +  {
>

Again, please don't use {}s unless you are introducing a scoped variable of
some kind.

> +    PopCnt = createPopcntIntrinsic(Builder, Var, DL);
> +    NewCount = PopCntZext =
> +      Builder.CreateZExtOrTrunc(PopCnt,
> cast<IntegerType>(CntPhi->getType()));
> +
> +    if (NewCount != PopCnt)
> +      (cast<Instruction>(NewCount))->setDebugLoc(DL);
> +
> +    // If the popoulation counter's initial value is not zero, insert Add
> Inst.
> +    Value *CntInitVal = CntPhi->getIncomingValueForBlock(PreHead);
> +    ConstantInt *InitConst = dyn_cast<ConstantInt>(CntInitVal);
> +    if (!InitConst || !InitConst->isZero()) {
>

This is incorrect, and will crash LLVM on certain inputs: if InitConst is
null here, we pass the test and pass it to CreateAdd below that eventually
segfaults. Please fix, and please add a test case for this case. I spotted
it by inspection (after some code in the wild crashed). I can try to help
you with a testcase if you need, but it should be quite straight forward to
synthesize one.

You should try to always use one of the two patterns:

Foo *MyFooVaribale = getFoo();
if (!MyFooVariable)
  return ...;

or

if (Foo *MyFooVariable = getFoo()) {
  ...
}

The first handles early exit, the latter a predicated block. The reason
this is good is that it makes it harder to write these types of bugs.

> +      NewCount = Builder.CreateAdd(PopCnt, InitConst);
> +      (cast<Instruction>(NewCount))->setDebugLoc(DL);
>

You don't need extra ()s around the cast.

> +    }
> +  }
> +
> +  // Step 2: Replace the precondition from "if(x == 0) goto loop-exit" to
> +  //   "if(NewCount == 0) loop-exit". Withtout this change, the intrinsic
> +  //   function would be partial dead code, and downstream passes will
> drag
> +  //   it back from the precondition block to the preheader.
> +  {
> +    ICmpInst *PreCond = cast<ICmpInst>(PreCondBr->getCondition());
> +
> +    Value *Opnd0 = PopCntZext;
> +    Value *Opnd1 = ConstantInt::get(PopCntZext->getType(), 0);
> +    if (PreCond->getOperand(0) != Var)
> +      std::swap(Opnd0, Opnd1);
> +
> +    ICmpInst *NewPreCond =
> +      cast<ICmpInst>(Builder.CreateICmp(PreCond->getPredicate(), Opnd0,
> Opnd1));
> +    PreCond->replaceAllUsesWith(NewPreCond);
> +
> +    deleteDeadInstruction(PreCond, *SE, TLI);
> +  }
> +
> +  // Step 3: Note that the population count is exactly the trip count of
> the
> +  // loop in question, which enble us to to convert the loop from
> noncountable
> +  // loop into a countable one. The benefit is twofold:
> +  //
> +  //  - If the loop only counts population, the entire loop become dead
> after
> +  //    the transformation. It is lots easier to prove a countable loop
> dead
> +  //    than to prove a noncountable one. (In some C dialects, a infite
> loop
> +  //    isn't dead even if it computes nothing useful. In general, DCE
> needs
> +  //    to prove a noncountable loop finite before safely delete it.)
> +  //
> +  //  - If the loop also performs something else, it remains alive.
> +  //    Since it is transformed to countable form, it can be aggressively
> +  //    optimized by some optimizations which are in general not
> applicable
> +  //    to a noncountable loop.
> +  //
> +  // After this step, this loop (conceptually) would look like following:
> +  //   newcnt = __builtin_ctpop(x);
> +  //   t = newcnt;
> +  //   if (x)
> +  //     do { cnt++; x &= x-1; t--) } while (t > 0);
> +  BasicBlock *Body = *(CurLoop->block_begin());
> +  {
> +    BranchInst *LbBr = LIRUtil::getBranch(Body);
> +    ICmpInst *LbCond = cast<ICmpInst>(LbBr->getCondition());
> +    Type *Ty = NewCount->getType();
> +
> +    PHINode *TcPhi = PHINode::Create(Ty, 2, "tcphi", Body->begin());
> +
> +    Builder.SetInsertPoint(LbCond);
> +    Value *Opnd1 = cast<Value>(TcPhi);
> +    Value *Opnd2 = cast<Value>(ConstantInt::get(Ty, 1));
> +    Instruction *TcDec =
> +      cast<Instruction>(Builder.CreateSub(Opnd1, Opnd2, "tcdec", false,
> true));
> +
> +    TcPhi->addIncoming(NewCount, PreHead);
> +    TcPhi->addIncoming(TcDec, Body);
> +
> +    CmpInst::Predicate Pred = (LbBr->getSuccessor(0) == Body) ?
> +      CmpInst::ICMP_UGT : CmpInst::ICMP_SLE;
> +    LbCond->setPredicate(Pred);
> +    LbCond->setOperand(0, TcDec);
> +    LbCond->setOperand(1, cast<Value>(ConstantInt::get(Ty, 0)));
> +  }
> +
> +  // Step 4: All the references to the original population counter outside
> +  //  the loop are replaced with the NewCount -- the value returned from
> +  //  __builtin_ctpop().
> +  {
> +    SmallVector<Value *, 4> CntUses;
> +    for (Value::use_iterator I = CntInst->use_begin(), E =
> CntInst->use_end();
> +         I != E; I++) {
> +      if (cast<Instruction>(*I)->getParent() != Body)
> +        CntUses.push_back(*I);
> +    }
> +    for (unsigned Idx = 0; Idx < CntUses.size(); Idx++) {
> +      (cast<Instruction>(CntUses[Idx]))->replaceUsesOfWith(CntInst,
> NewCount);
>

No need for ()s around the cast.

> +    }
>

Why the two loops and the vector? Why not just replace the uses as you find
them?

Also, convention is to not use {}s for single line loops.

> +  }
> +
> +  // step 5: Forget the "non-computable" trip-count SCEV associated with
> the
> +  //   loop. The loop would otherwise not be deleted even if it becomes
> empty.
> +  SE->forgetLoop(CurLoop);
> +}
> +
> +CallInst *NclPopcountRecognize::createPopcntIntrinsic(IRBuilderTy
> &IRBuilder,
> +                                                      Value *Val,
> DebugLoc DL) {
> +  Value *Ops[] = { Val };
> +  Type *Tys[] = { Val->getType() };
> +
> +  Module *M = (*(CurLoop->block_begin()))->getParent()->getParent();
> +  Value *Func = Intrinsic::getDeclaration(M, Intrinsic::ctpop, Tys);
> +  CallInst *CI = IRBuilder.CreateCall(Func, Ops);
> +  CI->setDebugLoc(DL);
> +
> +  return CI;
> +}
> +
> +/// recognize - detect population count idiom in a non-countable loop. If
> +///   detected, transform the relevant code to popcount intrinsic function
> +///   call, and return true; otherwise, return false.
> +bool NclPopcountRecognize::recognize() {
> +
> +  if (!LIR.getScalarTargetTransformInfo())
> +    return false;
> +
> +  LIR.getScalarEvolution();
> +
> +  if (!preliminaryScreen())
> +    return false;
> +
> +  Instruction *CntInst;
> +  PHINode *CntPhi;
> +  Value *Val;
> +  if (!detectIdiom(CntInst, CntPhi, Val))
>      return false;
> -  const SCEV *BECount = SE->getBackedgeTakenCount(L);
> +
> +  transform(CntInst, CntPhi, Val);
> +  return true;
> +}
> +
>
> +//===----------------------------------------------------------------------===//
> +//
> +//          Implementation of LoopIdiomRecognize
> +//
>
> +//===----------------------------------------------------------------------===//
> +
> +bool LoopIdiomRecognize::runOnCountableLoop() {
> +  const SCEV *BECount = SE->getBackedgeTakenCount(CurLoop);
>    if (isa<SCEVCouldNotCompute>(BECount)) return false;
>
>    // If this loop executes exactly one time, then it should be peeled, not
> @@ -199,24 +663,27 @@
>        return false;
>
>    // We require target data for now.
> -  TD = getAnalysisIfAvailable<DataLayout>();
> -  if (TD == 0) return false;
> +  if (!getDataLayout())
> +    return false;
> +
> +  getDominatorTree();
>
> -  DT = &getAnalysis<DominatorTree>();
>    LoopInfo &LI = getAnalysis<LoopInfo>();
>    TLI = &getAnalysis<TargetLibraryInfo>();
>
> +  getTargetLibraryInfo();
> +
>    SmallVector<BasicBlock*, 8> ExitBlocks;
>    CurLoop->getUniqueExitBlocks(ExitBlocks);
>
>    DEBUG(dbgs() << "loop-idiom Scanning: F["
> -               << L->getHeader()->getParent()->getName()
> -               << "] Loop %" << L->getHeader()->getName() << "\n");
> +               << CurLoop->getHeader()->getParent()->getName()
> +               << "] Loop %" << CurLoop->getHeader()->getName() << "\n");
>
>    bool MadeChange = false;
>    // Scan all the blocks in the loop that are not in subloops.
> -  for (Loop::block_iterator BI = L->block_begin(), E = L->block_end(); BI
> != E;
> -       ++BI) {
> +  for (Loop::block_iterator BI = CurLoop->block_begin(),
> +         E = CurLoop->block_end(); BI != E; ++BI) {
>      // Ignore blocks in subloops.
>      if (LI.getLoopFor(*BI) != CurLoop)
>        continue;
> @@ -226,6 +693,33 @@
>    return MadeChange;
>  }
>
> +bool LoopIdiomRecognize::runOnNoncountableLoop() {
> +  NclPopcountRecognize Popcount(*this);
> +  if (Popcount.recognize())
> +    return true;
> +
> +  return false;
> +}
> +
> +bool LoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {
> +  CurLoop = L;
> +
> +  // If the loop could not be converted to canonical form, it must have an
> +  // indirectbr in it, just give up.
> +  if (!L->getLoopPreheader())
> +    return false;
> +
> +  // Disable loop idiom recognition if the function's name is a common
> idiom.
> +  StringRef Name = L->getHeader()->getParent()->getName();
> +  if (Name == "memset" || Name == "memcpy")
> +    return false;
> +
> +  SE = &getAnalysis<ScalarEvolution>();
> +  if (SE->hasLoopInvariantBackedgeTakenCount(L))
> +    return runOnCountableLoop();
> +  return runOnNoncountableLoop();
> +}
> +
>  /// runOnLoopBlock - Process the specified block, which lives in a
> counted loop
>  /// with the specified backedge count.  This block is known to be in the
> current
>  /// loop and not in any subloops.
>
> Added: llvm/trunk/test/Transforms/LoopIdiom/popcnt.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/popcnt.ll?rev=168931&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopIdiom/popcnt.ll (added)
> +++ llvm/trunk/test/Transforms/LoopIdiom/popcnt.ll Thu Nov 29 13:38:54 2012
> @@ -0,0 +1,76 @@
> +; RUN: opt -loop-idiom < %s -mtriple=x86_64-apple-darwin -mcpu=corei7 -S
> | FileCheck %s
> +
> +;To recognize this pattern:
> +;int popcount(unsigned long long a) {
> +;    int c = 0;
> +;    while (a) {
> +;        c++;
> +;        a &= a - 1;
> +;    }
> +;    return c;
> +;}
> +;
> +; CHECK: entry
> +; CHECK: llvm.ctpop.i64
> +; CHECK: ret
> +define i32 @popcount(i64 %a) nounwind uwtable readnone ssp {
> +entry:
> +  %tobool3 = icmp eq i64 %a, 0
> +  br i1 %tobool3, label %while.end, label %while.body
> +
> +while.body:                                       ; preds = %entry,
> %while.body
> +  %c.05 = phi i32 [ %inc, %while.body ], [ 0, %entry ]
> +  %a.addr.04 = phi i64 [ %and, %while.body ], [ %a, %entry ]
> +  %inc = add nsw i32 %c.05, 1
> +  %sub = add i64 %a.addr.04, -1
> +  %and = and i64 %sub, %a.addr.04
> +  %tobool = icmp eq i64 %and, 0
> +  br i1 %tobool, label %while.end, label %while.body
> +
> +while.end:                                        ; preds = %while.body,
> %entry
> +  %c.0.lcssa = phi i32 [ 0, %entry ], [ %inc, %while.body ]
> +  ret i32 %c.0.lcssa
> +}
> +
> +; To recognize this pattern:
> +;int popcount(unsigned long long a, int mydata1, int mydata2) {
> +;    int c = 0;
> +;    while (a) {
> +;        c++;
> +;        a &= a - 1;
> +;        mydata1 *= c;
> +;        mydata2 *= (int)a;
> +;    }
> +;    return c + mydata1 + mydata2;
> +;}
> +; CHECK: entry
> +; CHECK: llvm.ctpop.i64
> +; CHECK: ret
> +define i32 @popcount2(i64 %a, i32 %mydata1, i32 %mydata2) nounwind
> uwtable readnone ssp {
> +entry:
> +  %tobool9 = icmp eq i64 %a, 0
> +  br i1 %tobool9, label %while.end, label %while.body
> +
> +while.body:                                       ; preds = %entry,
> %while.body
> +  %c.013 = phi i32 [ %inc, %while.body ], [ 0, %entry ]
> +  %mydata2.addr.012 = phi i32 [ %mul1, %while.body ], [ %mydata2, %entry ]
> +  %mydata1.addr.011 = phi i32 [ %mul, %while.body ], [ %mydata1, %entry ]
> +  %a.addr.010 = phi i64 [ %and, %while.body ], [ %a, %entry ]
> +  %inc = add nsw i32 %c.013, 1
> +  %sub = add i64 %a.addr.010, -1
> +  %and = and i64 %sub, %a.addr.010
> +  %mul = mul nsw i32 %inc, %mydata1.addr.011
> +  %conv = trunc i64 %and to i32
> +  %mul1 = mul nsw i32 %conv, %mydata2.addr.012
> +  %tobool = icmp eq i64 %and, 0
> +  br i1 %tobool, label %while.end, label %while.body
> +
> +while.end:                                        ; preds = %while.body,
> %entry
> +  %c.0.lcssa = phi i32 [ 0, %entry ], [ %inc, %while.body ]
> +  %mydata2.addr.0.lcssa = phi i32 [ %mydata2, %entry ], [ %mul1,
> %while.body ]
> +  %mydata1.addr.0.lcssa = phi i32 [ %mydata1, %entry ], [ %mul,
> %while.body ]
> +  %add = add i32 %mydata2.addr.0.lcssa, %mydata1.addr.0.lcssa
> +  %add2 = add i32 %add, %c.0.lcssa
> +  ret i32 %add2
> +}
> +
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121207/2aa783cc/attachment.html>