[llvm-commits] [PATCH] Population-count loop idiom recognization

Nadav Rotem nrotem at apple.com
Mon Nov 19 12:36:25 PST 2012


Hi Shuxin!

Overall I like it, but  I have a few comments:

+
+  /// CreateCtpop - Create and insert ctpop.
+  CallInst *CreateCtpop(Value *Val);
+

Why did you need to add this function to the IRBuilder ? I understand that the LoopIdiom pass uses the builder API for creating the memset calls, but I think that memset is much more commonly used. I think that you should create the call to popcnt using a regular call. 

Regarding the name, it looks like in some places we call it 'CtPop' and in other places we call it 'popcnt'.  I think that we need to fix this at some point. It does not need to be a part of this patch. 

+ScalarTargetTransformInfo::PopcntHwSupport
+X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned TyWidth) const {
+  assert(!(TyWidth & (TyWidth - 1)) && "Ty width must be power of 2");

You can use the "isPowerOfTwo_64"  call. 


+    // Return true iff the block contains nothing but goto instruction.
+    static bool isAlmostEmpty(BasicBlock *);

Please doxygen the comment. (three slashes ///).  Also, please document the other functions in the class.

+    static BranchInst *getBranch(BasicBlock * BB) {
+      return dyn_cast<BranchInst>(BB->getTerminator());
+    }
+

Maybe you could add assertions here. 

+  // NclPopcountRecognize - This class is to recoginize idioms of population- 
+  //  count conducted in a noncountable loop. Currently it only recognizes
+  //  this pattern: "while(x) {cnt++; ...; x &= x - 1; ...}"
+  //  
+  class NclPopcountRecognize {

Also, doxygen this comment, and document the members.  


   class LoopIdiomRecognize : public LoopPass {
+    friend class NclPopcountRecognize;
     Loop *CurLoop;
+    mutable const DataLayout *TD;
+    mutable DominatorTree *DT;
+    mutable ScalarEvolution *SE;
+    mutable TargetLibraryInfo *TLI;
+    mutable const ScalarTargetTransformInfo *STTI;

Why did you have to make these mutable ?  Why is "NclPopcountRecognize" a friend ? Is there another way to solve this problem ?

+
+  private:
+    const DataLayout *getDataLayout() const {
+      return TD ? TD : TD=getAnalysisIfAvailable<DataLayout>(); 
+    }
+
+    DominatorTree *getDominatorTree() const {
+      return DT ? DT : (DT=&getAnalysis<DominatorTree>());
+    }
+

Why do you do the lazy evaluation and not just initialize everything upon initialization ?

+// return true iff the block contains nothing but goto.
+bool LIRUtil::isAlmostEmpty(BasicBlock *BB) {
+  if (BranchInst *Br = getBranch(BB)) {
+    return Br->isUnconditional() && BB->size() == 1;
+  }
+  return false;
+}

The documentation needs to be in the class. Also, I think that you are checking for a completely empty loop. No ?
Do you also need to check other things here, such as that this is the header of the loop ?

+/// NclPopcountRecognize - Take a glimpse of the loop to see if we need to go
+/// ahead recoginizing the idiom.
+bool NclPopcountRecognize::preliminaryScreen() {
+
+  if (LIR.STTI->getPopcntHwSupport(32) != ScalarTargetTransformInfo::Fast)
     return false;

Why extra space ? Why are you only checking for 32bits ?

+  // It should have a preheader containing nothing but a goto instruction.
+  BasicBlock *PreHead = CurLoop->getLoopPreheader();

We don't have goto instructions, just branches :)

+  // It should have a precondition block where the generated popcount instrinsic
+  // function will be inserted
.
Predondition -> preheader .


Thanks,
Nadav

On Nov 19, 2012, at 12:09 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> Hi, dear all:
> 
>  The attached patch is to recognize this population-count pattern:
> 
> --------------------------------------------------------------------
> int popcount(unsigned long long a) {
>    int c = 0;
>    while (a) {
>        c++;
>        ...  // both a & c would be used multiple times in or out of loop
>        a &= a - 1;
>        ...
>    }
> 
>    return c;
> }
> ---------------------------------------------------------------------
> 
> The changes are highlight bellow:
> =================================
>  1.Loop-idiom-Recognize pass identifies the patten and convert the releveant code into
>    built-in function __builtin_popcount().
>  2.CodeGen will expand the __builtin_popcount() into single instruction or a instruction sequence.
>  3.This optimization is enabled only if underlying architecture has "fast" popcount hw support.
>    (ScalarTargetTransformInfo::getPopcntHwSupport() returns "Fast")
> 
> Hw support for Popcount
> =======================
> 
>  a) X86-64
>     SSE4.1 provide POPCNT instruction for this purpose; Population-count can be *MORE* efficiently
>     calcuated by SSE3* instruction sequences. However, it seems to be that CodeGen doesn't
>     generate efficient instruction sequence when SSE3* is avaiable.
> 
>    (http://www.strchr.com/crc32_popcnt).
> 
>  b) ARM
>     Evan told me that pop-count can be calculated very efficiently on ARM with some vect instruction.
>     It seems the CodeGen doesn't take advantage of vect instr.
> 
>  The improvement to CodeGen is on the TODO list.
> 
> 
> Performance Impact:
> ==================
>   On my MacBookPro with corei7, this change see about 2% odd improvement on crafty at CPU2kint fed with
>   Ref input (26.6s vs 26.0s, measured by at least N times).
> 
>   The crafty is compiled with "-msse4" to enable the opt. As I mentioned above, the pop-count loop
> will be replaced into a single POPCNT instruction. Maybe it is more desirable to be replaced with a
> SSE3* instruction sequences as it is faster than POPCNT instruction.
> 
> Tons of thanks in advance!
> 
> Best Regards
> Shuxin
> <popcount.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list