[llvm-commits] [PATCH] Population-count loop idiom recognization
Nadav Rotem
nrotem at apple.com
Mon Nov 19 12:36:25 PST 2012
Hi Shuxin!
Overall I like it, but I have a few comments:
+
+ /// CreateCtpop - Create and insert ctpop.
+ CallInst *CreateCtpop(Value *Val);
+
Why did you need to add this function to the IRBuilder ? I understand that the LoopIdiom pass uses the builder API for creating the memset calls, but I think that memset is much more commonly used. I think that you should create the call to popcnt using a regular call.
Regarding the name, it looks like in some places we call it 'CtPop' and in other places we call it 'popcnt'. I think that we need to fix this at some point. It does not need to be a part of this patch.
+ScalarTargetTransformInfo::PopcntHwSupport
+X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned TyWidth) const {
+ assert(!(TyWidth & (TyWidth - 1)) && "Ty width must be power of 2");
You can use the "isPowerOfTwo_64" call.
+ // Return true iff the block contains nothing but goto instruction.
+ static bool isAlmostEmpty(BasicBlock *);
Please doxygen the comment. (three slashes ///). Also, please document the other functions in the class.
+ static BranchInst *getBranch(BasicBlock * BB) {
+ return dyn_cast<BranchInst>(BB->getTerminator());
+ }
+
Maybe you could add assertions here.
+ // NclPopcountRecognize - This class is to recoginize idioms of population-
+ // count conducted in a noncountable loop. Currently it only recognizes
+ // this pattern: "while(x) {cnt++; ...; x &= x - 1; ...}"
+ //
+ class NclPopcountRecognize {
Also, doxygen this comment, and document the members.
class LoopIdiomRecognize : public LoopPass {
+ friend class NclPopcountRecognize;
Loop *CurLoop;
+ mutable const DataLayout *TD;
+ mutable DominatorTree *DT;
+ mutable ScalarEvolution *SE;
+ mutable TargetLibraryInfo *TLI;
+ mutable const ScalarTargetTransformInfo *STTI;
Why did you have to make these mutable ? Why is "NclPopcountRecognize" a friend ? Is there another way to solve this problem ?
+
+ private:
+ const DataLayout *getDataLayout() const {
+ return TD ? TD : TD=getAnalysisIfAvailable<DataLayout>();
+ }
+
+ DominatorTree *getDominatorTree() const {
+ return DT ? DT : (DT=&getAnalysis<DominatorTree>());
+ }
+
Why do you do the lazy evaluation and not just initialize everything upon initialization ?
+// return true iff the block contains nothing but goto.
+bool LIRUtil::isAlmostEmpty(BasicBlock *BB) {
+ if (BranchInst *Br = getBranch(BB)) {
+ return Br->isUnconditional() && BB->size() == 1;
+ }
+ return false;
+}
The documentation needs to be in the class. Also, I think that you are checking for a completely empty loop. No ?
Do you also need to check other things here, such as that this is the header of the loop ?
+/// NclPopcountRecognize - Take a glimpse of the loop to see if we need to go
+/// ahead recoginizing the idiom.
+bool NclPopcountRecognize::preliminaryScreen() {
+
+ if (LIR.STTI->getPopcntHwSupport(32) != ScalarTargetTransformInfo::Fast)
return false;
Why extra space ? Why are you only checking for 32bits ?
+ // It should have a preheader containing nothing but a goto instruction.
+ BasicBlock *PreHead = CurLoop->getLoopPreheader();
We don't have goto instructions, just branches :)
+ // It should have a precondition block where the generated popcount instrinsic
+ // function will be inserted
.
Predondition -> preheader .
Thanks,
Nadav
On Nov 19, 2012, at 12:09 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, dear all:
>
> The attached patch is to recognize this population-count pattern:
>
> --------------------------------------------------------------------
> int popcount(unsigned long long a) {
> int c = 0;
> while (a) {
> c++;
> ... // both a & c would be used multiple times in or out of loop
> a &= a - 1;
> ...
> }
>
> return c;
> }
> ---------------------------------------------------------------------
>
> The changes are highlight bellow:
> =================================
> 1.Loop-idiom-Recognize pass identifies the patten and convert the releveant code into
> built-in function __builtin_popcount().
> 2.CodeGen will expand the __builtin_popcount() into single instruction or a instruction sequence.
> 3.This optimization is enabled only if underlying architecture has "fast" popcount hw support.
> (ScalarTargetTransformInfo::getPopcntHwSupport() returns "Fast")
>
> Hw support for Popcount
> =======================
>
> a) X86-64
> SSE4.1 provide POPCNT instruction for this purpose; Population-count can be *MORE* efficiently
> calcuated by SSE3* instruction sequences. However, it seems to be that CodeGen doesn't
> generate efficient instruction sequence when SSE3* is avaiable.
>
> (http://www.strchr.com/crc32_popcnt).
>
> b) ARM
> Evan told me that pop-count can be calculated very efficiently on ARM with some vect instruction.
> It seems the CodeGen doesn't take advantage of vect instr.
>
> The improvement to CodeGen is on the TODO list.
>
>
> Performance Impact:
> ==================
> On my MacBookPro with corei7, this change see about 2% odd improvement on crafty at CPU2kint fed with
> Ref input (26.6s vs 26.0s, measured by at least N times).
>
> The crafty is compiled with "-msse4" to enable the opt. As I mentioned above, the pop-count loop
> will be replaced into a single POPCNT instruction. Maybe it is more desirable to be replaced with a
> SSE3* instruction sequences as it is faster than POPCNT instruction.
>
> Tons of thanks in advance!
>
> Best Regards
> Shuxin
> <popcount.patch>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list