[llvm-commits] [PATCH] Population-count loop idiom recognization

Shuxin Yang shuxin.llvm at gmail.com
Mon Nov 19 10:09:39 PST 2012


Hi, dear all:

   The attached patch recognizes the following population-count pattern:

--------------------------------------------------------------------
int popcount(unsigned long long a) {
     int c = 0;
     while (a) {
         c++;
         ...  // both a & c would be used multiple times in or out of loop
         a &= a - 1;
         ...
     }

     return c;
}
---------------------------------------------------------------------
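
For reference, here is a minimal compilable sketch of the idiom with the
elided statements removed, next to the value it computes. The function names
are made up for illustration, and the comparison uses the GCC/Clang builtin
__builtin_popcountll (the 64-bit variant, since the argument is unsigned long
long). It only illustrates the equivalence and is not code from the patch.

```c
#include <assert.h>

/* The loop idiom: each iteration clears the lowest set bit of a,
 * so the trip count equals the number of set bits in a. */
int popcount_loop(unsigned long long a) {
    int c = 0;
    while (a) {
        c++;
        a &= a - 1;   /* clear the lowest set bit */
    }
    return c;
}

/* The value the loop computes, expressed with the GCC/Clang builtin. */
int popcount_builtin(unsigned long long a) {
    return __builtin_popcountll(a);
}

/* Hypothetical helper: check that both forms agree on one input. */
void check_popcount(unsigned long long a) {
    assert(popcount_loop(a) == popcount_builtin(a));
}
```

Because the trip count is the population count itself, a final value of c
that starts from some other initial value c0 is c0 + popcount(a).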

The changes are highlighted below:
==================================
   1. The Loop-idiom-Recognize pass identifies the pattern and converts the
      relevant code into the built-in function __builtin_popcount().
   2. CodeGen expands __builtin_popcount() into a single instruction or an
      instruction sequence.
   3. This optimization is enabled only if the underlying architecture has
      "fast" popcount hw support
      (ScalarTargetTransformInfo::getPopcntHwSupport() returns "Fast").
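
To give an idea of what an "instruction sequence" for item 2 can look like
when there is no single popcount instruction, here is the well-known
bit-parallel (SWAR) formulation in C. This is only a sketch of the general
technique; I am not claiming it is the exact sequence CodeGen emits, and the
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel population count of a 64-bit value (hypothetical name). */
uint64_t popcount_swar(uint64_t x) {
    /* Sum adjacent bits into 2-bit fields. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into per-byte counts. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Add all eight byte counts into the top byte and extract it. */
    return (x * 0x0101010101010101ULL) >> 56;
}
```

The sequence is branch-free and has a fixed cost, whereas the original loop
runs once per set bit.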

Hw support for Popcount
=======================

   a) X86-64
      The POPCNT instruction (introduced alongside SSE4.2 on Intel
      processors) serves this purpose. Population count can be computed
      *MORE* efficiently with SSE3* instruction sequences
      (http://www.strchr.com/crc32_popcnt). However, CodeGen does not seem
      to generate an efficient instruction sequence when SSE3* is available.

   b) ARM
      Evan told me that pop-count can be calculated very efficiently on
      ARM with some vector instructions. CodeGen does not seem to take
      advantage of them.

   The improvement to CodeGen is on the TODO list.
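
As far as I understand, the vector approaches boil down to a small table
lookup per 4-bit nibble done on many bytes at once (e.g. with a byte-shuffle
instruction). The sketch below is only the scalar analogue of that idea in
plain C, under that assumption; it uses no vector intrinsics, and the table
and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in each possible 4-bit value. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Scalar nibble-lookup popcount: one table lookup per nibble.
 * A vector version would do the lookups for many bytes in parallel. */
int popcount_nibble_lut(uint64_t x) {
    int c = 0;
    for (int i = 0; i < 16; i++) {   /* 16 nibbles in 64 bits */
        c += nibble_bits[x & 0xF];
        x >>= 4;
    }
    return c;
}
```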


Performance Impact:
===================
    On my MacBook Pro with a Core i7, this change yields an improvement of
    about 2% on crafty from CPU2kint with the Ref input (26.6s vs 26.0s,
    measured at least N times).

    crafty is compiled with "-msse4" to enable the optimization. As
    mentioned above, the pop-count loop is replaced with a single POPCNT
    instruction. It may be more desirable to replace it with an SSE3*
    instruction sequence, since that is faster than the POPCNT instruction.

Tons of thanks in advance!

Best Regards
Shuxin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: popcount.patch
Type: text/x-patch
Size: 28078 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121119/742ba74e/attachment.bin>

