[llvm-commits] [PATCH] Population-count loop idiom recognization

Mon Nov 19 12:44:58 PST 2012

----- Original Message -----
> From: "Shuxin Yang" <shuxin.llvm at gmail.com>
> To: "Commit Messages and Patches for LLVM" <llvm-commits at cs.uiuc.edu>
> Sent: Monday, November 19, 2012 12:09:39 PM
> Subject: [llvm-commits] [PATCH] Population-count loop idiom recognization
> 
> Hi, dear all:
> 
>    The attached patch recognizes this population-count pattern:
> 
> --------------------------------------------------------------------
> int popcount(unsigned long long a) {
>      int c = 0;
>      while (a) {
>          c++;
>          ...  // both a & c may be used multiple times inside or outside the loop
>          a &= a - 1;
>          ...
>      }
> 
>      return c;
> }
> ---------------------------------------------------------------------
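
For readers who want to try the pattern, here is a minimal C sketch of the loop above with the elided parts left out, next to the builtin it is meant to be equivalent to. The function names popcount_loop, popcount_builtin and check_popcount_equivalence are made up for illustration and are not part of the patch. Because the argument is unsigned long long, the sketch uses the GCC/Clang builtin __builtin_popcountll rather than __builtin_popcount.

```c
#include <assert.h>

/* Loop form from the message: each iteration clears the lowest set bit,
 * so the loop runs once per set bit. */
int popcount_loop(unsigned long long a) {
    int c = 0;
    while (a) {
        c++;
        a &= a - 1;  /* clear the lowest set bit */
    }
    return c;
}

/* The same result via the GCC/Clang builtin for unsigned long long. */
int popcount_builtin(unsigned long long a) {
    return __builtin_popcountll(a);
}

/* Hypothetical helper: spot-check that both forms agree on a few inputs. */
void check_popcount_equivalence(void) {
    const unsigned long long samples[] = {
        0ULL, 1ULL, 0xFFULL, 0x8000000000000001ULL, ~0ULL
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(popcount_loop(samples[i]) == popcount_builtin(samples[i]));
    }
}
```

Calling check_popcount_equivalence() from a test driver is enough to confirm that the two forms agree on the sampled inputs.
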
> 
> The changes are highlighted below:
> ==================================
>    1. The Loop-Idiom-Recognize pass identifies the pattern and converts the
>       relevant code into the built-in function __builtin_popcount().
>    2. CodeGen will expand __builtin_popcount() into a single instruction or
>       an instruction sequence.
>    3. This optimization is enabled only if the underlying architecture has
>       "fast" popcount hw support
>       (ScalarTargetTransformInfo::getPopcntHwSupport() returns "Fast").
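
As an illustration of what "an instruction sequence" in item 2 can look like when no single popcount instruction is available, here is the well-known branch-free bit-parallel (SWAR) population count for 64-bit values, written in C. This is only a sketch of the general technique; it is an assumption about the kind of sequence meant, not a claim about the exact code CodeGen emits. The names popcount_swar and check_popcount_swar are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel population count: sum bits in 2-bit, 4-bit, then 8-bit
 * fields, and add the eight byte counts with one multiply. */
int popcount_swar(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);            /* 2-bit sums */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);                /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;            /* 8-bit sums */
    return (int)((x * 0x0101010101010101ULL) >> 56);       /* add bytes  */
}

/* Hypothetical helper: spot-check a few known values. */
void check_popcount_swar(void) {
    assert(popcount_swar(0) == 0);
    assert(popcount_swar(1) == 1);
    assert(popcount_swar(0xFFULL) == 8);
    assert(popcount_swar(0x8000000000000001ULL) == 2);
    assert(popcount_swar(~0ULL) == 64);
}
```

Unlike the loop form, this takes the same number of operations regardless of how many bits are set.
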
> 
> Hw support for Popcount
> =======================
> 
>    a) X86-64
>       SSE4.2-capable processors provide the POPCNT instruction for this
>       purpose. Population count can be calculated *MORE* efficiently by
>       SSE3* instruction sequences (http://www.strchr.com/crc32_popcnt).
>       However, it seems that CodeGen doesn't generate an efficient
>       instruction sequence when SSE3* is available.
> 
>    b) ARM
>       Evan told me that pop-count can be calculated very efficiently on
>       ARM with some vector instructions. It seems that CodeGen doesn't
>       take advantage of the vector instructions.
> 
>    The improvement to CodeGen is on the TODO list.

Likewise, PowerPC has popcnt{d,b,w}; I can add CodeGen support for these as well.

 -Hal

> 
> 
> Performance Impact:
> ==================
>     On my MacBook Pro with a Core i7, this change yields an improvement of
>     roughly 2% on crafty from CPU2kint, fed with the Ref input (26.6s vs
>     26.0s, measured at least N times).
> 
>     crafty is compiled with "-msse4" to enable the optimization. As I
>     mentioned above, the pop-count loop is replaced with a single POPCNT
>     instruction. It may be more desirable to replace it with an SSE3*
>     instruction sequence, as that is faster than the POPCNT instruction.
> 
> Tons of thanks in advance!
> 
> Best Regards
> Shuxin
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory