[PATCH] [x86] @llvm.ctpop.v8i32 custom lowering

Chandler Carruth chandlerc at gmail.com
Thu Dec 4 08:40:13 PST 2014


On Thu, Dec 4, 2014 at 8:23 AM, Bruno Cardoso Lopes <bruno.cardoso at gmail.com> wrote:

> Hi nadav, chandlerc, andreadb, delena,
>
> This patch adds x86 custom lowering for the @llvm.ctpop.v8i32 intrinsic.
>
> Currently, the expansion of @llvm.ctpop.v8i32 uses vector element
> extractions, insertions and individual calls to @llvm.ctpop.i32. Local
> Haswell measurements show that @llvm.ctpop.v8i32 is faster with a vector
> parallel bit-twiddling approach than with a call to @llvm.ctpop.i32 for
> each element. The approach is based on:
>
> v = v - ((v >> 1) & 0x55555555);
> v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
> v = (v + (v >> 4)) & 0x0F0F0F0F;
> v = v + (v >> 8);
> v = v + (v >> 16);
> v = v & 0x0000003F;
> (from
> http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
>
> A toy microbenchmark showed a ~2x speedup, whereas vector types with
> fewer elements are still better off with the old approach (see results
> below). Hence this patch only implements it for the v8i32 type. The
> results indicate it might also be profitable to implement this approach
> for v32i8 and v16i16, but I haven't measured that yet.
>
> On AVX1, ctpop.v8i32 is split into two ctpop.v4i32 operations, which is
> only slightly better than the old expansion. However, this patch does not
> implement custom lowering for the general ctpop.v4i32 type, since it's
> not profitable.
>
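
For reference, the bit-twiddling sequence quoted above can be modelled on a single 32-bit lane in plain C. This is only a scalar sketch of what each vector element would compute, not the code from the patch; the function names are made up for illustration, and the third step is written with explicit parentheses.

```c
#include <stdint.h>

/* Scalar model of the parallel bit count: each step sums adjacent
 * bit fields of doubling width (1, 2, 4, 8, 16 bits).
 * Hypothetical name, not taken from the patch. */
static uint32_t popcount32_parallel(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit sums   */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit sums   */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* per-byte sums */
    v = v + (v >> 8);                                 /* fold bytes   */
    v = v + (v >> 16);                                /* fold halves  */
    return v & 0x0000003Fu;                           /* result is 0..32 */
}

/* Straightforward reference used only to check the version above. */
static uint32_t popcount32_naive(uint32_t v) {
    uint32_t n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}
```

In a vector lowering, the same shifts, masks and adds would be applied to all eight i32 lanes at once, which is where the reported speedup would come from.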

These timings are pretty strange.

Can you post the code produced for the old lowering and the new lowering?
I'm wondering if there is something about the old lowering that makes it
unreasonably slow.

It would be really nice to have a more principled split here such as using
the bit-math version when a scalarized form would require extracting from
multiple 128-bit lanes, or when there are more than N vector elements.
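
To make the suggested split concrete, here is a rough sketch of such a predicate in plain C. It is not LLVM code: the function name, its parameters, and the threshold (standing in for the unspecified N above) are all hypothetical, and the right value of N would have to come from measurements.

```c
#include <stdbool.h>

/* Hypothetical heuristic: prefer the bit-math ctpop lowering when the
 * vector is wider than one 128-bit lane, or when it has more than
 * max_scalar_elts elements. Names and threshold are illustrative only. */
static bool prefer_bitmath_ctpop(unsigned num_elts, unsigned elt_bits,
                                 unsigned max_scalar_elts) {
    unsigned total_bits = num_elts * elt_bits;
    bool spans_multiple_lanes = total_bits > 128;
    bool many_elements = num_elts > max_scalar_elts;
    return spans_multiple_lanes || many_elements;
}
```

With a placeholder threshold of 8, this would pick the bit-math form for v8i32 (256 bits) and v16i8 (16 elements), and keep the scalarized form for v4i32.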