[PATCH] [x86] @llvm.ctpop.v8i32 custom lowering

Thu Dec 4 11:50:42 PST 2014

Chandler,

Thanks for the help. The assembly for v8i32 (new vs. old) is here:
http://pastebin.com/4gnd41Je

About the principled split: I'd rather go the other way around. Since SelectionDAGLegalize::ExpandBitCount already emits the bit-math for the scalarized versions, it makes more sense to custom split to other known vector types only when we already know it's profitable.

Nadav and Hal,

I believe there are potential benefits for other targets, but this customisation generates a bunch of vector instructions. I'm afraid that if one of those vector instructions isn't well supported on a target, it could end up scalarized into a lot of instructions and lead to worse code than before. I might be wrong though. I'd rather go in the direction where, if other targets implement it and succeed, we then move it to target-independent code. Additional thoughts?

Actually, back to x86: if popcnt isn't supported by some x86 target, we currently get this scalarized bit-math code for each element, and it would always be profitable to emit the vectorized code instead. I tested it for v4i32, v2i64, v4i64 and v8i32 and it performs better. I'm going to update the patch to reflect that. For instance, "-arch x86_64" doesn't assume popcnt by default, since it is a separate feature; in cases like this we would always win.
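
To make the bit-math concrete, here is a minimal C++ sketch of the classic shift-and-mask population count, plus a lane-by-lane loop standing in for a v8i32 input. This is an illustration only, not the code from the patch or from ExpandBitCount; the names popcount_bitmath and popcount_v8i32 are hypothetical. The point is that every step is a plain shift, and, add, sub or mul, so the same sequence can in principle be applied to all lanes of a vector at once instead of being repeated per element.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bit-math popcount for one 32-bit value,
// for targets without a popcnt instruction.
inline uint32_t popcount_bitmath(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // counts per 2-bit field
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // counts per 4-bit field
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // counts per byte
  return (v * 0x01010101u) >> 24;                   // sum the four bytes
}

// Hypothetical stand-in for a v8i32 ctpop: the scalarized form runs the
// sequence above once per element. A vectorized lowering would instead
// issue each shift/and/add once on the whole vector.
inline std::array<uint32_t, 8>
popcount_v8i32(const std::array<uint32_t, 8> &in) {
  std::array<uint32_t, 8> out{};
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = popcount_bitmath(in[i]);
  return out;
}
```

Whether the vector form wins on a given target depends on how well that target supports the vector shifts and multiplies involved, which is the concern raised above for non-x86 targets.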

http://reviews.llvm.org/D6531