[cfe-dev] __builtin_parity assembly without popcnt
Friedman, Eli via cfe-dev
cfe-dev at lists.llvm.org
Fri Jul 20 11:36:10 PDT 2018
On 7/20/2018 11:02 AM, via cfe-dev wrote:
>
> Hello all!
>
> I’ve been mucking around in an old codebase at work looking for easy
> performance wins. One avenue involves replacing a switch-based
> variable assignment with something derived from the parity of an input
> variable. I was pretty surprised when I saw the generated assembly,
> and I’m wondering about the reasoning behind it.
>
> In short, it boils down to the assembly __builtin_parity() produces.
> Clang 6.0.1 (and trunk on Godbolt) produces:
>
> parity(int): # @parity(int)
>
> mov eax, edi
>
> shr eax
>
> and eax, 1431655765
>
> sub edi, eax
>
> mov eax, edi
>
> and eax, 858993459
>
> shr edi, 2
>
> and edi, 858993459
>
> add edi, eax
>
> mov eax, edi
>
> shr eax, 4
>
> add eax, edi
>
> and eax, 17764111
>
> imul eax, eax, 16843009
>
> shr eax, 24
>
> and eax, 1
>
> ret
>
> While GCC 8.1.0 (and trunk on Godbolt) produces
>
> parity(int):
>
> mov eax, edi
>
> shr edi, 16
>
> xor eax, edi
>
> xor al, ah
>
> setnp al
>
> movzx eax, al
>
> ret
>
> I know a popcnt followed by an and would be better, but unfortunately
> some of my users don’t have computers that support the popcnt
> instruction, so I can’t use a newer -march flag.
>
> Could someone explain why the difference between Clang and GCC here,
> and whether it should make a difference? The code in question is in a
> hot loop in my code, so I’d imagine the size difference could impact
> unrolling (and result in icache differences too), but I haven’t
> finished poking around with benchmarks.
>
LLVM doesn't have any special support for computing parity, so it's just
getting lowered to "popcount(x)&1"; if your target doesn't have a
popcount instruction, it uses the generic expansion (something like
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
). This is obviously not the fastest lowering, but computing parity is
not a common operation, so nobody has spent any time optimizing it.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180720/1f0f0ffa/attachment.html>
More information about the cfe-dev
mailing list