[cfe-dev] __builtin_parity assembly without popcnt

Mon Jul 23 10:08:58 PDT 2018

So it does. Never realized that

On Mon, Jul 23, 2018 at 9:57 AM Benjamin Kramer <benny.kra at gmail.com> wrote:

> I thought the parity flag only affects the last byte, meaning GCC's
> lowering is optimal.
>
> On Mon, Jul 23, 2018 at 6:31 PM Craig Topper via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Why does gcc xor down to a byte? Can't you get the parity flag for the
>> whole value by just oring with itself and grabbing the flag?
>>
>> ~Craig
>>
>>
>> On Fri, Jul 20, 2018 at 11:37 AM Bruce Hoult via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
> This looks (not closely checked) like it's implementing the first part of
>>> pop1() combined with the last part of pop3(). Which is suggested in the
>>> comment just after pop1().
>>>
>>> http://www.hackersdelight.org/hdcodetxt/pop.c.txt
>>>
>>> And then anding with 1, obviously.
>>>
>>> I wonder if division is good enough on modern machines to make pop2()
>>> faster.
>>>
>>> XORing down to a byte and then using the x86 built in parity flag is
>>> obviously better if you are on an x86, of course. Other machines don't
>>> usually have that.
>>>
>>> On Fri, Jul 20, 2018 at 11:02 AM, via cfe-dev <cfe-dev at lists.llvm.org>
>>> wrote:
>>>
>>>> Hello all!
>>>>
>>>>
>>>>
>>>> I’ve been mucking around in an old codebase at work looking for easy
>>>> performance wins. One avenue involves replacing a switch-based variable
>>>> assignment with something derived from the parity of an input variable. I
>>>> was pretty surprised when I saw the generated assembly, and I’m wondering
>>>> about the reasoning behind it.
>>>>
>>>>
>>>>
>>>> In short, it boils down to the assembly __builtin_parity() produces.
>>>> Clang 6.0.1 (and trunk on Godbolt) produces:
>>>>
>>>>
>>>>
>>>> parity(int):                             # @parity(int)
>>>>
>>>>         mov     eax, edi
>>>>
>>>>         shr     eax
>>>>
>>>>         and     eax, 1431655765
>>>>
>>>>         sub     edi, eax
>>>>
>>>>         mov     eax, edi
>>>>
>>>>         and     eax, 858993459
>>>>
>>>>         shr     edi, 2
>>>>
>>>>         and     edi, 858993459
>>>>
>>>>         add     edi, eax
>>>>
>>>>         mov     eax, edi
>>>>
>>>>         shr     eax, 4
>>>>
>>>>         add     eax, edi
>>>>
>>>>         and     eax, 17764111
>>>>
>>>>         imul    eax, eax, 16843009
>>>>
>>>>         shr     eax, 24
>>>>
>>>>         and     eax, 1
>>>>
>>>>         ret
>>>>
>>>>
>>>>
>>>> While GCC 8.1.0 (and trunk on Godbolt) produces
>>>>
>>>>
>>>>
>>>> parity(int):
>>>>
>>>>         mov     eax, edi
>>>>
>>>>         shr     edi, 16
>>>>
>>>>         xor     eax, edi
>>>>
>>>>         xor     al, ah
>>>>
>>>>         setnp   al
>>>>
>>>>         movzx   eax, al
>>>>
>>>>         ret
>>>>
>>>>
>>>>
>>>> I know a popcnt followed by an and would be better, but unfortunately
>>>> some of my users don’t have computers that support the popcnt instruction,
>>>> so I can’t use a newer -march flag.
>>>>
>>>>
>>>>
>>>> Could someone explain why the difference between Clang and GCC here,
>>>> and whether it should make a difference? The code in question is in a hot
>>>> loop in my code, so I’d imagine the size difference could impact unrolling
>>>> (and result in icache differences too), but I haven’t finished poking
>>>> around with benchmarks.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> Alex
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> --
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180723/88c1ef77/attachment.html>