[cfe-dev] __builtin_parity assembly without popcnt

Craig Topper via cfe-dev cfe-dev at lists.llvm.org
Wed Aug 1 14:37:08 PDT 2018


I have a patch to improve builtin_parity in the x86 backend. I'll post it
shortly.
~Craig


On Mon, Jul 23, 2018 at 10:08 AM Craig Topper <craig.topper at gmail.com>
wrote:

> So it does. Never realized that
>
> On Mon, Jul 23, 2018 at 9:57 AM Benjamin Kramer <benny.kra at gmail.com>
> wrote:
>
>> I thought the parity flag only affects the last byte, meaning GCC's
>> lowering is optimal.
>>
>> On Mon, Jul 23, 2018 at 6:31 PM Craig Topper via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Why does gcc xor down to a byte? Can't you get the parity flag for the
>>> whole value by just oring with itself and grabbing the flag?
>>>
>>> ~Craig
>>>
>>>
>>> On Fri, Jul 20, 2018 at 11:37 AM Bruce Hoult via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>> This looks (not closely checked) like it's implementing the first part of
>>>> pop1() combined with the last part of pop3(). Which is suggested in the
>>>> comment just after pop1().
>>>>
>>>> http://www.hackersdelight.org/hdcodetxt/pop.c.txt
>>>>
>>>> And then anding with 1, obviously.
>>>>
>>>> I wonder if division is good enough on modern machines to make pop2()
>>>> faster.
>>>>
>>>> XORing down to a byte and then using the x86 built in parity flag is
>>>> obviously better if you are on an x86, of course. Other machines don't
>>>> usually have that.
>>>>
>>>> On Fri, Jul 20, 2018 at 11:02 AM, via cfe-dev <cfe-dev at lists.llvm.org>
>>>> wrote:
>>>>
>>>>> Hello all!
>>>>>
>>>>>
>>>>>
>>>>> I’ve been mucking around in an old codebase at work looking for easy
>>>>> performance wins. One avenue involves replacing a switch-based variable
>>>>> assignment with something derived from the parity of an input variable. I
>>>>> was pretty surprised when I saw the generated assembly, and I’m wondering
>>>>> about the reasoning behind it.
>>>>>
>>>>>
>>>>>
>>>>> In short, it boils down to the assembly __builtin_parity() produces.
>>>>> Clang 6.0.1 (and trunk on Godbolt) produces:
>>>>>
>>>>>
>>>>>
>>>>> parity(int):                             # @parity(int)
>>>>>
>>>>>         mov     eax, edi
>>>>>
>>>>>         shr     eax
>>>>>
>>>>>         and     eax, 1431655765
>>>>>
>>>>>         sub     edi, eax
>>>>>
>>>>>         mov     eax, edi
>>>>>
>>>>>         and     eax, 858993459
>>>>>
>>>>>         shr     edi, 2
>>>>>
>>>>>         and     edi, 858993459
>>>>>
>>>>>         add     edi, eax
>>>>>
>>>>>         mov     eax, edi
>>>>>
>>>>>         shr     eax, 4
>>>>>
>>>>>         add     eax, edi
>>>>>
>>>>>         and     eax, 17764111
>>>>>
>>>>>         imul    eax, eax, 16843009
>>>>>
>>>>>         shr     eax, 24
>>>>>
>>>>>         and     eax, 1
>>>>>
>>>>>         ret
>>>>>
>>>>>
>>>>>
>>>>> While GCC 8.1.0 (and trunk on Godbolt) produces
>>>>>
>>>>>
>>>>>
>>>>> parity(int):
>>>>>
>>>>>         mov     eax, edi
>>>>>
>>>>>         shr     edi, 16
>>>>>
>>>>>         xor     eax, edi
>>>>>
>>>>>         xor     al, ah
>>>>>
>>>>>         setnp   al
>>>>>
>>>>>         movzx   eax, al
>>>>>
>>>>>         ret
>>>>>
>>>>>
>>>>>
>>>>> I know a popcnt followed by an and would be better, but unfortunately
>>>>> some of my users don’t have computers that support the popcnt instruction,
>>>>> so I can’t use a newer -march flag.
>>>>>
>>>>>
>>>>>
>>>>> Could someone explain why the difference between Clang and GCC here,
>>>>> and whether it should make a difference? The code in question is in a hot
>>>>> loop in my code, so I’d imagine the size difference could impact unrolling
>>>>> (and result in icache differences too), but I haven’t finished poking
>>>>> around with benchmarks.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> --
> ~Craig
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180801/80441716/attachment.html>


More information about the cfe-dev mailing list