[cfe-dev] __builtin_parity assembly without popcnt

Craig Topper via cfe-dev cfe-dev at lists.llvm.org
Wed Aug 1 17:59:42 PDT 2018


Patch up for review here https://reviews.llvm.org/D50165

~Craig


On Wed, Aug 1, 2018 at 2:37 PM Craig Topper <craig.topper at gmail.com> wrote:

> I have a patch to improve builtin_parity in the x86 backend. I'll post it
> shortly.
> ~Craig
>
>
> On Mon, Jul 23, 2018 at 10:08 AM Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> So it does. Never realized that
>>
>> On Mon, Jul 23, 2018 at 9:57 AM Benjamin Kramer <benny.kra at gmail.com>
>> wrote:
>>
>>> I thought the parity flag only affects the last byte, meaning GCC's
>>> lowering is optimal.
>>>
>>> On Mon, Jul 23, 2018 at 6:31 PM Craig Topper via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Why does gcc xor down to a byte? Can't you get the parity flag for the
>>>> whole value by just oring with itself and grabbing the flag?
>>>>
>>>> ~Craig
>>>>
>>>>
>>>> On Fri, Jul 20, 2018 at 11:37 AM Bruce Hoult via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>> This looks (not closely checked) like it's implementing the first part
>>>>> of pop1() combined with the last part of pop3(). Which is suggested in the
>>>>> comment just after pop1().
>>>>>
>>>>> http://www.hackersdelight.org/hdcodetxt/pop.c.txt
>>>>>
>>>>> And then anding with 1, obviously.
>>>>>
>>>>> I wonder if division is good enough on modern machines to make pop2()
>>>>> faster.
>>>>>
>>>>> XORing down to a byte and then using the x86 built in parity flag is
>>>>> obviously better if you are on an x86, of course. Other machines don't
>>>>> usually have that.
>>>>>
>>>>> On Fri, Jul 20, 2018 at 11:02 AM, via cfe-dev <cfe-dev at lists.llvm.org>
>>>>> wrote:
>>>>>
>>>>>> Hello all!
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’ve been mucking around in an old codebase at work looking for easy
>>>>>> performance wins. One avenue involves replacing a switch-based variable
>>>>>> assignment with something derived from the parity of an input variable. I
>>>>>> was pretty surprised when I saw the generated assembly, and I’m wondering
>>>>>> about the reasoning behind it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In short, it boils down to the assembly __builtin_parity() produces.
>>>>>> Clang 6.0.1 (and trunk on Godbolt) produces:
>>>>>>
>>>>>>
>>>>>>
>>>>>> parity(int):                             # @parity(int)
>>>>>>
>>>>>>         mov     eax, edi
>>>>>>
>>>>>>         shr     eax
>>>>>>
>>>>>>         and     eax, 1431655765
>>>>>>
>>>>>>         sub     edi, eax
>>>>>>
>>>>>>         mov     eax, edi
>>>>>>
>>>>>>         and     eax, 858993459
>>>>>>
>>>>>>         shr     edi, 2
>>>>>>
>>>>>>         and     edi, 858993459
>>>>>>
>>>>>>         add     edi, eax
>>>>>>
>>>>>>         mov     eax, edi
>>>>>>
>>>>>>         shr     eax, 4
>>>>>>
>>>>>>         add     eax, edi
>>>>>>
>>>>>>         and     eax, 17764111
>>>>>>
>>>>>>         imul    eax, eax, 16843009
>>>>>>
>>>>>>         shr     eax, 24
>>>>>>
>>>>>>         and     eax, 1
>>>>>>
>>>>>>         ret
>>>>>>
>>>>>>
>>>>>>
>>>>>> While GCC 8.1.0 (and trunk on Godbolt) produces
>>>>>>
>>>>>>
>>>>>>
>>>>>> parity(int):
>>>>>>
>>>>>>         mov     eax, edi
>>>>>>
>>>>>>         shr     edi, 16
>>>>>>
>>>>>>         xor     eax, edi
>>>>>>
>>>>>>         xor     al, ah
>>>>>>
>>>>>>         setnp   al
>>>>>>
>>>>>>         movzx   eax, al
>>>>>>
>>>>>>         ret
>>>>>>
>>>>>>
>>>>>>
>>>>>> I know a popcnt followed by an and would be better, but unfortunately
>>>>>> some of my users don’t have computers that support the popcnt instruction,
>>>>>> so I can’t use a newer -march flag.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Could someone explain why the difference between Clang and GCC here,
>>>>>> and whether it should make a difference? The code in question is in a hot
>>>>>> loop in my code, so I’d imagine the size difference could impact unrolling
>>>>>> (and result in icache differences too), but I haven’t finished poking
>>>>>> around with benchmarks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>> --
>> ~Craig
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180801/cd4299d3/attachment.html>


More information about the cfe-dev mailing list