[LLVMbugs] [Bug 1983] New: Teach LLVM about parity

Mon Feb 4 20:31:00 PST 2008

http://llvm.org/bugs/show_bug.cgi?id=1983

           Summary: Teach LLVM about parity
           Product: new-bugs
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sharparrow1 at yahoo.com
                CC: llvmbugs at cs.uiuc.edu

Spun off of Bug 1979, because I don't want to pollute that bug with a lot of
irrelevant discussion.

(In reply to comment #2)
> Eli: some random unrelated thoughts :).  If you're interested in parity, I
> wonder if llvm-gcc generates efficient code for __builtin_parity.  Could you
> check?

It appears llvm-gcc doesn't support __builtin_parity... at least the version at
http://llvm.org/demo/index.cgi (I don't have a trunk build).

LLVM currently generates bad code for llvm.ctpop.i32(x) & 1; it just generates
the ctpop and ands it with 1, which is really slow for cpu's without a popcount
instruction.

(In reply to comment #3)
> It would also be useful to teach instcombine about various idioms for parity,
> turning it into llvm.popct(x)&1

What exactly are common idioms for parity? The only idioms that would be easy
to detect are popcount & 1 and maybe the pure shift+xor method.

I've dome some rough testing of various ways of computing parity using a
benchmark that looks like the testcase for this bug.

For x86, something like the following is probably best; it appears to be as
fast as a lookup table.

static unsigned parity2(unsigned x) {
  int result;
  asm ("movl    %1, %%ecx\n"
       "shrl    $16, %%ecx\n"
       "xorl    %1, %%ecx\n"
       "xorb    %%ch, %%cl\n"
       "setnp   %%cl\n"
       "movzbl  %%cl, %0\n"
        :"=r"(result)    /* output */
        :"g" (x)         /* input */
        :"%ecx"         /* clobbered register */
       );
  return result;
}

I don't know how difficult it would be to make LLVM generate something like
that, though.

Besides that, the fastest method is a lookup table; however, a full lookup
table is at least 256 bytes.  Without a lookup table, the method from my
testcase is the fastest. The method from
http://graphics.stanford.edu/~seander/bithacks.html#ParityParallel is a little
slower. Purely using xor+shift is a little slower than that (it adds up to
being about twice as slow as the asm or table lookup).

That said, the way I'm benchmarking is not at all representative of real-world
code, so my results might be a bit off.

The version of gcc I have converts __builtin_parity into a call to __paritysi2,
a method in libgcc which uses a lookup table; however, I think that's changed 
(http://www.archivum.info/gcc-patches@gcc.gnu.org/2007-02/msg00999.html).

Is there actually any real-world use for 32-bit parity, though?  I just looked
at it because I was curious...

-- 
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.