[LLVMdev] Legalizing v32i1, v64i1 for Haswell pext/pdep instructions

Sat May 17 20:26:27 PDT 2014

I have a group of students working with me on some
LLVM projects related to our Parabix research.  

One interesting issue that has come up for us is
code generation support for the pext and pdep instructions
introduced with Haswell (BMI2).   These instructions move bits
within a 32-bit or 64-bit word, either gathering all
mask-selected bits into the low-order end (pext) or scattering
the low-order bits to the mask-selected positions (pdep).
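
To make the intended semantics concrete, here is a minimal
portable sketch of the two operations in C++.  The names
pext_ref and pdep_ref are hypothetical helpers for illustration
only; they model the bit-level behaviour one bit at a time and
are not how the hardware or any LLVM lowering works.

```cpp
#include <cstdint>

// Reference model of pext: gather the bits of src selected by mask
// into the low-order end of the result, preserving their order.
uint64_t pext_ref(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t out = 1; mask != 0; out <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (src & lowest) result |= out;
        mask &= mask - 1;                      // clear that mask bit
    }
    return result;
}

// Reference model of pdep: scatter the low-order bits of src to the
// positions selected by mask, preserving their order.
uint64_t pdep_ref(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t in = 1; mask != 0; in <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (src & in) result |= lowest;
        mask &= mask - 1;                      // clear that mask bit
    }
    return result;
}
```

With mask 0xF0, pext_ref moves bits 4..7 of the source down to
bits 0..3, and pdep_ref moves bits 0..3 back up to bits 4..7, so
pdep_ref(pext_ref(x, m), m) equals x & m for any x and m.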

A natural model for this is to use shufflevector
on v32i1 and v64i1 vectors.   We've got some preliminary
notes here:
http://parabix.costar.sfu.ca/wiki/BitShuffle
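
As a rough sketch of the correspondence we have in mind (under
assumptions that may not match the notes above exactly): take a
64-lane shufflevector whose second operand is an all-zero vector,
with lane i of the first operand corresponding to bit i of an i64.
If the leading mask indices are strictly increasing lanes of the
first operand and all remaining indices select zero lanes, the
shuffle is a pext, and the pext mask has one bit set per selected
lane.  The function name pextMaskFromShuffle is hypothetical, and
undef (negative) mask indices are simply rejected here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Indices 0..63 select lanes of the first operand; indices 64..127
// select lanes of the second operand, assumed here to be all zero.
// Returns true and sets pextMask if the shuffle has the pext shape.
bool pextMaskFromShuffle(const std::vector<int>& shuffle, uint64_t& pextMask) {
    if (shuffle.size() != 64) return false;
    uint64_t mask = 0;
    int prev = -1;
    std::size_t i = 0;
    // Leading run: strictly increasing lanes taken from the first operand.
    for (; i < shuffle.size() && shuffle[i] >= 0 && shuffle[i] < 64; ++i) {
        if (shuffle[i] <= prev) return false;  // order must be preserved
        prev = shuffle[i];
        mask |= uint64_t{1} << shuffle[i];
    }
    // Tail: every remaining lane must come from the zero operand.
    for (; i < shuffle.size(); ++i) {
        if (shuffle[i] < 64 || shuffle[i] >= 128) return false;
    }
    pextMask = mask;
    return true;
}
```

For example, a shuffle mask beginning 4, 5, 6, 7 and followed by
sixty zero-lane indices yields the pext mask 0xF0.  A pdep-shaped
shuffle could be recognized by a similar check in the other
direction.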

Since we're quite new at this, I have some questions
about strategy.

(1)  First, it seems that legalizing the v32i1 and v64i1 types
for x86 would make sense.   This would allow the shuffle
masks on these vector types to be retained through to
code generation.

(2)  To legalize these types, we need to support all
of the IR operations on these vector types.   For most
of these it seems that simple substitutions suffice.
Adding two v32i1 vectors just involves bitwise xor
(i1 addition is addition modulo 2), while
mul of such vectors just involves bitwise and.
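
To illustrate the substitutions, here is a small sketch that
assumes a v64i1 value is held as the 64 bits of a uint64_t, with
lane i in bit i.  The function names are made up for this example.
The lane-by-lane reference versions compute each i1 result modulo
2 so the bitwise shortcuts can be checked against them.

```cpp
#include <cstdint>

// Packed v64i1 operations expressed as single bitwise instructions.
inline uint64_t add_v64i1(uint64_t a, uint64_t b) { return a ^ b; }  // add mod 2
inline uint64_t sub_v64i1(uint64_t a, uint64_t b) { return a ^ b; }  // sub mod 2
inline uint64_t mul_v64i1(uint64_t a, uint64_t b) { return a & b; }  // mul mod 2

// Lane-by-lane reference: add each pair of i1 lanes and keep the low bit.
uint64_t lanewise_add_ref(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        unsigned x = (a >> i) & 1, y = (b >> i) & 1;
        r |= static_cast<uint64_t>((x + y) & 1) << i;
    }
    return r;
}

// Lane-by-lane reference: multiply each pair of i1 lanes and keep the low bit.
uint64_t lanewise_mul_ref(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        unsigned x = (a >> i) & 1, y = (b >> i) & 1;
        r |= static_cast<uint64_t>((x * y) & 1) << i;
    }
    return r;
}
```

Subtraction coincides with addition because -1 and 1 are the same
value modulo 2.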

My questions are these.

(1)  Is this strategy basically reasonable?

(2)  Are there alternative mechanisms for generating
pext/pdep from v32i1 and v64i1 shufflevector invocations?

(3)  The legalization method for most operations (other
than shufflevector) is generic, i.e., not processor-specific.
Is there a way to make types such as v64i1 legal for any processor
that supports i64?