[llvm-dev] Question about VectorLegalizer::ExpandStore() with v4i1

Tue Jun 28 10:57:09 PDT 2016

Hi, Ahmed.

A packed representation, one bit per i1, is certainly the natural and best
choice for our work. In the Parabix project, we built very fast text and
byte-stream processing applications using packed bit streams, stored 128
bits at a time in SSE/Neon/AltiVec registers, 256 bits at a time with AVX,
and 512 bits at a time with AVX-512.

I also think that the one-bit-per-i1 approach is the best and most
consistent choice overall. Vectors are not arrays; they are intended to be
treated as single values. Whereas an array of i1 can reasonably be laid out
as one byte per element, a vector of i1 should be packed.

In general, the use of a vector type should signal that efficient loading,
storing, and manipulation of the whole vector matter more than access to
individual elements. It seems to me that the entire point of vector types
is to provide a natural model for SIMD instruction sets.

As you say, the packed representation makes a lot of sense for AVX-512.
But even the existing SSE and AVX instruction sets use a packed
representation in many cases. For example, the SSE instruction movmskps
produces a <4 x i1> and pmovmskb (SSE2) produces a <16 x i1>, both in
packed form. In addition, a vector icmp or fcmp can typically be
implemented with two instructions, a compare followed by a movemask, to
produce packed i1 values. Our software relies on this packed
representation extensively.

> 
> JinGu,
> 
> Your analysis is correct, vectors of i1 are incorrectly legalized.
> This is a known issue (http://llvm.org/PR22603); the tricky part about
> fixing it is the need to settle on a memory layout for these vectors
> (packed vs byte per i1;  packed would be compatible with AVX512, I
> think).
> 
> -Ahmed
>