[LLVMdev] Vector instructions

Fri Jun 27 12:13:28 PDT 2008

On Jun 27, 2008, at 8:02 AM, Stefanus Du Toit wrote:
>>>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>,
>>>>                          <d x i32> <mask> ; yields <d x <ty>>
>>>
>>> With the requirement that the entries in the (still constant) mask
>>> are within the range of [0, a + b - 1].
>
>> The alternative is to have the frontend synthesize the needed
>> operations with extracts, inserts, and possibly shuffles if needed.
>> LLVM is actually fairly well prepared to optimize code like this.
>> I recommend giving this a try, and reporting any problems you
>> encounter.
>
> That certainly appears to be the only option at the moment, and we'll
> have a look to see how that works out. However, note that a
> sufficiently generalized shufflevector would remove the need for
> insertelement and extractelement to exist completely.

You should look into how this works with clang.  Clang allows you to  
do things like this, for example:

typedef __attribute__(( ext_vector_type(2) )) float float2;
typedef __attribute__(( ext_vector_type(4) )) float float4;

float2 vec2, vec2_2;
float4 vec4, vec4_2;
float f;

void test2() {
     vec2 = vec4.xy;  // shorten
     f = vec2.x;      // extract elt
     vec4 = vec4.yyyy;  // splat
     vec4.zw = vec2;    // insert
}

and so on.  It also offers accessors to extract all the even or odd
elements of a vector, and arbitrary two-input-vector shuffles with
__builtin_shufflevector.
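
To make the earlier point about a generalized shufflevector concrete,
here is a minimal scalar sketch in C (not LLVM or clang code) of a
shuffle that takes inputs of lengths a and b and a d-entry mask with
indices in [0, a + b - 1].  The function name shuffle and the MASK_*
constants are made up for illustration; the masks show how each of the
four operations in test2() could be written as one such shuffle.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a generalized shuffle: v1 has a elements, v2 has b
   elements, and mask has d entries, each in [0, a + b - 1].  An index
   below a picks from v1; any other index picks from v2.  The result has
   d elements.  out must not alias v1 or v2. */
static void shuffle(const float *v1, size_t a, const float *v2, size_t b,
                    const unsigned *mask, size_t d, float *out)
{
    for (size_t i = 0; i < d; ++i) {
        assert(mask[i] < a + b);
        out[i] = mask[i] < a ? v1[mask[i]] : v2[mask[i] - a];
    }
}

/* Hypothetical masks for the four operations in test2(). */
static const unsigned MASK_XY[2]     = {0, 1};       /* vec4.xy: shorten (a=4, b=0)       */
static const unsigned MASK_X[1]      = {0};          /* vec2.x: extract (a=2, b=0)        */
static const unsigned MASK_YYYY[4]   = {1, 1, 1, 1}; /* vec4.yyyy: splat (a=4, b=0)       */
static const unsigned MASK_INS_ZW[4] = {0, 1, 4, 5}; /* vec4.zw = vec2: insert (a=4, b=2) */
```

In this model an extract is a shuffle with d = 1 and an insert is a
two-input shuffle that keeps some lanes of the first operand, which is
the sense in which insertelement and extractelement become redundant.
Note that the mask is still a constant here, so this does not cover
inserts or extracts at a run-time index.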

>>> 2. vector select
>>> 3. vector trunc, sext, zext, fptrunc, fpext
>>> 4. vector shl, lshr, ashr
>> [...]
>>
>> We agree that these would be useful. There are intentions to add them
>> to LLVM; others can say more.
>
> OK. I'd love to hear more, especially if someone is planning to do
> this in the short term.

Most of the extensions you suggest are great ideas, but we need more  
than ideas: we need someone to help implement the ideas ;-).

>> It turns out that having them return vectors of i1 would be somewhat
>> complicated. For example, a <4 x i1> on an SSE2 target could expand
>> either to 1 <4 x i32> or 2 <2 x i64>s, and the optimal thing would
>> be to make the decision based on the comparison that produced them,
>> but LLVM isn't yet equipped for that.
>
> Can you expand on this a bit? I'm guessing you're referring to
> specific mechanics in LLVM's code generation framework making this
> difficult to do?

I'm not sure that it really is a matter of what is easy or hard to do
in LLVM.  The current model (comparisons that produce full-width
vectors rather than vectors of i1) more closely matches the model
implemented by common SIMD systems like AltiVec, SSE, CellSPU, Alpha,
etc.  The design was picked to model the systems that we know well; we
obviously can't plan to handle systems that we don't know about.

>> vicmp and vfcmp are very much aimed at solving practical problems on
>> popular architectures without needing significant new infrastructure.
>> They're relatively new, and as you say, they'll be more useful when
>> combined with vector shifts and friends and we start teaching LLVM
>> to recognize popular idioms with them.
>
> Can you give me examples of how they are used today at all? I'm having
> a really hard time figuring out a good use for them (that doesn't
> involve effectively scalarizing them immediately) without these other
> instructions.

They can be used with target-specific intrinsics.  For example, SSE  
provides a broad range of intrinsics to support instructions that LLVM  
IR can't express well.  See llvm/include/llvm/IntrinsicsX86.td for  
more details.
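
As a rough illustration of a use that doesn't scalarize right away,
here is a scalar C sketch of the usual SIMD idiom: a lane-wise compare
that yields all ones or all zeros per lane, followed by a select built
only from bitwise and/or/not.  It assumes the compare produces
full-width integer lanes as described above; the function names
cmp_gt_mask and select_by_mask are made up for this example and are not
LLVM or SSE APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Lane-wise signed "greater than".  Each mask lane is all ones when
   x[i] > y[i] and all zeros otherwise (assumed full-width mask
   convention). */
static void cmp_gt_mask(const int32_t *x, const int32_t *y,
                        uint32_t *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        mask[i] = x[i] > y[i] ? UINT32_MAX : 0u;
}

/* Select using only bitwise operations: an all-ones lane picks x[i],
   an all-zeros lane picks y[i].  The cast back to int32_t relies on the
   two's-complement conversion that mainstream compilers implement. */
static void select_by_mask(const uint32_t *mask, const int32_t *x,
                           const int32_t *y, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (int32_t)((mask[i] & (uint32_t)x[i]) |
                           (~mask[i] & (uint32_t)y[i]));
}
```

Chaining the two gives a lane-wise maximum with no per-lane branch,
which is the shape that maps onto and/andnot/or style vector
instructions.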

If you're interested in helping shape the direction of LLVM vector  
support, my advice is that "patches speak louder than words" :).  I'd  
love to see improved vector support in LLVM, but unless someone is  
willing to step forward and implement it, it is all just idle talk.

-Chris