[LLVMdev] Vector select/compare support in LLVM

Thu Mar 10 06:29:09 PST 2011

Hey,

I am currently forced to create the BLENDVPS intrinsic as an external 
call (via Intrinsic::x86_sse41_blendvps) which has the following 
signature (from IntrinsicsX86.td):

def int_x86_sse41_blendvps :
    GCCBuiltin<"__builtin_ia32_blendvps">,
    Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_v4f32_ty,
               llvm_v4f32_ty], [IntrNoMem]>;

Thus, it expects the mask (the third operand) to be a 
<4 x float>.
It would be great to have this mirrored in the IR, meaning one should be 
able to create a SelectInst with 3 <4 x float> operands which would 
generate this intrinsic.
Is there anything that speaks against this?
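For reference, here is a small Python model of what BLENDVPS computes and how a SelectInst with a <4 x float> condition could map onto it. This is a sketch under stated assumptions. The lane rule (the sign bit of each mask lane selects the second source) follows Intel's description of BLENDVPS / _mm_blendv_ps(a, b, mask). `select_v4f32` is a hypothetical name, not an LLVM API.

```python
import struct

def sign_bit(x: float) -> int:
    """Return the IEEE-754 single-precision sign bit of x (1 for -1.0, -0.0, ...)."""
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31

def blendvps(a, b, mask):
    """Model of BLENDVPS / _mm_blendv_ps(a, b, mask) on 4 float lanes.

    A lane takes its value from b when the sign bit of the matching mask
    lane is set, and from a otherwise; the other 31 mask bits are ignored.
    """
    assert len(a) == len(b) == len(mask) == 4
    return [bi if sign_bit(mi) else ai for ai, bi, mi in zip(a, b, mask)]

def select_v4f32(cond, true_val, false_val):
    """Hypothetical IR-level 'select <4 x float> cond, T, F' lowered to BLENDVPS.

    Note the operand swap: select picks true_val where cond is set, while
    BLENDVPS picks its *second* operand where the mask is set.
    """
    return blendvps(false_val, true_val, cond)
```

Only the sign bit matters, so -0.0 counts as "true" in this model. That is one reason a <4 x float> condition would have different semantics from a <4 x i1> one.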

I think I also recall something similar for ICmp/FCmp instructions...

Best,
Ralf

P.S. I am not up-to-date on the latest status of "direct" support of 
vector instructions; the corresponding part of my system was 
written over a year ago.

On 3/10/11 1:44 PM, Rotem, Nadav wrote:
> After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 has no instruction for loading packed masks into SSE registers.  So, I guess that legalizing <4 x i1> to <4 x i32> is the way to go.
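SSE has no single instruction that turns a packed bitmask into per-lane masks, so the expansion has to be emulated with several instructions. The following is a minimal Python sketch of one such sequence. It assumes bit i of the packed value belongs to lane i. The SSE2 instruction names in the comments show the usual counterparts for illustration only; they are not taken from this thread.

```python
LANE_BITS = [1 << i for i in range(4)]  # bit i of the packed mask belongs to lane i
ALL_ONES = 0xFFFFFFFF                    # a "true" lane in a sparse <4 x i32> mask

def unpack_mask(packed: int) -> list:
    """Expand a packed 4-bit mask into a sparse <4 x i32> mask.

    Mirrors a common SSE2 emulation (assumed here): broadcast the scalar
    to all lanes, AND each lane with its own bit, then compare each lane
    for equality with that bit to get all-ones or zero.
    """
    broadcast = [packed & ALL_ONES] * 4                            # e.g. MOVD + PSHUFD
    isolated = [v & bit for v, bit in zip(broadcast, LANE_BITS)]   # PAND with constant
    return [ALL_ONES if v == bit else 0                            # PCMPEQD with constant
            for v, bit in zip(isolated, LANE_BITS)]
```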
>
> Cheers,
> Nadav
>
> -----Original Message-----
> From: Rotem, Nadav
> Sent: Thursday, March 10, 2011 11:04
> To: 'David A. Greene'
> Cc: llvmdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] Vector select/compare support in LLVM
>
> Hi David,
>
> The MOVMSKPS instruction is cheap (2 cycles).  Not to be confused with VMASKMOV, the AVX masked move, which is expensive.
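Packing in the sparse-to-packed direction is a single instruction. Below is a minimal Python model of what MOVMSKPS produces, assuming the usual lane order (lane 0 maps to bit 0). It shows why this direction is cheap, while going from packed to sparse needs a sequence of instructions.

```python
import struct

def movmskps(lanes):
    """Model of MOVMSKPS: gather the sign bit of each of the 4 float lanes.

    Bit i of the result is the sign bit of lane i; the upper bits are zero.
    """
    assert len(lanes) == 4
    result = 0
    for i, x in enumerate(lanes):
        # Reinterpret the float as its 32-bit pattern and take bit 31.
        (bits,) = struct.unpack("<I", struct.pack("<f", x))
        result |= (bits >> 31) << i
    return result
```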
>
> One of the arguments for packing masks is that it reduces vector-register pressure.  Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nesting, etc.).  Keeping masks in xmm registers increases vector-register pressure, which may cause these registers to be spilled.  I agree with you that GP registers are also a precious resource.
> I am not sure what the best way to store masks is.
>
> In my private branch, I added the [v4i1 .. v64i1] types. I also implemented a new type of target lowering: "PACK". This lowering packs vectors of i1s into integer registers. For example, the <4 x i1> type would get packed into the i8 type. I modified LegalizeTypes and LegalizeVectorTypes and added legalization for SETCC, XOR, OR, AND, and BUILD_VECTOR.  I also changed the x86 lowering of SELECT so that selects with a vector condition operand are not lowered. Next, I am going to add new patterns for SETCC and SELECT which use i8/i16/i32/i64 as a condition value.
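Here is a rough Python sketch of the PACK idea described above. Each lane gets one bit, packed into the narrowest of the i8/i16/i32/i64 types, and SELECT reads its condition from that integer. The lane-to-bit order and the narrowest-type rule are assumptions, since the mail only gives <4 x i1> -> i8 as an example. The function names are hypothetical and are not code from the private branch.

```python
PACKED_WIDTHS = (8, 16, 32, 64)  # i8/i16/i32/i64 condition types

def packed_int_width(num_lanes: int) -> int:
    """Smallest of i8/i16/i32/i64 that can hold one bit per lane (assumed rule)."""
    for width in PACKED_WIDTHS:
        if num_lanes <= width:
            return width
    raise ValueError(f"<{num_lanes} x i1> does not fit in i64")

def pack_i1_vector(lanes) -> int:
    """Pack a <N x i1> value (list of bools) into an integer, lane i -> bit i."""
    packed = 0
    for i, bit in enumerate(lanes):
        packed |= int(bool(bit)) << i
    return packed

def select_packed(cond: int, true_val, false_val):
    """'select' with a packed integer condition: bit i picks true_val[i]."""
    return [t if (cond >> i) & 1 else f
            for i, (t, f) in enumerate(zip(true_val, false_val))]
```

For example, `pack_i1_vector([True, False, True, True])` gives 0b1101, which fits the i8 chosen by `packed_int_width(4)`.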
>
> I also plan to experiment with promoting <4 x i1> to <4 x i32>.  At this point I can't really say what needs to be done.  Implementing this kind of promotion also requires adding legalization support for strange vector types such as <4 x i65>.
>
> -Nadav
>
>
>
> -----Original Message-----
> From: David A. Greene [mailto:greened at obbligato.org]
> Sent: Wednesday, March 09, 2011 21:59
> To: Rotem, Nadav
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Vector select/compare support in LLVM
>
> "Rotem, Nadav" <nadav.rotem at intel.com> writes:
>
>> I can think of two ways to represent masks in x86: sparse and
>> packed. In the sparse method, the masks are kept in <4 x 32bit>
>> registers, which are mapped to xmm registers. This is the ‘native’ way
>> of using masks.
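To make the sparse form concrete, here is a Python model of a vector compare feeding a select, where each mask lane is a full 32-bit all-ones or all-zeros pattern. Lane values are treated as unsigned 32-bit bit patterns. The function names are hypothetical illustrations, not LLVM or SSE APIs. `cmpgt_sparse` mirrors the all-ones/all-zeros result that SSE compares such as CMPPS or PCMPGTD produce.

```python
ALL_ONES = 0xFFFFFFFF  # a "true" lane in a sparse <4 x i32> mask

def cmpgt_sparse(a, b):
    """Lane-wise a > b producing a sparse mask: all-ones if true, zero if false."""
    return [ALL_ONES if x > y else 0 for x, y in zip(a, b)]

def select_sparse(mask, t, f):
    """Bitwise select on 32-bit lane patterns: (t AND mask) OR (f AND NOT mask).

    This is the AND/ANDN/OR idiom that works without SSE4.1 blends; it is
    only correct because every mask lane is all-ones or all-zeros.
    """
    return [(x & m) | (y & ~m & ALL_ONES) for m, x, y in zip(mask, t, f)]
```

Because the mask already has the width of the data lanes, no unpacking step is needed before the select.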
>
> This argues for the sparse representation, I think.
>
>> _Sparse_ After my discussion with Duncan last week, I started working
>> on the promotion of type <4 x i1> to <4 x i32>, and I ran into a
>> problem.  It looks like the codegen term ‘promote’ is overloaded.
>
> Heavily.  :-/
>
>>   For scalars, the ‘promote’ operation converts scalars to larger
>> bit-width scalars.  For vectors, the ‘promote’ operation widens the
>> vector to the next power of two.  This is reasonable for types such as
>> ‘<3 x float>’.  Maybe we need to add another legalization operation which
>> will mean widening the vectors?
>
> You mean widening the element type, correct?  Yes, that's definitely a
> useful concept.
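The two meanings being discussed can be contrasted with a toy type model in Python. VecType, widen_vector, and promote_elements are hypothetical names for illustration, not LLVM's legalization API. widen_vector follows the description of vector 'promote' given above, and promote_elements is the per-element operation being asked for.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VecType:
    """A toy vector type: <count x elem>, e.g. VecType(4, "i1") for <4 x i1>."""
    count: int
    elem: str

    def __str__(self) -> str:
        return f"<{self.count} x {self.elem}>"

def widen_vector(t: VecType) -> VecType:
    """Grow the element count to the next power of two: <3 x float> -> <4 x float>."""
    n = 1
    while n < t.count:
        n *= 2
    return VecType(n, t.elem)

def promote_elements(t: VecType, new_elem: str) -> VecType:
    """Keep the element count, widen each element: <4 x i1> -> <4 x i32>."""
    return VecType(t.count, new_elem)
```

Note that widen_vector leaves <4 x i1> unchanged, so it can never produce the <4 x i32> mask type. That is why a separate element-promotion step is needed.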
>
>>   In any case, I estimated that implementing this per-element promotion
>> would require major changes and decided that this is not the way to
>> go.
>
> What major changes?  I think this will end up giving much better code in
> the end.  The pack/unpack operations could be very expensive.
>
> There is another huge cost in using GPRs to hold masks.  There will be
> fewer GPRs to hold addresses, which are a precious resource.  We should
> avoid doing anything that uses more of that resource unnecessarily.
>
>                               -Dave
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev