[LLVMdev] Extending vector operations

Stefanus Du Toit stefanus.dutoit at rapidmind.com
Wed Jul 23 09:48:08 PDT 2008


On 23-Jul-08, at 12:15 PM, David Greene wrote:
> On Monday 21 July 2008 15:21, Stefanus Du Toit wrote:
>
>> We would like to extend the vector operations in llvm a bit. We're
>> hoping to get some feedback on the right way to go, or some starting
>> points. I had previously had some discussion on this list about a
>> subset of the changes we have in mind.
>
> Woohoo!  We've been interested in talking about this for some time.

Glad to hear it!

>> ===
>> 1) Vector shl, lshr, ashr
>>
>> I think these are no-brainers. We would like to extend the semantics
>> of the shifting instructions to naturally apply to vectors as well.
>> One issue is that these operations often only support a single shift
>> amount for an entire vector. I assume it should be fairly
>
> So you're assuming a shift of a vector by a scalar?  What about the
> general vector-by-vector version?

No, I am proposing a general vector-by-vector version. I was just
pointing out that many architectures do not support this natively, and
that handling those cases should be a simple matter of legalization.
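
To make the legalization point concrete, here is a minimal Python sketch of what I have in mind. It is only a model, not LLVM code: the names `shl_scalar` and `legalize_vector_shl` are hypothetical, lanes are treated as unsigned 32-bit values, and the "target" is assumed to support only a single shift amount per vector.

```python
def shl_scalar(value, amount, bits=32):
    """Shift one unsigned lane left, wrapping the result to `bits` bits."""
    return (value << amount) & ((1 << bits) - 1)


def legalize_vector_shl(values, amounts, bits=32):
    """Model a vector-by-vector shl on a target with only uniform shifts.

    Returns (strategy, result) so the chosen lowering is visible.
    """
    assert len(values) == len(amounts)
    # Out-of-range shift amounts are left out of this model.
    assert all(0 <= a < bits for a in amounts)

    if len(set(amounts)) <= 1:
        # Every lane uses the same amount, so one uniform vector shift works.
        amount = amounts[0] if amounts else 0
        return "uniform", [shl_scalar(v, amount, bits) for v in values]

    # Otherwise fall back to one scalar shift per lane (scalarization).
    return "scalarized", [shl_scalar(v, a, bits) for v, a in zip(values, amounts)]
```

A uniform amount vector such as `[1, 1, 1, 1]` takes the first path; anything else is split into per-lane shifts.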

>> 2) Vector trunc, sext, zext, fptrunc and fpext
>>
>> Again, I think these are hopefully straightforward. Please let me know
>> if you expect any issues with vector operations that change element
>> sizes from the RHS to the LHS, e.g. around legalization.
>
> Is the assumption that all elements are changed in the same way?

Since these only depend on the types of the source and destination  
operand, and vectors are homogeneous, yes.

The intention is that for any of these conversion ops,

   %A = op <N x S> %B to <N x T>

is semantically equivalent to this pseudo-IR:

   %A = undef <N x T>
   for i = 0 .. N - 1:
     %t1 = extractelement <N x S> %B, i
     %t2 = op S %t1 to T
     %A = insertelement <N x T> %A, %t2, i
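
The same loop can be written as a small runnable Python model for the integer conversions. This is a sketch under stated assumptions: lanes are unsigned bit patterns, and `trunc`, `zext`, `sext` and `vector_convert` are hypothetical helper names, not LLVM API.

```python
def trunc(x, src_bits, dst_bits):
    """Keep only the low dst_bits bits of the lane."""
    return x & ((1 << dst_bits) - 1)


def zext(x, src_bits, dst_bits):
    """Widen the lane by filling the new high bits with zeros."""
    return x & ((1 << src_bits) - 1)


def sext(x, src_bits, dst_bits):
    """Widen the lane by copying the sign bit into the new high bits."""
    x &= (1 << src_bits) - 1
    if x >> (src_bits - 1):  # sign bit of the source lane is set
        x |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return x


def vector_convert(op, vec, src_bits, dst_bits):
    """Apply a scalar conversion to each lane, mirroring the pseudo-IR loop."""
    result = [None] * len(vec)                # %A = undef <N x T>
    for i in range(len(vec)):
        t1 = vec[i]                           # extractelement
        t2 = op(t1, src_bits, dst_bits)       # op S %t1 to T
        result[i] = t2                        # insertelement
    return result
```

For example, `vector_convert(sext, [0xFF, 0x7F], 8, 16)` sign-extends two i8 lanes to i16.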

>> 4) Vector select
>>
>> We consider a vector select extremely important for a number of
>> operations. This would be an extension of select to support an <N x
>> i1> vector mask to select between elements of <N x T> vectors for some
>> basic type T. Vector min, max, sign, etc. can be built on top of this
>> operation.
>
> Yes, merge/blend is a very important operation.  Also, it would be nice to
> think about generalizing this to apply masks to all vector operations,
> particularly loads and stores.

Can you elaborate on what you mean by this? I've built vector IRs
in the past that have writemasking effectively built into all IR ops,
and I don't know if I would recommend this approach generally. But I'm
not sure that's what you mean.
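
To make the distinction concrete, here is a minimal Python sketch of the proposed select, a min built on top of it, and a writemasked add expressed as an ordinary add followed by a select. It is only a model: `vselect`, `vmin` and `masked_add` are hypothetical names, and masks are plain lists of booleans standing in for <N x i1>.

```python
def vselect(mask, a, b):
    """Per-lane select: take a[i] where mask[i] is true, else b[i]."""
    assert len(mask) == len(a) == len(b)
    return [x if m else y for m, x, y in zip(mask, a, b)]


def vmin(a, b):
    """Vector min built from a lane-wise compare plus a select."""
    mask = [x < y for x, y in zip(a, b)]
    return vselect(mask, a, b)


def masked_add(mask, a, b, old):
    """A writemasked add: lanes with a false mask keep their old value."""
    total = [x + y for x, y in zip(a, b)]
    return vselect(mask, total, old)
```

In this model writemasking needs no extra machinery in the add itself, since the unmasked op followed by a select gives the same result.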

>> I realize this is probably the most controversial change amongst
>> these. I gather there is some concern about representing "variable
>> width" i1s, but I would contend that that's the case even for i1s
>> which are not vectors.
>
> What do you mean by "variable width?"

At some point the i1 vectors may need to be converted to masks. The  
sizes for these masks will often depend on how they're used. On ISAs  
like SSE and CellSPU, a comparison generates a mask corresponding to  
the size of the operands being compared, and a selection requires a  
mask corresponding to the size of the operands being selected.
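
As a rough illustration of the conversion involved, here is a Python sketch that expands an <N x i1> value into a per-lane mask and then resizes that mask for a consumer with a different element width. The helper names `i1_to_mask` and `resize_mask` are hypothetical, and the all-ones/all-zeros lane encoding is an assumption modelled on the compare results described above.

```python
def i1_to_mask(bits_vec, lane_bits):
    """Expand each i1 into an all-ones or all-zeros lane of lane_bits bits."""
    all_ones = (1 << lane_bits) - 1
    return [all_ones if b else 0 for b in bits_vec]


def resize_mask(mask, src_bits, dst_bits):
    """Convert a lane mask from src_bits-wide lanes to dst_bits-wide lanes.

    Each lane is all ones or all zeros, so narrowing behaves like a
    truncation and widening behaves like a sign extension.
    """
    src_ones = (1 << src_bits) - 1
    dst_ones = (1 << dst_bits) - 1
    out = []
    for lane in mask:
        assert lane in (0, src_ones), "lane is not a canonical mask value"
        out.append(dst_ones if lane else 0)
    return out
```

A mask born from a 32-bit compare can then feed a 16-bit select via `resize_mask(mask, 32, 16)`.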

The "simple" way to deal with this is to insert appropriate conversion  
code at uses of i1s, and pick a representation for a given i1 based on  
its SSA birthpoint. It's a little more ambiguous when you start adding  
phis into the mix, e.g.:

a:
	%a1 = fcmp olt <2 x float> %f1, %f2 ; yields <2 x i1>
	br label %c

b:
	%a2 = fcmp olt <2 x double> %d1, %d2 ; yields <2 x i1>
	br label %c

c:
	%a3 = phi <2 x i1> [%a1, %a], [%a2, %b]
	select <2 x i1> %a3, %s1, %s2 ; where s1, s2 are <2 x i16>
	select <2 x i1> %a3, %c1, %c2 ; where c1, c2 are <2 x i8>

The representation for %a1 may be <2 x i32>, for %a2 <2 x i64>, but
for %a3 it's less clear.

I don't think this is a huge problem, but it's something to be aware  
of. Note the same issue can affect even scalar i1 values. In fact,  
Scott Michel posted to llvm-dev recently about dealing with this issue  
for setcc in the CellSPU backend.

>> that only POT sizes are expected to work (although we've found some
>> bugs for things like <2 x float> that we're submitting). Ideally
>> things like <3 x float> operands would usually be rounded up to the
>> size supported by the machine directly. We can try to do this in the
>
> You might need mask support as well, especially if the operation can trap.

Hmm. Yes, divisions by zero etc. are something we should think about.
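
One possible way to handle this for a 3-element divide rounded up to a 4-wide machine vector is sketched below in Python. It is an assumption for illustration, not a settled design: the padding lane of the divisor is filled with 1 so that the extra lane cannot divide by zero, and `pad` and `udiv3_on_4wide` are hypothetical names.

```python
def pad(vec, width, fill):
    """Round a vector up to `width` lanes using `fill` for the extra lanes."""
    return list(vec) + [fill] * (width - len(vec))


def udiv3_on_4wide(a, b):
    """Divide two 3-lane unsigned vectors using a 4-lane operation."""
    assert len(a) == len(b) == 3
    a4 = pad(a, 4, 0)
    b4 = pad(b, 4, 1)  # 1, not 0: the padding lane must never trap
    q4 = [x // y for x, y in zip(a4, b4)]  # the 4-wide divide
    return q4[:3]      # drop the padding lane again
```

A zero in one of the three real divisor lanes still raises in this model, as it should; only the padding lane is made safe.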

>> Please let me know what you think,
>
> Let's connect at the dev meeting along with others interested in this
> stuff and start thinking about how to proceed.

Sounds good.

--
Stefanus Du Toit <stefanus.dutoit at rapidmind.com>
   RapidMind Inc.
   phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463





