[LLVMdev] Predicated Vector Operations

Jeff Bush jeffbush001 at gmail.com
Thu May 9 13:05:38 PDT 2013


On Thu, May 9, 2013 at 8:10 AM,  <dag at cray.com> wrote:
> Jeff Bush <jeffbush001 at gmail.com> writes:
>
>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>> %sum = fadd %tx, %ty
>> %newvalue = select %mask, %sum, %oldvalue
>>
>> I believe the generated instructions depend on whether %oldvalue is
>> still live after the last instruction. If it is, you need to generate
>> two instructions: a copy into a new physical register, then a predicated
>> write to it.  If it is not used, then it is just a predicated write to
>> the same register.
>>
>>   move r1, r0
>>   fadd r1{m0}, r2, r3
>>
>> (r0 is now %oldvalue and r1 is %newvalue)
>>
>> vs.
>>
>>   fadd r0{m0}, r2, r3
>>
>> (r0 was %oldvalue and is now %newvalue)
>
> I'm assuming some parts of %oldvalue are still used.  The masked fadd
> could preserve them for false values of the mask, depending on how
> masking was defined.  Therefore, there's no need for a register copy.
> If the masked operation does not preserve the old values in r0, then we
> do need a register copy.
>
> Preserving old values does complicate things for SSA, as you note.
>
>>> The bottom line is that it is probably easier to set this up before LLVM
>>> IR goes into SSA form.
>>
>> That makes sense, but it's unclear to me how you would preserve that
>> information after going into SSA form.
>
> I should think the semantics of select would handle that.  After a
> select all vector elements of the result are defined.  There is no
> preservation of old values.  There cannot be, by definition of SSA.
>
>> It seems to me that these are not really LLVM issues as much as the
>> fact that SSA doesn't cleanly map to predicated instructions.
>
> It entirely depends on how the predication is defined to work.

Good point.  I was thinking of it narrowly as preserving the old value
in the register.  I guess I'd amend my previous statement to say that
it actually does map just fine to SSA, but instruction selection
becomes more complex.

It sounds like the current LLVM instruction selection algorithm can't
really handle the use case I described cleanly (generating predicated
arithmetic instructions that preserve the old register value).  Is
that a fair statement?
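
To make the use case concrete, here is a minimal lane-wise sketch in
Python (not LLVM IR).  It assumes a hypothetical predicated add that
leaves disabled lanes of its destination untouched; the helper names
(select, fadd, masked_fadd_preserving) are illustrative only and are
not LLVM APIs.  It checks that the select-based SSA form and both
register sequences quoted above produce the same lanes.

```python
def select(mask, a, b):
    """Lane-wise select: a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def fadd(a, b):
    """Unmasked lane-wise add."""
    return [x + y for x, y in zip(a, b)]


def masked_fadd_preserving(dest, mask, a, b):
    """Hypothetical predicated add: writes only the enabled lanes of dest."""
    for i, m in enumerate(mask):
        if m:
            dest[i] = a[i] + b[i]
    return dest


mask = [True, False, True, False]
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
oldvalue = [-1.0, -2.0, -3.0, -4.0]
zero = [0.0] * 4

# SSA form: every lane of every value is defined by a select.
tx = select(mask, x, zero)
ty = select(mask, y, zero)
newvalue_ssa = select(mask, fadd(tx, ty), oldvalue)

# Case 1: %oldvalue is still live, so copy it first (move r1, r0).
r1 = list(oldvalue)
masked_fadd_preserving(r1, mask, x, y)

# Case 2: %oldvalue is dead, so overwrite its register in place.
r0 = list(oldvalue)  # stands in for the register that holds %oldvalue
masked_fadd_preserving(r0, mask, x, y)

print(newvalue_ssa, r1, r0)
```

In this model the only difference between the two cases is whether the
copy is needed, which is a liveness question rather than a semantic one.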

>> For example, if predicates were hypothetically added universally to
>> the IR (which I don't think anyone wants to do), it's not clear to me
>> how that would even work.  How would you specify what value the result
>> would receive for non-enabled lanes?  Perhaps another parameter:
>>
>>   %newvalue = fadd %x, %y, %mask, %previousvalue
>
> It depends on how you define the mask operation.
>
> On the Cray X1, result elements mapped to false mask values are
> undefined after the operation.

I assume the only point of the mask is to avoid traps/exceptions in
this case (otherwise it doesn't really do anything, right?).
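
A rough Python sketch of that reading, under stated assumptions:
disabled lanes are modeled as None (standing in for "undefined"), and
Python's ZeroDivisionError stands in for a hardware trap.  The function
names are hypothetical and do not describe the actual X1 instruction
set.

```python
UNDEF = None  # stand-in for an undefined result lane


def fdiv(a, b):
    """Unmasked lane-wise divide: traps if any lane divides by zero."""
    return [x / y for x, y in zip(a, b)]


def masked_fdiv_undef(mask, a, b):
    """Hypothetical masked divide: disabled lanes are skipped and left undefined."""
    return [x / y if m else UNDEF for m, x, y in zip(mask, a, b)]


a = [8.0, 1.0, 9.0, 1.0]
b = [2.0, 0.0, 3.0, 0.0]
mask = [True, False, True, False]

try:
    fdiv(a, b)
    unmasked_trapped = False
except ZeroDivisionError:
    unmasked_trapped = True

masked_result = masked_fdiv_undef(mask, a, b)
print(unmasked_trapped, masked_result)
```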

> This is convenient for the hardware from
> a register renaming perspective.  For the same reason, it also maps well
> to SSA because there is no reuse of old values.  It is really no
> different than a shufflevector with some undef inputs from an SSA
> viewpoint.
>
> On Intel's Knights Corner, result elements mapped to false mask elements
> are preserved.  This is tricky for SSA but I think we can model it with
> select instructions and proper selection of input values.  The input
> related to %oldvalue will be very context-sensitive and I'm not sure
> there's a way to express it generally for a single instruction outside a
> code sequence.
>
> My assumption has always been that general predication in the IR would
> have similar semantics to select except that traps would be disabled for
> false mask elements.  However I haven't fully thought through the
> implications of that.
>
> But I don't think we need general predication throughout the IR, so I
> don't think we need to worry about it.  What we need is predication at
> the expression tree leaves, which is why we're talking about loads and
> stores.  For loads, there is no %oldvalue so we don't have to worry
> about it.
>
> %ra = ...
> if (%mask) {
>   %rb = [mem1]
> }
> else {
>   %rb = [mem2]
> }
> %rc = %ra + %rb
>
> Would be translated to:
>
> %ra = ...
>
> %rt = load [mem1], %mask
> %rf = load [mem2], ~%mask
>
> %rb = select %mask, %rt, %rf
> %rc = fadd %ra, %rb
>
> Since the input select defined all vector elements we don't need selects
> on the fadd.  It's an unmasked operation.
>
> Slightly more complicated:
>
> %ra = ...
> if (%mask) {
>   %rb = [mem1]
>   %rc = %ra + %rb
> }
> else {
>   %rb = [mem2]
>   %rc = %ra - %rb
> }
>
> Would be translated to:
>
> %ra = ...
>
> %rt = load [mem1], %mask
> %ru = select %mask, %rt, 0.0 ; Fill with safe values
> %rv = fadd %ra, %ru
>
> %rf = load [mem2], ~%mask
> %rg = select ~%mask, %rf, 0.0 ; Fill with safe values
> %rh = fsub %ra, %rg
>
> %rc = select %mask, %rv, %rh
>
> The reason we need predication on loads and stores is so they don't trap
> for false values of the mask.  Currently there is no way to express this
> in the IR.
>
>                                    -David
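
The second translation above can be sketched lane by lane in Python.
This is only a model under assumptions: memory is a dict, an unmapped
address raises a Trap exception, and masked_load is a hypothetical
stand-in for a predicated load that does not touch disabled lanes.  The
else branch subtracts, matching the source's "%ra - %rb".

```python
class Trap(Exception):
    """Stand-in for a hardware fault on an invalid address."""


def load(memory, addr):
    if addr not in memory:
        raise Trap(addr)
    return memory[addr]


def masked_load(memory, addrs, mask):
    """Hypothetical predicated load: disabled lanes are not accessed."""
    return [load(memory, a) if m else None for a, m in zip(addrs, mask)]


def select(mask, a, b):
    return [x if m else y for m, x, y in zip(mask, a, b)]


def invert(mask):
    return [not m for m in mask]


memory = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
mem1 = [0, 99, 2, 99]  # 99 is unmapped; only touched if the mask is wrong
mem2 = [98, 1, 98, 3]  # 98 is unmapped as well
mask = [True, False, True, False]
ra = [10.0, 20.0, 30.0, 40.0]
zero = [0.0] * 4

rt = masked_load(memory, mem1, mask)
ru = select(mask, rt, zero)                 # fill with safe values
rv = [a + b for a, b in zip(ra, ru)]        # unmasked add

rf = masked_load(memory, mem2, invert(mask))
rg = select(invert(mask), rf, zero)         # fill with safe values
rh = [a - b for a, b in zip(ra, rg)]        # unmasked subtract

rc = select(mask, rv, rh)

# Per-lane reference: what the original if/else computes.
reference = [
    ra[i] + load(memory, mem1[i]) if mask[i] else ra[i] - load(memory, mem2[i])
    for i in range(len(mask))
]
print(rc, reference)
```

An unmasked load of mem1 would raise Trap on lane 1 in this model, which
is the case the predicated load is meant to avoid.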


