[LLVMdev] Predicated Vector Operations

Fri May 10 09:50:57 PDT 2013

Jeff Bush <jeffbush001 at gmail.com> writes:

>>> It seems to me that these are not really LLVM issues as much as the
>>> fact that SSA doesn't cleanly map to predicated instructions.
>>
>> It entirely depends on how the predication is defined to work.
>
> Good point.  I was thinking of it narrowly as preserving the old value
> in the register.  I guess I'd amend my previous statement to say that
> it actually does map just fine to SSA, but instruction selection
> becomes more complex.

Yes, if you want to take advantage of the hardware preserving old
values.  There's nothing that requires you to do that, however.  You can
always write to a new hardware register and effectively get the
"undefined for false mask" behavior.

Maybe a later pass could even clean things up and opportunistically use
the old-value-preserving behavior by rewriting register names and
combining instructions.  Just thinking off the top of my head.
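
To make the two behaviors concrete, here is a rough scalar model in
Python.  It is only a sketch: the function names and the UNDEF marker
are made up for illustration and do not correspond to any LLVM or
hardware API.  It treats "preserve the old value" as the "undefined for
false mask" operation followed by a select against the previous value,
which is the combination a later pass could try to fold back into one
merging instruction.

```python
# Scalar model of one masked vector add; each list element is one lane.
# UNDEF is a hypothetical stand-in for "undefined": real hardware would
# simply leave arbitrary bits in those lanes.
UNDEF = object()


def masked_add_undef(x, y, mask):
    """Add enabled lanes; disabled lanes come back undefined."""
    return [a + b if m else UNDEF for a, b, m in zip(x, y, mask)]


def select(mask, on_true, on_false):
    """Per-lane select: take on_true where the mask is set, else on_false."""
    return [t if m else f for t, f, m in zip(on_true, on_false, mask)]


def masked_add_merge(x, y, mask, previous):
    """Add enabled lanes; disabled lanes keep the value from `previous`."""
    return select(mask, masked_add_undef(x, y, mask), previous)
```

With x = [1.0, 2.0], y = [10.0, 20.0], mask = [True, False] and
previous = [-1.0, -1.0], masked_add_merge gives [11.0, -1.0], while
masked_add_undef leaves the second lane as UNDEF.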

> It sounds like the current LLVM instruction selection algorithm can't
> really handle the use case I described cleanly (generating predicated
> arithmetic instructions that preserve the old register value).  Is
> that a fair statement?

I actually don't know yet.  I'll let you know in a few weeks once I've
tested some things.  :)

>>> For example, if predicates were hypothetically added universally to
>>> the IR (which I don't think anyone wants to do), it's not clear to me
>>> how that would even work.  How would you specify what value the result
>>> would take in non-enabled lanes?  Perhaps another parameter:
>>>
>>>   %newvalue = fadd %x, %y, %mask, %previousvalue
>>
>> It depends on how you define the mask operation.
>>
>> On the Cray X1, result elements mapped to false mask values are
>> undefined after the operation.
>
> I assume the only point of the mask is to avoid traps/exceptions in
> this case (otherwise it doesn't really do anything, right?).

On the X1 it also potentially increases performance as masked elements
can be skipped and the operation can finish early.

On a machine like Knights Corner, the mask is the only way to have a
dynamic vector length.  Traditional vector machines have a vector length
register that controls how many elements an operation should affect.
This is used to avoid the need for a remainder loop after the main
vectorized loop.  A mask can be used for the same purpose.  So a vector
mask is used for more than just traditional if-conversion.
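
As a rough illustration of that last point, here is a scalar model in
Python of a strip-mined loop with no remainder loop, done once with a
vector length and once with a mask.  This is a sketch only: VLEN and
the function names are hypothetical, and the inner loops stand in for
single vector instructions.

```python
VLEN = 4  # hypothetical hardware vector width, in lanes


def vector_add_vl(a, b):
    """Vector-length style: the last strip runs with a shorter length."""
    n = len(a)
    out = [0] * n
    for base in range(0, n, VLEN):
        vl = min(VLEN, n - base)  # value loaded into the length register
        for lane in range(vl):  # one vector add over vl elements
            out[base + lane] = a[base + lane] + b[base + lane]
    return out


def vector_add_masked(a, b):
    """Mask style: every strip is full width; the mask disables the tail."""
    n = len(a)
    out = [0] * n
    for base in range(0, n, VLEN):
        # A lane is enabled only if its element index is still in range.
        mask = [base + lane < n for lane in range(VLEN)]
        for lane in range(VLEN):  # one masked vector add
            if mask[lane]:
                out[base + lane] = a[base + lane] + b[base + lane]
    return out
```

Both versions handle a trip count that is not a multiple of VLEN inside
the main loop; the only difference is whether the tail is cut off by a
length or by a mask.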

                           -David