[LLVMdev] Predicated Vector Operations
dag at cray.com
Fri May 10 09:50:57 PDT 2013
Jeff Bush <jeffbush001 at gmail.com> writes:
>>> It seems to me that these are not really LLVM issues as much as the
>>> fact that SSA doesn't cleanly map to predicated instructions.
>> It entirely depends on how the predication is defined to work.
> Good point. I was thinking of it narrowly as preserving the old value
> in the register. I guess I'd amend my previous statement to say that
> it actually does map just fine to SSA, but instruction selection
> becomes more complex.
Yes, if you want to take advantage of the hardware preserving old
values. There's nothing that requires you to do that, however. You can
always write to a new hardware register and effectively get the
"undefined for false mask" behavior.
Maybe a later pass could even clean things up and opportunistically use
the old-value-preserving behavior by rewriting register names and
combining instructions. Just thinking off the top of my head.
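To make the two behaviors concrete, here is a rough sketch in Python. It is only a model: vectors are plain lists, None stands in for an undefined lane, and every function name is made up for illustration (none of them are LLVM or hardware APIs). It shows that the old-value-preserving form is just the "undefined for false mask" form followed by a per-lane select against the old value.

```python
# Illustrative model only: vectors are plain Python lists, and the function
# names below are hypothetical, not LLVM or hardware APIs.

UNDEF = None  # stands in for an undefined lane; real hardware would hold garbage


def masked_add_undef(x, y, mask):
    """Add enabled lanes; disabled lanes are undefined (fresh destination)."""
    return [a + b if m else UNDEF for a, b, m in zip(x, y, mask)]


def select(mask, on_true, on_false):
    """Per-lane select: take on_true where the mask is set, else on_false."""
    return [t if m else f for t, f, m in zip(on_true, on_false, mask)]


def masked_add_merge(x, y, mask, old):
    """Add enabled lanes; disabled lanes keep the old destination value."""
    return select(mask, masked_add_undef(x, y, mask), old)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
mask = [True, False, True, False]
old = [-1.0, -2.0, -3.0, -4.0]

undef_result = masked_add_undef(x, y, mask)      # lanes 1 and 3 are undefined
merge_result = masked_add_merge(x, y, mask, old)  # lanes 1 and 3 come from old
```

A cleanup pass of the kind mentioned above would, in this model, be looking for a select whose true operand is a masked operation under the same mask and folding the pair into a single merging instruction.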
> It sounds like the current LLVM instruction selection algorithm can't
> really handle the use case I described cleanly (generating predicated
> arithmetic instructions that preserve the old register value). Is
> that a fair statement?
I actually don't know yet. I'll let you know in a few weeks once I've
tested some things. :)
>>> For example, if predicates were hypothetically added universally to
>>> the IR (which I don't think anyone wants to do), it's not clear to me
>>> how that would even work. How would you specify what value the result
>>> would receive for non-enabled lanes? Perhaps another parameter:
>>> %newvalue = fadd %x, %y, %mask, %previousvalue
>> It depends on how you define the mask operation.
>> On the Cray X1, result elements mapped to false mask values are
>> undefined after the operation.
> I assume the only point of the mask is to avoid traps/exceptions in
> this case (otherwise it doesn't really do anything, right?).
On the X1 it also potentially increases performance as masked elements
can be skipped and the operation can finish early.
On a machine like Knights Corner, the mask is the only way to have a
dynamic vector length. Traditional vector machines have a vector length
register that controls how many elements an operation should affect.
This is used to avoid the need for a remainder loop after the main
vectorized loop. A mask can be used for the same purpose. So a vector
mask is used for more than just traditional if-conversion.
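As a rough illustration of the remainder-loop point, here is a Python sketch. The vector width of 4 and all of the helper names are assumptions chosen for the example, not any real ISA or LLVM interface. The final iteration builds a mask that enables only the lanes still in bounds, so no separate scalar remainder loop is needed.

```python
# Illustrative model only: VLEN and the helper names are hypothetical.

VLEN = 4  # assumed hardware vector width, chosen for illustration


def tail_mask(start, n, vlen=VLEN):
    """Enable a lane only if its element index is still inside the array."""
    return [start + lane < n for lane in range(vlen)]


def masked_load(src, start, mask, fill=0.0):
    """Load enabled lanes from src; disabled lanes are never read."""
    return [src[start + lane] if m else fill for lane, m in enumerate(mask)]


def masked_store(dst, start, values, mask):
    """Store only the enabled lanes back to dst."""
    for lane, (v, m) in enumerate(zip(values, mask)):
        if m:
            dst[start + lane] = v


def vector_add(a, b):
    """Add two equal-length arrays VLEN lanes at a time, with a masked tail."""
    n = len(a)
    out = [0.0] * n
    for start in range(0, n, VLEN):
        mask = tail_mask(start, n)  # all true except possibly on the last pass
        va = masked_load(a, start, mask)
        vb = masked_load(b, start, mask)
        vsum = [p + q for p, q in zip(va, vb)]
        masked_store(out, start, vsum, mask)
    return out


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
result = vector_add(a, b)
last_mask = tail_mask(4, len(a))
```

With six elements and a width of four, the second pass runs with only the first two lanes enabled, which is the same effect a vector length register set to 2 would have on a traditional vector machine.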