[LLVMdev] Predicated Vector Operations

Wed May 8 11:07:35 PDT 2013

Jeff Bush <jeffbush001 at gmail.com> writes:

> I'm trying to understand how predicated/masked instructions can be
> generated in llvm, specifically an instruction where a set bit in the
> mask will write the new result into the corresponding vector lane in
> the destination and a clear bit will cause the lane in the destination
> to remain what it was before the instruction executed.
>
> I've seen a few places that suggest 'select' is the proper way to
> implement predication. I believe the predicated form cannot be
> explicitly expressed in LLVM asm, because it is SSA. It can be done
> implicitly:
>
> %sum = add <16 x i32> %x, %y
> %newvalue = select <16 x i1> %mask, <16 x i32> %sum, <16 x i32> %oldvalue

This is not necessarily sufficient in general.  For any operation that
can trap (pretty much any fp operation) you also need to protect the
inputs with safe values to adhere to the semantics of the LLVM IR.  For
an fadd, for example:

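%tx = select <16 x i1> %mask, <16 x float> %x, <16 x float> zeroinitializer
%ty = select <16 x i1> %mask, <16 x float> %y, <16 x float> zeroinitializer
%sum = fadd <16 x float> %tx, %ty
%newvalue = select <16 x i1> %mask, <16 x float> %sum, <16 x float> %oldvalue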

The entire pattern of selects and fadd then has to be matched in isel,
which can discard the safe-value selects on the inputs (because the
hardware masking already takes care of the safety).

In fact the problem is more general.  Even in the integer add case, you
have to ensure that llvm won't move the definition of %sum before the
definition of %mask.  Otherwise you're going to end up generating a
predicated add with a bad mask value.  Selects on the inputs to the
integer add also solve this problem.
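
As a minimal sketch of the integer case, the same input-select pattern
might look like the following.  The function name @masked_add is
hypothetical, the 16-lane width follows the quoted example, and zero is
just one assumed choice of safe value.  The point is that %sum now
depends on %mask through %tx and %ty, so it cannot be placed ahead of
the mask definition:

```llvm
; Hypothetical function; names other than %mask, %x, %y, %sum,
; %oldvalue and %newvalue are illustrative only.
define <16 x i32> @masked_add(<16 x i1> %mask, <16 x i32> %x,
                              <16 x i32> %y, <16 x i32> %oldvalue) {
entry:
  ; Masked-off lanes of both inputs are replaced with zero.
  %tx = select <16 x i1> %mask, <16 x i32> %x, <16 x i32> zeroinitializer
  %ty = select <16 x i1> %mask, <16 x i32> %y, <16 x i32> zeroinitializer
  ; The add consumes the selected inputs, so it depends on %mask.
  %sum = add <16 x i32> %tx, %ty
  ; Masked-off lanes keep the old destination value.
  %newvalue = select <16 x i1> %mask, <16 x i32> %sum, <16 x i32> %oldvalue
  ret <16 x i32> %newvalue
}
```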

> The issue becomes how to match the instruction form above in a
> TableGen pattern. In order for this to emit a masked instruction,
> %newvalue and %oldvalue must be assigned to the same physical register
> (I'm assuming an instruction like 'add %r0{%m0} %r1 %r2'). However, I don't
> think there is even a notion of physical registers at the point that
> instruction selection is performed and the virtual registers will be
> different because everything is still in SSA form.

Potentially you could use the "$src = $dst" constraint as in the
two-address x86 forms.  I don't know that TableGen has been generalized
enough to do this, though.  I think it's pretty highly specialized to
specific x86 two-address forms at the moment.

Another problem is how to express %oldvalue in LLVM IR.  Presumably
%newvalue will be consumed, possibly by another arithmetic operation.
Presumably %oldvalue can similarly come from a previous arithmetic
operation feeding into the add.  If that's true, then %oldvalue is
either %x or %y.  Otherwise it is something else that is highly
context-dependent.
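
To illustrate the first case, here is a rough sketch in which %x comes
from an earlier operation and also serves as %oldvalue.  The function
name @chained, the operands %a and %b, and the choice of mul as the
earlier operation are all assumptions made for the example:

```llvm
; Hypothetical function: %x is produced by a previous operation and
; then feeds the add both as an input and as the pass-through value.
define <16 x i32> @chained(<16 x i1> %mask, <16 x i32> %a,
                           <16 x i32> %b, <16 x i32> %y) {
entry:
  ; An earlier arithmetic operation produces %x.
  %x = mul <16 x i32> %a, %b
  %sum = add <16 x i32> %x, %y
  ; Masked-off lanes pass %x through unchanged, so %oldvalue is %x.
  %newvalue = select <16 x i1> %mask, <16 x i32> %sum, <16 x i32> %x
  ret <16 x i32> %newvalue
}
```

Here the destination and one source are the same value, which is the
shape a two-address masked instruction would want.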

The gpuocelot project ran into the problem and they talk about it here:

http://code.google.com/p/gpuocelot/source/browse/wiki/LLVM.wiki?r=272

The bottom line is that it is probably easier to set this up before LLVM
IR goes into SSA form.

There is a lot of interest in predication and a lot of recent
discussions about handling it in LLVM.  Personally I think that
long-term we will need some IR changes.  It might be as simple as adding
an IR-level predicated load and predicated store; I'm not sure.

We're pretty heavily into trying to get predicated vector code working
well in LLVM here.  Expect to see many questions, code examples and
hopefully upstream code submissions over the next couple of months.

                          -David