# [LLVMdev] Predicated Vector Operations

Arnold Schwaighofer aschwaighofer at apple.com
Thu May 9 14:15:23 PDT 2013

```On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail.com> wrote:

> On Thu, May 9, 2013 at 8:10 AM,  <dag at cray.com> wrote:
>> Jeff Bush <jeffbush001 at gmail.com> writes:
>>
>>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...>
>>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...>
>>> %sum = fadd %tx, %ty
>>> %newvalue = select %mask, %sum, %oldvalue
>>>
>>> I believe the generated instructions depend on whether %oldvalue is
>>> still live after the last instruction. If it is, you need to generate
>>> two instructions: a copy into a new physical register then predicated
>>> write to it.  If it is not used, then it is just a predicated write to
>>> the same register.
>>>
>>>  move r1, r0
>>>
>>> (r0 is now %oldvalue and r1 is %newvalue)
>>>
>>> vs.
>>>
>>>
>>> (r0 was %oldvalue and is now %newvalue)
>>
>> I'm assuming some parts of %oldvalue are still used.  The masked fadd
>> could preserve them for false values of the mask, depending on how
>> masking was defined.  Therefore, there's no need for a register copy.
>> If the masked operation does not preserve the old values in r0, then we
>> do need a register copy.
>>
>> Preserving old values does complicate things for SSA, as you note.
>>
>>>> The bottom line is that it is probably easier to set this up before LLVM
>>>> IR goes into SSA form.
>>>
>>> That makes sense, but it's unclear to me how you would preserve that
>>> information after going into SSA form.
>>
>> I should think the semantics of select would handle that.  After a
>> select all vector elements of the result are defined.  There is no
>> preservation of old values.  There cannot be, by definition of SSA.
>>
>>> It seems to me that these are not really LLVM issues as much as the
>>> fact that SSA doesn't cleanly map to predicated instructions.
>>
>> It entirely depends on how the predication is defined to work.
>
> Good point.  I was thinking of it narrowly as preserving the old value
> in the register.  I guess I'd amend my previous statement to say that
> it actually does map just fine to SSA, but instruction selection
> becomes more complex.
>
> It sounds like the current LLVM instruction selection algorithm can't
> really handle the use case I described cleanly (generating predicated
> arithmetic instructions that preserve the old register value).  Is
> that a fair statement?

I don’t think this is a fair statement. Tied register operands should handle this use case just fine. The problem is similar to that of two-address constraints. Two-address instructions work as follows: when we match an instruction, we “tie” the input and output registers.

x = add i32 y, z

For x86, we generate the following machine IR instruction during ISel:

vr0<def, tied1> = ADD32rr vr1<use, tied0>, vr2<use>

Once we go out of SSA during CodeGen, we have to replace the two-address constraint with copies:

vr0 = vr1

Coalescing and register allocation will then take care of removing unnecessary copies. I think that predicated instructions would be handled similarly (to keep the example short, I replaced your sequence of IR instructions with one “virtual” IR instruction):

This (actually, your sequence of selects and the add) would be matched during ISel to: