[LLVMdev] Predication on SIMD architectures and LLVM

Tue Oct 23 09:43:34 PDT 2012

I am talking about the LLVM select instruction, not a vector select:

http://llvm.org/docs/LangRef.html#i_select

In any non-trapping case, an arithmetic operation (or sequence of operations) followed by a select is semantically equivalent to the predicated version.  This is exactly how predicated instructions on ARM are handled.  For example, the following IR:

  %cmp = icmp sgt i32 %c, %b
  %add = add nsw i32 %b, 1
  %add1 = add nsw i32 %c, 2
  %retval.0 = select i1 %cmp, i32 %add, i32 %add1

Becomes this ARM assembly:

	add	r2, r1, #2
	cmp	r1, r0
	addgt	r2, r0, #1
	mov	r0, r2

An equally valid form would be:

	cmp	r1, r0
	addle	r2, r1, #2
	addgt	r2, r0, #1
	mov	r0, r2

Separating the select, which embodies the predication, from the operations allows more choice in the final representation.  Unless the load is volatile, the compiler is free to elide it if its result is not used, and is most definitely free to fold it into a predicated load.  The same is obviously true of any side-effect-free operation, such as divides and square roots: folding them into predicated instructions is no less valid than conditionally executing them in branches or removing them entirely via dead code elimination.

Just because the generated machine code must contain predicated instructions most definitely does not mean that the LLVM IR must contain them, or even that we would gain anything in terms of expressive power by permitting it.

David

On 23 Oct 2012, at 17:25, <dag at cray.com> wrote:

> David Chisnall <David.Chisnall at cl.cam.ac.uk> writes:
> 
>> Perhaps I am missing something, but isn't a predicated instruction
>> effectively a single-instruction version of an arithmetic operation
>> followed by a select?  
> 
> No, it is not.  Among other things, predication is used to avoid traps.
> A vector select is an entirely different operation.
> 
>> As we can already represent this in the IR, and already match other
>> predicated instructions (e.g. on ARM) to this pattern, what is gained
>> by adding predication directly to the IR?
> 
> Predicated loads, stores, divides, sqrts, etc. are essential for
> correctly vectorizing loops with conditionals due to safety concerns.
> If the loop body has no dangerous operations, then yes, a vector select
> can be used without problems but it is often slower than predication.
> Usually the hardware can optimize instructions with certain values of
> predicates.
> 
>                              -David