[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Wed Jul 18 12:52:00 PDT 2018

On 07/09/2018 08:01 AM, David A. Greene via llvm-dev wrote:
> Robin Kruppe <robin.kruppe at gmail.com> writes:
>
>> Everything else I know of that falls under "changing vector lengths"
>> is better served by predication or RISC-V's "active vector length"
>> (vl) register.
> Agreed.  A "vl" register is slightly more efficient in some cases
> because forming predicates can be bothersome.
>
> I also want to caution about predication in LLVM IR.  The way it's done
> now is, I think, not quite kosher.  We use select to represent a
> predicated operation, but select says nothing about suppressing the
> evaluation of either input.  Therefore, there is nothing in the IR to
> prevent code motion of Values outside the select.  Indeed, I ran into
> this very problem a couple of months ago, where a legitimate (according
> to the IR) code motion resulted in wrong answers in vectorized code
> because what was supposed to be predicated was not.  We had to disable
> the transformation to get things working.
>
> Another consequence of this setup is that we need special intrinsics to
> convey evaluation requirements.  We have masked
> load/store/gather/scatter intrinsics and will be getting masked
> floating-point intrinsics (or something like them).
>
> Years ago we had some discussion about how to represent predication as a
> first-class IR construct but at the time it was considered too
> difficult.  With more and more architectures turning to predication for
> performance, perhaps it's time to revisit that conversation.
I've also been seeing an increasing need for (at least some form of)
predication support in the IR/optimizer.  For now, I'm mainly
concerned with what our canonical form should look like.  I think that
adding first-class predication is probably overkill at this stage.

In my case, I'm mostly interested in predicated scalar loads and stores.
We could trivially extend
@llvm.masked.load/@llvm.masked.store to handle the scalar cases, but the
more I think about it, the less sure I am that this is actually a good
canonical form, since it requires updating huge portions of the optimizer
to handle what is essentially a new instruction.

One idea I've been playing with is to represent predicated operations as
selects-over-inputs, and then a guaranteed non-faulting operation.  For
instance, a predicated load might look like:
%pred_addr = select i1 %cnd, i32* %actual_addr, i32* %safe_addr
%pred_load = load i32, i32* %pred_addr
where %safe_addr is something like an empty alloca or reserved global 
variable.  The key idea is that this can be pattern matched to an 
actually predicated load if available in hardware, but can also be 
optimized normally (i.e. prove the select condition).
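
To make the shape of this concrete, here is a rough source-level model in C
rather than IR.  It is only a sketch: the names safe_slot and predicated_load
are made up for illustration, and safe_slot stands in for whatever
%safe_addr would actually be (an empty alloca, a reserved global, etc.).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for %safe_addr: a location that is always
 * valid to read, so the load below can never fault. */
static const int32_t safe_slot = 0;

/* Select over the address first, then load unconditionally. */
static int32_t predicated_load(int cnd, const int32_t *actual_addr) {
    const int32_t *pred_addr = cnd ? actual_addr : &safe_slot;
    /* Precondition: actual_addr must be dereferenceable when cnd is true. */
    assert(pred_addr != NULL);
    return *pred_addr;
}
```

When cnd is false, the load reads safe_slot and actual_addr is never
dereferenced, so it may be null or dangling.  If the condition can be proven
true, the select folds away and what remains is an ordinary load.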

This basic idea can be extended to any potentially faulting instruction
by selecting a "safe" input (e.g. divide by select(pred, actual, 1),
etc.).  The obvious downside is that the patterns can be broken up by the
optimizer in arbitrarily complex ways, but I wonder if that might be a net
pro rather than a con.
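
The divide case can be modelled the same way.  Again this is just a sketch
in C with a made-up name (predicated_udiv); I've used unsigned division so
that a zero divisor is the only faulting input to worry about.

```c
#include <assert.h>
#include <stdint.h>

/* Select a "safe" divisor (1) when the predicate is false, then divide
 * unconditionally. */
static uint32_t predicated_udiv(int pred, uint32_t num, uint32_t den) {
    uint32_t safe_den = pred ? den : 1u;
    /* Precondition: den must be non-zero when pred is true. */
    assert(safe_den != 0u);
    return num / safe_den;
}
```

When pred is false the result is simply num / 1, which a consumer of the
predicated value would ignore or mask off.  A signed version would also have
to deal with INT_MIN / -1, but only on the pred-true path, where the
original operation had the same problem.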

At the moment, this is just one possible idea.  I'm not yet at the point
of making any actual proposals.

Philip