[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Wed Jul 18 12:52:00 PDT 2018
On 07/09/2018 08:01 AM, David A. Greene via llvm-dev wrote:
> Robin Kruppe <robin.kruppe at gmail.com> writes:
>
>> Everything else I know of that falls under "changing vector lengths"
>> is better served by predication or RISC-V's "active vector length"
>> (vl) register.
> Agreed. A "vl" register is slightly more efficient in some cases
> because forming predicates can be bothersome.
>
> I also want to caution about predication in LLVM IR. The way it's done
> now is, I think, not quite kosher. We use select to represent a
> predicated operation, but select says nothing about suppressing the
> evaluation of either input. Therefore, there is nothing in the IR to
> prevent code motion of Values outside the select. Indeed, I ran into
> this very problem a couple of months ago, where a legitimate (according
> to the IR) code motion resulted in wrong answers in vectorized code
> because what was supposed to be predicated was not. We had to disable
> the transformation to get things working.
>
> Another consequence of this setup is that we need special intrinsics to
> convey evaluation requirements. We have masked
> load/store/gather/scatter intrinsics and will be getting masked
> floating-point intrinsics (or something like them).
>
> Years ago we had some discussion about how to represent predication as a
> first-class IR construct but at the time it was considered too
> difficult. With more and more architectures turning to predication for
> performance, perhaps it's time to revisit that conversation.
I've also been seeing an increasing need for (at least some form of)
predication support in the IR/optimizer. At the moment, I'm mainly
concerned with what our canonical form should look like. I think that
adding first-class predication is probably overkill for now.
In my case, I'm mostly interested in predicated scalar loads and stores.
We could trivially extend
@llvm.masked.load/@llvm.masked.store to handle the scalar cases, but the
more I think about it, the less sure I am that this is actually a good
canonical form, since it requires updating huge portions of the optimizer
to handle what is essentially a new instruction.
One idea I've been playing with is to represent predicated operations as
selects-over-inputs, followed by a guaranteed non-faulting operation. For
instance, a predicated load might look like:
  %pred_addr = select i1 %cnd, i32* %actual_addr, i32* %safe_addr
  %pred_load = load i32, i32* %pred_addr
where %safe_addr is something like an empty alloca or reserved global
variable. The key idea is that this can be pattern matched to an
actually predicated load if available in hardware, but can also be
optimized normally (i.e. prove the select condition).
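
As a rough C-level analogue of the select-then-load pattern above (a
sketch only; `safe_slot` and `predicated_load` are hypothetical names
used for illustration and are not part of LLVM), the address is chosen
first and the load is then performed unconditionally:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for %safe_addr: a reserved location that is
   always valid to read. Its contents are irrelevant to correctness. */
static int32_t safe_slot = 0;

/* Select over the input address, then load unconditionally.
   When cnd is false the result is whatever safe_slot holds and is
   expected to be ignored by the caller. */
static int32_t predicated_load(bool cnd, const int32_t *actual_addr) {
    const int32_t *pred_addr = cnd ? actual_addr : &safe_slot;
    return *pred_addr; /* pred_addr is dereferenceable on both paths */
}
```

With `cnd` false, `actual_addr` is never dereferenced, so it may even be
null; if `cnd` can be proven true, the select folds away and an ordinary
load remains.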
This basic idea can be extended to any potentially faulting instruction
by selecting a "safe" input (e.g. divide by select(pred, actual, 1),
etc.). The obvious downside is that the patterns can be broken up by the
optimizer in arbitrarily complex ways, but I wonder if that might be a
net pro rather than a con.
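
The divide case can be sketched the same way in C (again illustrative
only; `predicated_sdiv` is a hypothetical helper, and the choice of 1 as
the safe divisor follows the example in the text):

```c
#include <stdbool.h>
#include <stdint.h>

/* Select a safe divisor first, then divide unconditionally.
   When pred is false the quotient is dividend / 1 and is expected to
   be ignored by the caller. This guards only against the inactive
   case; a true pred with divisor 0 is still the caller's problem. */
static int32_t predicated_sdiv(bool pred, int32_t dividend, int32_t divisor) {
    int32_t safe_divisor = pred ? divisor : 1;
    return dividend / safe_divisor;
}
```

Here a zero divisor is harmless whenever `pred` is false, because the
division only ever sees the selected value.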
At the moment, this is just one possible idea. I'm not yet at the point
of making any actual proposals.
Philip