[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sun Nov 27 08:51:37 PST 2016

Bringing the discussion back onto the IR proposals:

> One way to make your "seriesvector" concept show up *before* any spec
> is out is to apply it to non-scalable vectors.
>
> Today, we have the "zeroinitializer", which is very similar to what
> you want. You can even completely omit the "vscale" if we get the
> semantics right.

There is nothing to stop other targets from using
stepvector/seriesvector. In fact, for wide-vector targets the IR
constant for a step vector is often written out explicitly as
<i32 0, i32 1, i32 2, ...> (this gets really cumbersome when
your vector length is 512 bits, for example). That could be replaced by
a single "stepvector" constant, which works the same for both
fixed-length and scalable vectors.
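
As a rough illustration of the point above, this Python sketch spells out the fixed-length constant that IR needs today and contrasts it with the value a single "stepvector" constant would denote. The helper names (explicit_step_vector, step_vector_values) are hypothetical, and "stepvector" is the proposed constant, not something LLVM provides.

```python
# Hypothetical helpers: compare today's explicit step-vector constant with
# the value a single proposed "stepvector" constant would denote.

def explicit_step_vector(vector_bits: int, elem_bits: int) -> str:
    """Spell out <iN 0, iN 1, ...> for a fixed-length vector."""
    lanes = vector_bits // elem_bits
    ty = f"i{elem_bits}"
    return "<" + ", ".join(f"{ty} {i}" for i in range(lanes)) + ">"


def step_vector_values(lanes: int) -> list[int]:
    """Lane i holds i; works whether lanes is fixed or only known at run time."""
    return list(range(lanes))


if __name__ == "__main__":
    print(explicit_step_vector(128, 32))      # <i32 0, i32 1, i32 2, i32 3>
    print(len(explicit_step_vector(512, 8)))  # length of a 64-lane i8 constant
    print(step_vector_values(4))              # [0, 1, 2, 3]
```

The design point is that step_vector_values needs only the lane count, so one spelling covers both fixed-length and scalable vectors.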

Amara

On 27 November 2016 at 15:42, Renato Golin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 27 November 2016 at 13:59, Paul Walker <Paul.Walker at arm.com> wrote:
>> Thanks Renato, my takeaway is that I am presenting the design out of order.  So let's focus purely on the vector length (VL) and ignore everything else.  For SVE the vector length is unknown and can vary across an as yet undetermined boundary (process, library, ...).  Within a boundary we propose making VL a constant, with all instructions that operate on this constant locked within its boundary.
>
> This is in line with my current understanding of SVE. Check.
>
>
>> I know this stretches the meaning of constant and my reasoning (however unsound) is below.  We expect changes to VL to be infrequent and not located where it would present an unnecessary barrier to optimisation.  With this in mind the initial implementation of VL barriers would be an intrinsic that prevents any instruction movement across it.
>>
>> Question: Is this type of intrinsic something LLVM supports today?
>
> Function calls are natural barriers, but the values that must not
> cross them would need to be passed as parameters, especially if they're
> local, to make sure they don't get moved across the call. In that sense,
> specially crafted intrinsics can get you the same behaviour, but it will be ugly.
>
> Also, we have special-purpose barriers, e.g. @llvm.arm.dmb and
> @llvm.aarch64.dmb, which could serve as a template for scalable-specific barriers.
>
>
>> Why a constant? Well, it doesn't change within the context in which it is being used. More crucially, the LLVM implementation of constants gives us a property that's very important to SVE (perhaps this is where prototyping laziness has kicked in).  Constants remain attached to the instructions that operate on them all the way through to code generation.  This allows the semantic meaning of these instructions to be maintained, something non-scalable vectors get for free with their "real" constants.
>
> This makes sense. Not just because it behaves similarly, but because
> the back-end *must* guarantee it will be a constant within its
> boundaries and fail otherwise. That's up to the SVE code-generator to
> add enough SVE-specific instructions to get that right.
>
>
>>         shufflevector <n x 4 x i32> %a, <n x 4 x i32> undef, <n x 4 x i32> seriesvector ( sub (i32 VL, 1), i32 -1)
>>
>> Firstly, I'll highlight that the use of seriesvector is purely for brevity; let's ignore that debate for now.  Our concern is that not treating VL as a Constant means sub and seriesvector are no longer constant and are likely to be hoisted away from the shufflevector.  The knock-on effect would be to force the code generator into generating generic vector permutes rather than utilising any specialised permute instructions the target provides.
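
A minimal Python model of the quoted shufflevector, assuming seriesvector(start, step) gives lane i the value start + i*step and that VL in the mask means the number of lanes in the vector. Both readings are assumptions for illustration, and the helper names are hypothetical. Under them the mask is VL-1, VL-2, ..., 0, i.e. a lane reversal that is correct for any run-time VL, which is why it is worth keeping recognisable to the code generator.

```python
# Hypothetical model of the quoted mask; see the assumptions in the lead-in.

def seriesvector(start: int, step: int, lanes: int) -> list[int]:
    """Lane i holds start + i * step."""
    return [start + i * step for i in range(lanes)]


def shufflevector(a: list[int], b: list[int], mask: list[int]) -> list[int]:
    """Indices below len(a) select from a; the rest select from b."""
    both = a + b
    return [both[m] for m in mask]


def reverse_via_mask(a: list[int]) -> list[int]:
    vl = len(a)                          # lane count, only known at run time on SVE
    mask = seriesvector(vl - 1, -1, vl)  # VL-1, VL-2, ..., 0
    undef = [0] * vl                     # undef operand modelled as zeros
    return shufflevector(a, undef, mask)


if __name__ == "__main__":
    print(reverse_via_mask([10, 20, 30, 40]))  # [40, 30, 20, 10]
```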
>
> The concept looks ok.
>
> IIGIR, your argument is that an intrinsic will not look "constant
> enough" to the other IR passes, which can break the constantness
> required to generate the correct "constant" vector.
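
A toy model of this concern: an instruction selector can only pick a specialised permute if it can still see the expression that built the mask. Everything here (REVERSE_MASK, hoist, select_permute and the instruction names) is hypothetical and is not how LLVM actually represents shuffle masks during instruction selection.

```python
# Toy model: can instruction selection still recognise the reverse mask?
# All names are hypothetical; real LLVM represents masks differently.

REVERSE_MASK = ("seriesvector", ("sub", "VL", 1), -1)


def hoist(mask: object, name: str = "%mask.hoisted") -> object:
    """A pass moves the mask computation elsewhere; the shuffle now only
    sees an opaque SSA name, not the expression that produced it."""
    return name


def select_permute(mask: object) -> str:
    """Choose an instruction for shufflevector based on what the mask looks like."""
    if mask == REVERSE_MASK:
        return "reverse"        # single specialised permute
    return "generic_permute"    # mask is just a run-time vector


if __name__ == "__main__":
    print(select_permute(REVERSE_MASK))         # reverse
    print(select_permute(hoist(REVERSE_MASK)))  # generic_permute
```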
>
> I'm also assuming SVE has an instruction that relates to the syntax
> above, which will reduce the setup process from N instructions to one
> and will be scale-independent. Otherwise, that whole exercise is
> meaningless.
>
> Something like:
>   mov  x2, #i
>   const       z0.b, p0/z, x2, 2     # i, i+2, ..., i+2*(VF-1)
>   const       z1.b, p0/z, x2, -1    # i, i-1, ..., i-(VF-1), i.e. in reverse
>
> The undefined behaviour that will come of such instructions needs to be
> understood so as not to break the IR.
>
> For example, if x2 is an unsigned variable and you iterate backwards
> through the array but the array length is not a multiple of VF, the
> last range will pass through zero and wrap around instead of going
> negative. Or, if x2 is a 16-bit variable that must wrap (or saturate),
> the same tail issue arises.
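
A small Python model of the tail issue, assuming 16-bit unsigned lanes and a descending series like the second const example: with four lanes starting at 2, the last lane steps past zero. Wrapping and saturating give different answers, so the IR would need to state which one applies. The function names are hypothetical.

```python
# Hypothetical helpers modelling an index series start, start+step, ...
# held in narrow unsigned lanes (16-bit by default).

def series_wrapping(start: int, step: int, lanes: int, bits: int = 16) -> list[int]:
    """Each lane wraps modulo 2**bits, like unsigned integer overflow."""
    mask = (1 << bits) - 1
    return [(start + i * step) & mask for i in range(lanes)]


def series_saturating(start: int, step: int, lanes: int, bits: int = 16) -> list[int]:
    """Each lane is clamped to the unsigned range [0, 2**bits - 1]."""
    hi = (1 << bits) - 1
    return [min(max(start + i * step, 0), hi) for i in range(lanes)]


if __name__ == "__main__":
    print(series_wrapping(2, -1, 4))    # [2, 1, 0, 65535]
    print(series_saturating(2, -1, 4))  # [2, 1, 0, 0]
```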
>
>
>> Does this make sense? I am not after agreement; I just want to make sure we are on the same page regarding our aims before digging down into how VL actually looks and its interaction with the loop vectoriser’s chosen VF.
>
> As much sense as is possible, I guess.
>
> But without knowing the guarantees we're aiming for, it'll be hard to
> know if any of those proposals will make proper sense.
>
> One way to make your "seriesvector" concept show up *before* any spec
> is out is to apply it to non-scalable vectors.
>
> Today, we have the "zeroinitializer", which is very similar to what
> you want. You can even completely omit the "vscale" if we get the
> semantics right.
>
> Hope that helps.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev