[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sun Nov 27 07:42:28 PST 2016

On 27 November 2016 at 13:59, Paul Walker <Paul.Walker at arm.com> wrote:
> Thanks Renato, my takeaway is that I am presenting the design out of order.  So let's focus purely on the vector length (VL) and ignore everything else.  For SVE the vector length is unknown and can vary across an as yet undetermined boundary (process, library....).  Within a boundary we propose making VL a constant with all instructions that operate on this constant locked within its boundary.

This is in line with my current understanding of SVE. Check.

> I know this stretches the meaning of constant and my reasoning (however unsound) is below.  We expect changes to VL to be infrequent and not located where it would present an unnecessary barrier to optimisation.  With this in mind the initial implementation of VL barriers would be an intrinsic that prevents any instruction movement across it.
>
> Question: Is this type of intrinsic something LLVM supports today?

Function calls are natural barriers, but the call should take as
parameters the values that must not cross it, especially local ones,
to make sure they stay put. In that sense, specially crafted intrinsics
can get you the same behaviour, but it will be ugly.

Also, we have special-purpose barriers, e.g. @llvm.arm|aarch64.dmb,
which could serve as a template for scalable-specific barriers.

> Why a constant? Well it doesn't change within the context it is being used. More crucially the LLVM implementation of constants gives us a property that's very important to SVE (perhaps this is where prototyping laziness has kicked in).  Constants remain attached to the instructions that operate on them through until code generation.  This allows the semantic meaning of these instructions to be maintained, something non-scalable vectors get for free with their "real" constants.

This makes sense. Not just because it behaves similarly, but because
the back-end *must* guarantee it will be a constant within its
boundaries and fail otherwise. That's up to the SVE code-generator to
add enough SVE-specific instructions to get that right.

>         shufflevector <n x 4 x i32> %a, <n x 4 x i32> undef, <n x 4 x i32> seriesvector ( sub (i32 VL, 1), i32 -1)
>
> Firstly I'll highlight the use of seriesvector is purely for brevity, let's ignore that debate for now.  Our concern is that not treating VL as a Constant means sub and seriesvector are no longer constant and are likely to be hoisted away from the shufflevector.  The knock on effect being to force the code generator into generating generic vector permutes rather than utilise any specialised permute instructions the target provides.

The concept looks ok.

IIGIR, your argument is that an intrinsic will not look "constant
enough" to the other IR passes, which can break the constantness
required to generate the correct "constant" vector.

I'm also assuming SVE has an instruction that relates to the syntax
above, which will reduce the setup process from N instructions to one
and will be scale-independent. Otherwise, that whole exercise is
meaningless.

Something like:
  mov  x2, #i
  const       z0.b, p0/z, x2, 2     # From (i) to (2*VF)
  const       z1.b, p0/z, x2, -1    # From (i) to (i - VF) in reverse
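
To make the intent concrete, here is a scalar Python model of what such
a series-generating instruction would produce. The "const" mnemonic
above is only a placeholder, and so is everything here: series() and
shuffle() are hypothetical helper names, and the lane count is passed
explicitly, which a scale-independent instruction would not need.

```python
def series(start, step, lanes):
    """Model of a series vector: lane k holds start + k * step."""
    return [start + k * step for k in range(lanes)]


def shuffle(a, mask):
    """Single-input shufflevector model: result lane k is a[mask[k]]."""
    return [a[m] for m in mask]


# Reversal mask from the quoted IR: seriesvector(VL - 1, -1).
for lanes in (4, 8, 16):  # stand-ins for different hardware vector lengths
    a = list(range(100, 100 + lanes))
    mask = series(lanes - 1, -1, lanes)
    assert shuffle(a, mask) == a[::-1]
```

The same two-operand description (start, step) yields a reversal for
every lane count, which is the property a single instruction would
have to preserve.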

The undefined behaviour that will come of such instructions needs to be
understood in order to not break the IR.

For example, if x2 is an unsigned variable and you iterate through the
array but the array length is not a multiple of VF, the last range
will pass through zero and become negative at the end. Or, if x2 is a
16-bit variable that must wrap (or saturate), the same tail issue
arises.
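
A small Python model of that tail problem, assuming a descending series
and 16-bit unsigned lanes. The helper names and the choice between wrap
and saturate are illustrative only; which behaviour the hardware or the
IR would define is exactly the open question.

```python
MASK16 = 0xFFFF  # 16-bit unsigned lanes (an assumption for this example)


def series_wrap(start, step, lanes):
    """Each lane is computed modulo 2**16."""
    return [(start + k * step) & MASK16 for k in range(lanes)]


def series_saturate(start, step, lanes):
    """Each lane is clamped to the range [0, 0xFFFF]."""
    return [min(max(start + k * step, 0), MASK16) for k in range(lanes)]


# Two elements left in the array, but the vector has four lanes.
print(series_wrap(1, -1, 4))      # [1, 0, 65535, 65534]
print(series_saturate(1, -1, 4))  # [1, 0, 0, 0]
```

Either way the last two lanes are not meaningful indices, so something,
presumably the predicate, has to exclude them.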

> Does this make sense? I am not after agreement just want to make sure we are on the same page regarding our aims before digging down into how VL actually looks and its interaction with the loop vectoriser’s chosen VF.

As much sense as is possible, I guess.

But without knowing the guarantees we're aiming for, it'll be hard to
know if any of those proposals will make proper sense.

One way to make your "seriesvector" concept show up *before* any spec
is out is to apply it to non-scalable vectors.

Today, we have the "zeroinitializer", which is very similar to what
you want. You can even completely omit the "vscale" if we get the
semantics right.
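
For instance, with fixed-width vectors the idea can be modelled without
any scaling factor at all. In this Python sketch (hypothetical series()
helper, lane count fixed at 4 to mimic <4 x i32>), zeroinitializer
falls out as the start 0, step 0 case:

```python
LANES = 4  # fixed width, standing in for <4 x i32>


def series(start, step, lanes=LANES):
    """Lane k holds start + k * step."""
    return [start + k * step for k in range(lanes)]


zeroinitializer = series(0, 0)        # all-zero vector
identity_mask = series(0, 1)          # lane indices in order
reverse_mask = series(LANES - 1, -1)  # lane indices reversed

assert zeroinitializer == [0, 0, 0, 0]
assert identity_mask == [0, 1, 2, 3]
assert reverse_mask == [3, 2, 1, 0]
```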

Hope that helps.

cheers,
--renato