[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Mon Nov 28 01:43:48 PST 2016

On 28 November 2016 at 01:43, Paul Walker <Paul.Walker at arm.com> wrote:
> Reconsidering the above loops with this type system leads to IR like:
>
> (1)     <n x 4 x i32> += zext <n x 4 x i8> as <n x 4 x i32>    ; bigger_type=i32, smaller_type=i8
> (2)     <n x 16 x i8> += <n x 16 x i8>

Hi Paul,

I'm with Mehdi on this... these examples don't look problematic. You
have shown what the different constructs would be good at, but I still
can't see where they won't be.

I originally thought that the extended form "<n x m x Ty>" was
required because SVE needs all vector lengths to be a multiple of
128 bits, so they'd be just "glorified" NEON vectors. Without it,
there is no way to make sure the length will be a multiple of 128.
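
To make the arithmetic behind that concrete, here is a rough sketch in
Python. The helper name scalable_vector_bits is hypothetical and not
part of the proposal; it assumes "<n x m x Ty>" means n * m elements
of type Ty, and that n ranges over 1..16 (SVE vector lengths of 128 to
2048 bits).

```python
def scalable_vector_bits(vscale, min_elts, elt_bits):
    """Total width in bits of <n x min_elts x iN>, with n == vscale.

    Hypothetical helper for illustration only; not an LLVM API.
    """
    return vscale * min_elts * elt_bits


# Both types from the quoted loops have a 128-bit base (4 * 32 and 16 * 8),
# so every runtime multiple stays a multiple of 128 bits.
for vscale in range(1, 17):  # assumed range: 128-bit up to 2048-bit vectors
    assert scalable_vector_bits(vscale, 4, 32) % 128 == 0   # <n x 4 x i32>
    assert scalable_vector_bits(vscale, 16, 8) % 128 == 0   # <n x 16 x i8>
```

The guarantee comes from the fixed "m x Ty" part being 128 bits wide;
the runtime factor only scales it.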

> (1)     %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
> (2)     %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 16)
>
> The runtime part of the scalable vector lengths remains the same with the second loop processing 4x the number of elements per iteration.

Right, but this is a "constant", and LLVM would be forgiven for asking
the "size" of it. With that proposal, there's no way to know if that's
a <16 x i8> or <16 x i32>.
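
A small sketch of what I mean, again in Python with hypothetical
helper names (index_step and vector_bits are illustrative, not
anything in LLVM). It assumes the step is "vscale * element count", as
in the quoted "mul (i64 vscale, i64 16)".

```python
def index_step(vscale, min_elts):
    """Elements processed per iteration: mul(vscale, min_elts)."""
    return vscale * min_elts


def vector_bits(vscale, min_elts, elt_bits):
    """Raw width in bits of <n x min_elts x iN>, with n == vscale."""
    return vscale * min_elts * elt_bits


vscale = 2  # hypothetical runtime value, e.g. a 256-bit implementation

# Loop (2) steps 4x as many elements per iteration as loop (1).
assert index_step(vscale, 16) == 4 * index_step(vscale, 4)

# The step for 16 x i8 and 16 x i32 is the same constant...
assert index_step(vscale, 16) == 32
# ...but the raw sizes behind it differ by 4x.
assert vector_bits(vscale, 16, 8) == 256
assert vector_bits(vscale, 16, 32) == 1024
```

So the step expression carries the element count but not the element
width, which is the information that gets lost.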

The vectorizer concerns itself mostly with number of elements, not raw
sizes, but these types will survive the whole process, especially if
they come from intrinsics.

> As an aside, note that I am not describing a new style of vectorisation here.  SVE is perfectly capable of non-predicated vectorisation with the loop-vectoriser ensuring no data-dependency violations using the same logic as for non-scalable vectors.  The exception is that if a strict VF is required to maintain safety we can simply fall back to non-scalable vectors that target Neon.  Obviously not ideal but it gets the ball rolling.

Right, got that. Baby steps, safety first.

cheers,
--renato