[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sat Nov 26 12:40:25 PST 2016

On Sat, Nov 26, 2016 at 9:07 AM Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 26 November 2016 at 11:49, Paul Walker <Paul.Walker at arm.com> wrote:
> > Related to this I want to push this and related conversations in a
> > different direction.  From the outset our approach to add SVE support to
> > LLVM IR has been about solving the generic problem of vectorising for an
> > unknown vector length and then extending this to support predication.  With
> > this in mind I would rather the problem and its solution be discussed at
> > the IR's level of abstraction rather than getting into the guts of SVE.
>
> Hi Paul,
>
> How scalable vectors operate is intimately related to how you
> represent them in IR. It took a long time for the vector types to be
> mapped to all available semantics. We still had to use a bunch of
> intrinsics for scatter / gather, and it took years to get the strided
> access settled.
>
> I understand that scalable vectors are orthogonal to all this, but as
> a new concept, one that isn't available in any open source compiler I
> know of, it is likely to be very vague. Not publishing the specs
> only makes it worse.
>
> Take the ACLE and ARMv8.2 patches that ARM has been pushing
> upstream, for example. I have no idea what the new additions are, so I have
> to take your word that they're correct. But later on, different
> behaviour comes along for the same features with a comment "it didn't
> work that way, let's try this". Sometimes, I don't even know what
> failed, or why this new thing is better.
>
> When that behaviour is confined to the ARM back-end, it's ok. It's
> a burden that Tim and I will have to carry, and so far, it has been a
> small burden. But exposing the guts of the vectorizers (which are
> already getting to a point where they need large refactorings), which
> will affect all targets, needs a bit more concrete information.
>
> The last thing we want is to keep changing how the vectorizer behaves
> every six months without any concrete information as to why.
>
> I also understand that LLVM is great at prototyping, and that's an
> important step for companies like ARM to make sure their features work
> as reliably as they expect in the wild, but I think adding new IR
> semantics and completely refactoring core LLVM passes without a clue
> is a few steps too far.
>
> I'm not asking for a full spec. All I'm asking is for a description of
> the intended basic functionality. Addressing modes, how to extract
> information from unknown lanes, or if all reduction steps will be done
> like `saddv`. Without that information, I cannot know what is the best
> IR representation for scalable vectors or what will be the semantics
> of shuffle / extract / insert operations.
>
>
> > "complex constant" is the term used within the LangRef.  Although its
> > value can be different across certain interfaces this does not need to be
> > modelled within the IR and thus for all intents and purposes we can safely
> > consider it to be constant.
>
> From the LangRef:
>
> "Complex constants are a (potentially recursive) combination of simple
> constants and smaller complex constants."
>
> There's nothing there saying it doesn't need to be modeled in IR.
>
>
> > "vscale" is not trying to represent the result of such speculation. It's
> > purely a constant runtime vector length multiplier.  Such a value is
> > required by LoopVectorize to update induction variables as described below
> > plus simple interactions like extracting the last element of a scalable
> > vector.
>
> Right, I'm beginning to see what you mean...
>
> The vectorizer needs that to be a constant at compile time to make
> safety assurances.
>
> For instance: for (i = 1; i <= N; i++) { a[i+3] = a[i] + i; }
>
> This has a max VF of 3. If the vectorizer is to act on that loop, it'll
> have to change "vscale" to 3. If there are no loop dependencies, then
> you leave it as "vscale" and vectorize anyway.
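To make the dependence argument concrete, here is a minimal C sketch (0-based indices; the function names `scalar_loop` and `chunked_loop` are hypothetical). It models a vector loop of width `vf` as "load every lane, then store every lane", which is what a SIMD load followed by a SIMD store does. With a dependence distance of 3, any `vf` up to 3 reproduces the scalar result, while `vf` = 4 reads `a[i+3]` before the earlier iteration has written it.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: a[i+3] = a[i] + i for i in [0, n).
   The array must hold at least n + 3 elements. */
void scalar_loop(int *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i + 3] = a[i] + (int)i;
}

/* Models a vector loop of width vf: each chunk first loads all of its
   a[i] lanes, then stores all of its a[i+3] lanes. A short final chunk
   is handled by the same code with fewer lanes. */
void chunked_loop(int *a, size_t n, size_t vf) {
    int lanes[64];
    assert(vf > 0 && vf <= 64);
    for (size_t i = 0; i < n; i += vf) {
        size_t m = (n - i < vf) ? n - i : vf;   /* lanes in this chunk */
        for (size_t k = 0; k < m; k++)          /* "vector load" */
            lanes[k] = a[i + k];
        for (size_t k = 0; k < m; k++)          /* "vector store" */
            a[i + k + 3] = lanes[k] + (int)(i + k);
    }
}
```

Initialising `a[k] = 10 * k` and running 12 iterations, the chunked loop agrees with the scalar one for `vf` of 2 and 3 and disagrees for `vf` of 4.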
>
> Other assurances are done for run time constants, for instance, tail
> loops when changing
>
> for (i=0; i<N; i++)   ->    for (i=0; i<N; i+=VF)
>
> That VF is now a run-time "constant", and the vectorizer needs to see
> it as such, otherwise it can't even test for validity.
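One way to picture that validity test in C, as a sketch under assumptions: `vf` is an ordinary run-time parameter standing in for a run-time VF, the loop generalises the `a[i+3] = a[i] + i` example to a dependence distance `dist`, and `strip_mined` is a hypothetical name. Because `vf` is not known at compile time, the check `vf <= dist` has to become a run-time guard, and whatever the vector body does not cover falls through to a scalar tail loop.

```c
#include <assert.h>
#include <stddef.h>

/* Strip-mined form of a[i+dist] = a[i] + i with a VF known only at
   run time. The array must hold at least n + dist elements. */
void strip_mined(int *a, size_t n, size_t dist, size_t vf) {
    int lanes[64];
    size_t i = 0;
    assert(dist > 0 && vf > 0 && vf <= 64);
    if (vf <= dist) {                       /* run-time validity check */
        size_t main_end = n - n % vf;       /* whole chunks only */
        for (; i < main_end; i += vf) {     /* i += VF */
            for (size_t k = 0; k < vf; k++) /* load all lanes first */
                lanes[k] = a[i + k];
            for (size_t k = 0; k < vf; k++) /* then store all lanes */
                a[i + k + dist] = lanes[k] + (int)(i + k);
        }
    }
    for (; i < n; i++)                      /* scalar tail, or full fallback */
        a[i + dist] = a[i] + (int)i;
}
```

When the guard fails, the scalar loop simply runs from `i = 0`, so the result matches the scalar loop for any `vf`.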
>
> So, the vectorizer will need to be taught two things:
>
> 1. "vscale" is a run time constant, and for the purpose of validity,
> can be shrunk to any value down to two. If the value is shrunk, the
> new compile time constant replaces vscale.
>
> 2. The cost model will *have* to treat "vscale" as an actual compile
> time constant. This could come from a target feature, overridden by a
> command line flag, but there has to be a default, which I'd assume is
> 4, given that it's the lowest length.
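The two rules above can be written down as a small decision function. This is purely a hypothetical C encoding of the proposal in this message, not LoopVectorize code: following the wording here, "vscale" stands for the whole VF, a dependence distance of 0 means "no loop-carried dependence", and `vscale_override` models the suggested command-line override (0 meaning unset), falling back to the default of 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of VF selection. */
typedef struct {
    bool scalable;  /* true: keep "vscale" symbolic in the IR */
    int  vf;        /* fixed VF, or the VF the cost model assumes */
} vf_choice;

/* dep_distance: loop-carried dependence distance, 0 meaning none.
   vscale_override: hypothetical command-line value, 0 meaning unset. */
vf_choice choose_vf(int dep_distance, int vscale_override) {
    vf_choice c;
    assert(dep_distance >= 0 && vscale_override >= 0);
    if (dep_distance == 0) {
        /* Rule 2: no dependence, so stay scalable, but the cost model
           still needs a number: the override, else the default of 4. */
        c.scalable = true;
        c.vf = vscale_override > 0 ? vscale_override : 4;
    } else if (dep_distance >= 2) {
        /* Rule 1: shrink to the dependence distance; the new
           compile-time constant replaces vscale. */
        c.scalable = false;
        c.vf = dep_distance;
    } else {
        /* Distance 1: no VF of two or more is safe, so stay scalar. */
        c.scalable = false;
        c.vf = 1;
    }
    return c;
}
```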
>
>
>
> >     %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
> >
> > for a VF of "n*4" (remembering that vscale is the "n" in "<n x 4 x Ty>")
>
> I see what you mean.
>
> Quick question: Since you're saying "vscale" is an unknown constant,
> why not just:
>
>    %index.next = add nuw nsw i64 %index, i64 vscale
>
> All scalable operations will be tied up by the predication vector
> anyway, and you already know what the vector type size is.
>
> The only worry is about providing redundant information that could go
> stale and introduce bugs.
>
> I'm assuming the vectorizer will *have* to learn about the compulsory
> predication and build those vectors, or the back-end will have to
> handle them, and it can get ugly.
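The `vscale * 4` induction step and the compulsory predication can be modelled together in plain C. In this sketch `vscale` is passed in as an ordinary argument (an assumption; on hardware it is fixed by the implemented vector length), each trip of the loop covers `vscale * 4` lanes like `%index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)`, and a per-lane predicate `index + lane < n` switches off lanes past the end, so no scalar tail loop is needed. `predicated_add` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicated scalable loop computing c[i] = a[i] + b[i] for i in [0, n).
   Returns the number of vector-loop trips taken. */
size_t predicated_add(int *c, const int *a, const int *b,
                      size_t n, size_t vscale) {
    const size_t vf = vscale * 4;          /* lanes per trip: <n x 4 x i32> */
    size_t trips = 0;
    assert(vscale > 0);
    for (size_t index = 0; index < n; index += vf) {
        for (size_t lane = 0; lane < vf; lane++) {
            bool active = index + lane < n; /* predicate for this lane */
            if (active)
                c[index + lane] = a[index + lane] + b[index + lane];
        }
        trips++;
    }
    return trips;
}
```

For `n = 10` this takes 3 trips with `vscale = 1`, 2 with `vscale = 2` and 1 with `vscale = 4`, and never writes past `c[9]`.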
>
>
> >> %const_vec = <n x 4 x i32> @llvm.sve.constant_vector(i32 %start, i32 %step)
> >
> > This intrinsic matches the seriesvector instruction we originally
> > proposed.  However, on reflection we didn't like how it allowed multiple
> > representations for the same constant.
>
> Can you expand how this allows multiple representations for the same
> constant?
>
> This is a series, with a start and a step, and will only be identical
> to another which has the same start and step.
>
> Just like C constants can "appear" different...
>
> const int foo = 4;
> const int bar = foo;
> const int baz = 2 + 2;
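For reference, a series constant with a start and a step can be expanded in C as below, with `lanes` standing in for the run-time length `n * 4` of an `<n x 4 x i32>` vector. `expand_series` and `same_lanes` are hypothetical helpers, and nothing here claims to reproduce the semantics of the proposed intrinsic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Expand a series (start, step) into `lanes` elements:
   element k is start + k * step. */
void expand_series(int *out, size_t lanes, int start, int step) {
    assert(out != NULL || lanes == 0);
    for (size_t k = 0; k < lanes; k++)
        out[k] = start + (int)k * step;
}

/* Lane-by-lane equality of two expanded vectors. */
bool same_lanes(const int *x, const int *y, size_t lanes) {
    for (size_t k = 0; k < lanes; k++)
        if (x[k] != y[k])
            return false;
    return true;
}
```

Under this sketch, and for small values, two series of at least two lanes match lane by lane exactly when start and step match, but a step of 0 yields the same lanes as a splat of `start`, so one value can be spelled in two ways.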
>
>
> > I know this doesn't preclude the use of an intrinsic, I just wanted to
> > highlight that doing so doesn't automatically change the surrounding IR.
>
> I don't mind IR changes, I'm just trying to understand the need for it.
>
> Normally, what we did in the past for some things was to add
> intrinsics and then, if it became clear a native IR construct would be
> better, change it.
>
> At least the intrinsic can be easily added without breaking
> compatibility with anything, and since we're in the prototyping phase
> anyway, changing the IR would be the worst idea.
>
>
These last 3 paragraphs are a great summary of my position on this as well.

Thanks!

-eric

> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>