[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sun Nov 27 07:32:41 PST 2016

> I'm not asking for a full spec. All I'm asking is for a description of
> the intended basic functionality. Addressing modes, how to extract
> information from unknown lanes, or if all reduction steps will be done
> like `saddv`. Without that information, I cannot know what is the best
> IR representation for scalable vectors or what will be the semantics
> of shuffle / extract / insert operations.
>

If you want to know more, our dev meeting talk and slides will
hopefully be available soon. If there is a significant delay, we can
publish the slides ourselves for you to look at; those should be
sufficient for you to understand enough of the details to form an
opinion. We also have a white paper on general SVE and vector-length
agnostic programming available here:
http://developer.arm.com/hpc/a-sneak-peek-into-sve-and-vla-programming

Thanks,
Amara

On 26 November 2016 at 17:07, Renato Golin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 26 November 2016 at 11:49, Paul Walker <Paul.Walker at arm.com> wrote:
>> Related to this I want to push this and related conversations in a different direction. From the outset our approach to adding SVE support to LLVM IR has been about solving the generic problem of vectorising for an unknown vector length and then extending this to support predication. With this in mind I would rather the problem and its solution be discussed at the IR's level of abstraction rather than getting into the guts of SVE.
>
> Hi Paul,
>
> How scalable vectors operate is intimately related to how you
> represent them in IR. It took a long time for the vector types to be
> mapped to all available semantics. We still had to use a bunch of
> intrinsics for scatter / gather, and it took years to get the strided
> access settled.
>
> I understand that scalable vectors are orthogonal to all this, but as
> a new concept, one that isn't available in any open source compiler I
> know of, it will likely be very vague. Not publishing the specs only
> makes it worse.
>
> I take the example of the ACLE and ARMv8.2 patches that ARM has been
> pushing upstream. I have no idea what the new additions are, so I have
> to take your word that they're correct. But later on, different
> behaviour comes along for the same features with a comment "it didn't
> work that way, let's try this". Sometimes, I don't even know what
> failed, or why this new thing is better.
>
> When that behaviour is restricted to the ARM back-end, it's ok. It's
> a burden that Tim and I will have to carry, and so far, it has been a
> small burden. But exposing the guts of the vectorizers (which are
> already getting to a point where they need large refactorings), which
> will affect all targets, needs a bit more concrete information.
>
> The last thing we want is to keep changing how the vectorizer behaves
> every six months without any concrete information as to why.
>
> I also understand that LLVM is great at prototyping, and that's an
> important step for companies like ARM to make sure their features work
> as reliably as they expect in the wild, but I think adding new IR
> semantics and completely refactoring core LLVM passes without a clue
> is a few steps too far.
>
> I'm not asking for a full spec. All I'm asking is for a description of
> the intended basic functionality. Addressing modes, how to extract
> information from unknown lanes, or if all reduction steps will be done
> like `saddv`. Without that information, I cannot know what is the best
> IR representation for scalable vectors or what will be the semantics
> of shuffle / extract / insert operations.
>
>
>> "complex constant" is the term used within the LangRef.  Although its value can be different across certain interfaces this does not need to be modelled within the IR and thus for all intents and purposes we can safely consider it to be constant.
>
> From the LangRef:
>
> "Complex constants are a (potentially recursive) combination of simple
> constants and smaller complex constants."
>
> There's nothing there saying it doesn't need to be modeled in IR.
>
>
>> "vscale" is not trying to represent the result of such speculation. It's purely a constant runtime vector length multiplier. Such a value is required by LoopVectorize to update induction variables as described below plus simple interactions like extracting the last element of a scalable vector.
>
> Right, I'm beginning to see what you mean...
>
> The vectorizer needs that to be a constant at compile time to make
> safety assurances.
>
> For instance: for (1..N) { a[i+3] = a[i] + i; }
>
> Has a max VF of 3. If the vectorizer is to act on that loop, it'll
> have to change "vscale" to 3. If there are no loop dependencies, then
> you leave it as "vscale" and vectorize anyway.
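
To make the "max VF of 3" point concrete, here is a minimal C sketch. It
is not from the thread: `scalar_loop`, `chunked_loop` and `width_is_safe`
are made-up names, and the "vector" loop is only emulated by loading every
lane of a chunk before storing any of them. Under those assumptions,
widths 1 to 3 reproduce the scalar result and wider chunks do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N 16      /* loop trip count (illustrative) */
#define DIST 3    /* dependence distance in a[i+3] = a[i] + i */
#define MAX_VF 8  /* largest emulated vector width */

/* Scalar reference loop; a must hold n + DIST elements. */
static void scalar_loop(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i + DIST] = a[i] + i;
}

/* Emulates a vector loop of width vf: each chunk reads all of its
 * lanes before it writes any of them, as a vector load/store would. */
static void chunked_loop(int *a, int n, int vf) {
    assert(vf >= 1 && vf <= MAX_VF);
    for (int i = 0; i < n; i += vf) {
        int lanes = (n - i < vf) ? n - i : vf;  /* last chunk may be short */
        int tmp[MAX_VF];
        for (int l = 0; l < lanes; l++)         /* "vector load" plus add */
            tmp[l] = a[i + l] + (i + l);
        for (int l = 0; l < lanes; l++)         /* "vector store" */
            a[i + DIST + l] = tmp[l];
    }
}

/* Returns true when width vf gives the same array as the scalar loop. */
static bool width_is_safe(int vf) {
    int ref[N + DIST], vec[N + DIST];
    for (int i = 0; i < N + DIST; i++)
        ref[i] = vec[i] = 7 * i + 1;            /* arbitrary distinct inputs */
    scalar_loop(ref, N);
    chunked_loop(vec, N, vf);
    return memcmp(ref, vec, sizeof ref) == 0;
}
```

With a width of 4, lane 3 of the first chunk reads `a[3]` before lane 0
has written it, which is exactly the dependence the scalar loop relies on.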
>
> Other assurances are done for run time constants, for instance, tail
> loops when changing
>
> for (i=0; i<N; i++)   ->    for (i=0; i<N; i+=VF)
>
> That VF is now a run-time "constant", and the vectorizer needs to see
> it as such, otherwise it can't even test for validity.
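
As a rough illustration of the transformation above, the following C
sketch strip-mines a loop by a width that is only known at run time and
finishes with a scalar tail loop. The function name `add_one_strip_mined`
and the "add 1 to each element" body are assumptions chosen for the
example; the inner lane loop merely stands in for one vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Adds 1 to each of the n elements of a, vf elements at a time, with a
 * scalar tail for the remainder. Returns how many elements the tail
 * loop handled. vf is a run-time value, fixed for the whole call. */
static size_t add_one_strip_mined(int *a, size_t n, size_t vf) {
    assert(vf >= 1);
    size_t main_end = n - (n % vf);      /* largest multiple of vf <= n */
    size_t i = 0;

    for (; i < main_end; i += vf)        /* for (i=0; i<N; i+=VF) */
        for (size_t l = 0; l < vf; l++)  /* stands in for one vector op */
            a[i + l] += 1;

    size_t tail = 0;
    for (; i < n; i++, tail++)           /* scalar tail loop */
        a[i] += 1;
    return tail;
}
```

The only property the loop structure needs from `vf` is that it does not
change between iterations, which is the sense of run-time "constant" used
above.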
>
> So, the vectorizer will need to be taught two things:
>
> 1. "vscale" is a run time constant, and for the purpose of validity,
> can be shrunk to any value down to two. If the value is shrunk, the
> new compile time constant replaces vscale.
>
> 2. The cost model will *have* to treat "vscale" as an actual compile
> time constant. This could come from a target feature, overridden by a
> command line flag, but there has to be a default, which I'd assume is
> 4, given that it's the lowest length.
>
>
>
>>     %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
>>
>> for a VF of "n*4" (remembering that vscale is the "n" in "<n x 4 x Ty>")
>
> I see what you mean.
>
> Quick question: Since you're saying "vscale" is an unknown constant,
> why not just:
>
>    %index.next = add nuw nsw i64 %index, i64 vscale
>
> All scalable operations will be tied up by the predication vector
> anyway, and you already know what the vector type size is anyway.
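
One way to see what the two increments mean, assuming %index counts
elements and every lane of a <n x 4 x Ty> vector is active: each vector
iteration covers vscale * 4 elements, so the step decides how many
iterations are needed. The helpers below (`lanes_n_x_4`, `trip_count`)
are hypothetical names for this arithmetic only, not anything in LLVM.

```c
#include <assert.h>

/* Number of lanes in <n x 4 x Ty> for a given run-time vscale. */
static unsigned long lanes_n_x_4(unsigned long vscale) {
    return vscale * 4;
}

/* Vector iterations needed to cover n elements when the index
 * advances by step elements per iteration (rounded up). */
static unsigned long trip_count(unsigned long n, unsigned long step) {
    assert(step >= 1);
    return (n + step - 1) / step;
}
```

For 64 elements and vscale = 2, stepping by vscale * 4 takes 8
iterations, while stepping by vscale alone takes 32 and would only make
progress of vscale elements per iteration even though each vector holds
four times as many.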
>
> The only worry is about providing redundant information that could go
> stale and introduce bugs.
>
> I'm assuming the vectorizer will *have* to learn about the compulsory
> predication and build those vectors, or the back-end will have to
> handle them, and it can get ugly.
>
>
>>> %const_vec = <n x 4 x i32> @llvm.sve.constant_vector(i32 %start, i32 %step)
>>
>> This intrinsic matches the seriesvector instruction we originally proposed.  However, on reflection we didn't like how it allowed multiple representations for the same constant.
>
> Can you expand how this allows multiple representations for the same constant?
>
> This is a series, with a start and a step, and will only be identical
> to another which has the same start and step.
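
The reading of a series used in this argument can be sketched in C as
"lane i holds start + i * step". The names `series_vector` and
`same_series` are made up for illustration and say nothing about how the
proposed intrinsic or instruction is actually specified; arithmetic wraps
modulo 2^32 here purely to keep the C well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LANES 64  /* upper bound on emulated lanes (illustrative) */

/* Fills out[0..lanes) with start, start + step, start + 2*step, ... */
static void series_vector(int32_t *out, size_t lanes,
                          int32_t start, int32_t step) {
    for (size_t i = 0; i < lanes; i++)
        out[i] = (int32_t)((uint32_t)start + (uint32_t)i * (uint32_t)step);
}

/* Returns true when two (start, step) pairs produce identical lanes. */
static bool same_series(int32_t start0, int32_t step0,
                        int32_t start1, int32_t step1, size_t lanes) {
    int32_t a[MAX_LANES], b[MAX_LANES];
    assert(lanes >= 1 && lanes <= MAX_LANES);
    series_vector(a, lanes, start0, step0);
    series_vector(b, lanes, start1, step1);
    return memcmp(a, b, lanes * sizeof a[0]) == 0;
}
```

With at least two lanes, two series match only when both start and step
match; the step stops mattering only in the degenerate single-lane case.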
>
> Just like C constants can "appear" different...
>
> const int foo = 4;
> const int bar = foo;
> const int baz = 2 + 2;
>
>
>> I know this doesn't preclude the use of an intrinsic, I just wanted to highlight that doing so doesn't automatically change the surrounding IR.
>
> I don't mind IR changes, I'm just trying to understand the need for it.
>
> Normally, what we did in the past for some things was to add
> intrinsics and then, if it's clear a native IR construct would be
> better, we change it.
>
> At least the intrinsic can be easily added without breaking
> compatibility with anything, and since we're in the prototyping phase
> anyway, changing the IR would be the worst idea.
>
> cheers,
> --renato