[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Tue Jul 31 11:21:52 PDT 2018

Hi David,

Let me move your last two comments to the top:

> > But we're trying to represent slightly different techniques
> > (predication, vscale change) which need to be tied down to only
> > exactly what they do.
>
> Wouldn't intrinsics to change vscale do exactly that?

You're right. I've been using the same overloaded term and this is
probably what caused the confusion.

In some cases, predicating and shortening the vectors are semantically
equivalent. In those cases, the IR should also be equivalent.
Instructions/intrinsics that handle predication could be used by the
backend to simply change VL instead, as long as it's guaranteed that
the semantics are identical. There are no problems here.
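
To make the "semantically equivalent" case concrete, here is a toy
sketch of a last-iteration tail handled both ways. It is plain Python
standing in for vector lanes, not LLVM IR or any real target. VLEN and
the function names are made up for illustration, and it only models an
element-wise add, where no lane depends on another.

```python
VLEN = 4  # hypothetical hardware vector length, in lanes


def add_predicated(a, b, out, start, n):
    """Tail via predication: all VLEN lanes exist, inactive lanes are skipped."""
    mask = [start + lane < n for lane in range(VLEN)]
    for lane in range(VLEN):
        if mask[lane]:
            out[start + lane] = a[start + lane] + b[start + lane]


def add_shortened(a, b, out, start, n):
    """Tail via a shorter vector: only the remaining elements get a lane."""
    vl = min(VLEN, n - start)
    for lane in range(vl):
        out[start + lane] = a[start + lane] + b[start + lane]


def vector_add(a, b, step):
    """Strip-mine the loop in chunks of VLEN and apply one step per chunk."""
    n = len(a)
    out = [0] * n
    for start in range(0, n, VLEN):
        step(a, b, out, start, n)
    return out


a = list(range(10))          # 10 elements: two full chunks plus a tail of 2
b = [10 * x for x in a]
predicated = vector_add(a, b, add_predicated)
shortened = vector_add(a, b, add_shortened)
```

Both strategies write exactly the same elements in the tail, so an
optimiser looking at this operation would have no reason to tell them
apart. That would not necessarily hold for operations whose result
depends on the number of lanes.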

In other cases, for example widening or splitting the vector, or cases
we haven't thought of yet, the semantics are not the same, and having
them in IR would be bad. I think we're all in agreement on that.

All I'm asking is that we make a list of what we want to happen and
disallow everything else explicitly, until someone comes up with a strong
case for it. Makes sense?

> I'm all for being explicit.  I think we're basically on the same page,
> though there are a few things noted above where I need a little more
> clarity.

Yup, I think we are. :)

> What does "mid-loop" mean?  On traditional vector architectures it was
> very common to change VL for the last loop iteration.  Otherwise you had
> to have a remainder loop.  It was much better to change VL.

You got it below...

> Ok, I think I am starting to grasp what you are saying.  If a value
> flows from memory or some scalar computation to vector and then back to
> memory or scalar, VL should only ever be set at the start of the vector
> computation until it finishes and the value is deposited in memory or
> otherwise extracted.  I think this is ok, but note that any vector
> functions called may change VL for the duration of the call.  The change
> would not be visible to the caller.

If a called function changes the vector length, does it restore the
previous value on return?

> I am not so sure about that.  Power requirements may very well drive
> more dynamic vector lengths.  Even today some AVX 512 implementations
> falter if there are "too many" 512-bit operations.  Scaling back SIMD
> width statically is very common today and doing so dynamically seems
> like an obvious extension.  I don't know of any efforts to do this so
> it's all speculative at this point.  But the industry has done it in the
> past and we have a curious pattern of reinventing things we did before.

Right, so it's not as clear-cut as I hoped. But we can start
implementing the basic idea and then expand as we go. I think trying
to hash out all potential scenarios now will drive us crazy.

> It seems strange to me for an optimizer to operate in such a way.  The
> optimizer should be fully aware of the target's capabilities and use
> them accordingly.

Mid-end optimisers tend to be fairly agnostic. And when not, they
usually ask "is this supported" instead of "which one is better".

> ARM seems to have no difficulty selecting instructions for it.  Changing
> the value of vscale shouldn't impact ISel at all.  The same instructions
> are selected.

I may very well be getting lost in too many floating future ideas, atm. :)

> > It is, but IIGIR, changing vscale and predicating are similar
> > transformations to achieve similar goals, but will not be
> > represented the same way in IR.
>
> They probably will not be represented the same way, though I think they
> could be (but probably shouldn't be).

Maybe in the simple cases (like last iteration) they should be?

> Ok, but would the optimizer be prevented from introducing VL changes?

In the case where they're represented in similar ways in IR, it
wouldn't need to.

Otherwise, we'd have to teach IR optimisers two methods that are
virtually identical in semantics. With a shared notation, it'd be left
to the back end to implement the last-iteration notation as either a
predicate fill or a vscale change.

> Being conservative is fine, but we should have a clear understanding of
> exactly what that means.  I would not want to prohibit all VL changes
> now and forever, because I see that as unnecessarily restrictive and
> possibly damaging to supporting future architectures.
>
> If we don't want to provide intrinsics for changing VL right now, I'm
> all in favor.  There would be no reason to add error checks because
> there would be no way within the IR to change VL.

Right, I think we're converging.

How about we don't forbid changes in vscale, but we find a common
notation for all the cases where predicating and changing vscale would
be semantically identical, and implement those in the same way?

Later on, if there are additional cases where changes in vscale would
be beneficial, we can discuss them independently.

Makes sense?

-- 
cheers,
--renato