[PATCH] D32530: [SVE][IR] Scalable Vector IR Type
Graham Hunter via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 9 05:57:08 PDT 2019
huntergr added inline comments.
================
Comment at: docs/LangRef.rst:2749
+of size zero are not allowed. For scalable vectors, the number of
+elements is an unknown integer multiple of the number of elements.
----------------
hfinkel wrote:
> huntergr wrote:
> > hfinkel wrote:
> > > hfinkel wrote:
> > > > rengolin wrote:
> > > > > hsaito wrote:
> > > > > > hfinkel wrote:
> > > > > > > huntergr wrote:
> > > > > > > > rkruppe wrote:
> > > > > > > > > rengolin wrote:
> > > > > > > > > > hfinkel wrote:
> > > > > > > > > > > This doesn't seem strong enough. We need the unknown multiple to be the *same* for any given type (at least within a given function). We also need a relationship between vectors of different underlying types (so that zext/sext/etc. make sense). Otherwise, you can't even sensibly add them together (for example). I realize that it says something about an unknown vector length above, but we need to translate that statement into semantics that make sense for the vectors themselves.
> > > > > > > > > > It's not that simple. Both SVE and RISC-V can have vector multiplier changes in the middle of a function (via system register or similar). Neither of them *want* that to be the norm, but IIRC, RISC-V doesn't want it to change inside a function and SVE wants it to be the same for the whole program.
> > > > > > > > > >
> > > > > > > > > > I totally agree with you that leaving it open is a huge can of worms, and wanting a per-function change would probably need new annotation on functions, which if ever done, should be orthogonal to this change (or would lead us into madness).
> > > > > > > > > >
> > > > > > > > > > I second your proposal that we fix the semantics in LLVM, for now, that the "unknown width" is the same throughout the program and that the existing relationship between fixed vectors extends to scalable vectors.
> > > > > > > > > >
> > > > > > > > > > If you look at the changes in this patch series, it assumes that behaviour already, by getting new vector types of half-size with double-elements and so on.
> > > > > > > > > >
> > > > > > > > > > IFF RISC-V wants to extend the logic to be per-function, then we will need to do a much more extensive analysis on the passes, especially around inlining and function calls. I strongly propose we don't look at it right now and fix the semantics as proposed above.
> > > > > > > > > >
> > > > > > > > > > In my analysis, with that semantics, I don't see a big impact on any existing non-scalable optimisations. With vectorisation passes being run at the end of the pipeline, even for scalable code, most of the existing pipeline will still be relevant, too.
> > > > > > > > > So first of all, I agree that this patch does (and should) only implement a single "constant" (at runtime) `vscale` value. The current wording here in LangRef is ambiguous about this, it doesn't make clear at which scope the "unknown integer multiple" is fixed. It should be made clear that this factor is the same for all vector types and does not change while the program executes (and, once the `vscale` intrinsic is added, this section here should also point at it).
> > > > > > > > >
> > > > > > > > > Second, since you mentioned it I should say: the RISC-V vector extension has now changed to a point where (at least in its standard incarnation, without further extensions on top of it) there is no need for vscale to change during runtime, let alone between individual functions. All the changes to the "vector register size" now happen in such a way that it's easy to express the longer vectors in IR by just using different vector types with higher `ElementCount::Min`, e.g. `<scalable 8 x double>` instead of `<scalable 1 x double>`. So from the RVV side, there's no need any more for `vscale` to vary e.g. function-by-function.
> > > > > > > > >
> > > > > > > > > I don't know whether the SVE folks want to take a shot at it regardless, but in past discussion it sounded like the vscale-per-function model wasn't a good fit for the programming model they envisioned, so maybe they'd come up with a different solution if and when they tackle that problem.
> > > > > > > > For SVE at least, we can consider changing the vector length during execution to be undefined behaviour. The compiler is not expected to be able to handle it.
> > > > > > > >
> > > > > > > > For RVV, given the new restrictions on how vlmul is handled, I think they won't need to change the multiple at runtime either -- just increase the the minimum number of lanes. I'm hoping to discuss this with Robin at EuroLLVM, assuming time permits.
> > > > > > > >
> > > > > > > > I'll come up with some stricter wording.
> > > > > > > I definitely agree that we should not deal with changing the vscale during program execution.
> > > > > > >
> > > > > > > I think that the model is: There is an underlying vector length. vscale = round(vector length in bits / primitive size in bits). Can we specify it like that? We do also need to define what the rounding is. What does <scalable 4 x i3> do? Or is it not allowed?
> > > > > > >
> > > > > > >I definitely agree that we should not deal with changing the vscale during program execution.
> > > > > >
> > > > > > I agree that this will make things a lot simpler than allowing it to change per function or in a middle of a function.
> > > > > >
> > > > > > However, I don't quite agree that changing vscale per function is an orthogonal issue. What are we going to do when function foo() with vscale=X calls function bar() with vscale=Y using a scalable vector parameter?
> > > > > >
> > > > > > Having said that, since I don't expect the discussions to converge anytime soon if we talk about vscale changing within a compilation unit, I agree we should move forward with vscale not changing within a compilation unit (we say program execution, but compiler's visibility is always limited to compilation unit). It should be sufficient to say that if multiple compilation unit with different vscale are linked, unspecified behavior will result.
> > > > > >
> > > > > > @hfinkel, I think the model is "#elements * elementtype" fits in one or more "units of vector" and then apply vscale to it. I don't think scalable vector needs to fit one physical register of HW. Vector type legalization should kick-in. @huntergr, please correct me if my mental model is wrong.
> > > > > > However, I don't quite agree that changing vscale per function is an orthogonal issue.
> > > > >
> > > > > I didn't mean the implementation, but the discussion.
> > > > >
> > > > > I think a per-function vscale implementation will be very different from the current one, no matter which course we take now. It won't matter much if we have native or intrinsic implementation, we'll still need function attributes and teach the optimisation passes, etc.
> > > > >
> > > > > > Having said that, since I don't expect the discussions to converge anytime soon if we talk about vscale changing within a compilation unit
> > > > >
> > > > > If the scope it the compilation unit, then we'd need it to be fixed on the target string, or we won't be able to link two units together. I think even this discussion is too soon, and we should push the scope to the whole program.
> > > > >
> > > > > Any change in vscale throughout the program should be undefined, or we'd have to encode the necessary logic in the compiler, which is the biggest worry I see from the feedback. So far, the benefits of doing so are on edge cases and the actual costs are unknown (but very likely large).
> > > > >
> > > > > In my view, this is *definitely* not a subject we should raise right now and restricting the current implementation to whole-program scope is the only way we can go forward for now in any sensible way.
> > > > > I didn't mean the implementation, but the discussion.
> > > >
> > > > As I've said in previous thread, I don't believe that we can sensibly model a changing vscale without some SSA dependence, and that will require significant changes to the overall scheme.
> > > >
> > > > > restricting the current implementation to whole-program scope is the only way we can go forward for now in any sensible way.
> > > >
> > > > +1
> > > >
> > > > > I think the model is "#elements * elementtype" fits in one or more "units of vector" and then apply vscale to it. I don't think scalable vector needs to fit one physical register of HW. Vector type legalization should kick-in.
> > > >
> > > > Indeed, I believe you're correct. We need to account for this in the definition too.
> > > >
> > > >
> > > >
> > > > Indeed, I believe you're correct. We need to account for this in the definition too.
> > >
> > > Either by having a model that includes legalization, or by restricting the size of the base vector type?
> > >
> > We perform legalization for scalable vectors with the same mechanisms fixed-length vectors do (splitting for too large, promoting/extending for too small). Should this be documented in this description (it isn't for fixed vectors), or is there a better place in the docs for that explanation?
> >
> > (A side note; for 'unpacked' float vector types (e.g. <scalable 2 x float>) we do declare them as legal for SVE then generate predicates to mask off the unused lanes in AArch64 specific code. Since there are more predicated architectures being added to the codebase, perhaps this could be generalized as a new legalization mechanism for fp vector types)
> > We perform legalization for scalable vectors with the same mechanisms fixed-length vectors do (splitting for too large, promoting/extending for too small).
>
> What defines too big (what size is used for splitting)? If `<scalable 8 x double>` fits in the vector register depends on the runtime vector size, no?
>
> > Should this be documented in this description
>
> No, unless it's part of the IR-level model.
>
> What we need here is a model defined, at the IR level, that explains why:
>
> 1. I can add two <scalable 4 x float> vectors together.
>
> 2. I cannot add a <scalable 4 x float> to a <scalable 2 x float>
>
> 3. I can sext a <scalable 4 x i32> to a <scalable 4 x i64>, and this can be bitcast to a <scalable 8 x i32>.
>
> Also we should address happens to vectors with an odd number of lanes or of a non-power-of-two-sized primitive types (both of which are defined at the IR level).
> What defines too big (what size is used for splitting)? If <scalable 8 x double> fits in the vector register depends on the runtime vector size, no?
No. For SVE, the legal types are those where the minimum size is equal to 128 bits, since that's the minimum size for hardware registers (and the granularity of increments of register size). So an operation using <scalable 8 x double> values would need to be split into 4 <scalable 2 x double> operations for SVE during legalization. (I'm ignoring the predicated unpacked float forms for a moment)
I wonder if the change in syntax from <n x 8 x double> to <scalable 8 x double> makes that less obvious.
The basis of the model can be grounded in '1' being a valid value for vscale, which effectively makes the types equivalent to fixed length vectors. I'll try coming up with a description based on that.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D32530/new/
https://reviews.llvm.org/D32530
More information about the llvm-commits
mailing list