[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Fri Feb 7 17:52:15 PST 2020

> On Feb 7, 2020, at 4:42 PM, Eli Friedman <efriedma at quicinc.com> wrote:
>> In my opinion, LLVM constant expressions seem like a bad idea in general,
>> and I’d rather see them go away over time - as one example “constants that
>> can trap” often are mishandled, and pattern matching is slower and more
>> complex due to them.  In my mind, the a better representation would be
>> something that directly models what we need: global variable initializers can
>> have relocations applied to them, so they have a very specific grammar of
>> constant expression that represents this (symbol + constant, and symbol-
>> symbol+constant on macho targets).  The rest of the constant expr could be
>> dropped, simplifying the system.
> 
> I think there are some practical advantages to having constant expressions; in particular, we get automatic "CSE", which I think ends up being a practical advantage in some cases.  And there are certain issues involving SelectionDAG (in particular, cloning a constant into every basic block make pattern-matching more effective, and reduces register pressure in some cases).  And some of our cost modeling on IR sort of depends on the fact that constants are not instructions.  I mean, we don't need constants; at an extreme we could have a constantint instruction, like GlobalISel's G_CONSTANT.  But it's not clear that would be helpful for midlevel optimizations.

Sure, but that cuts both ways.  The fact that they end up as unfolded constantexprs implies that they need to be emitted, and the compiler doesn’t generally have a cost model to reflect that.

CSE doesn’t happen in general because (as you mention below) undef’s exist in shuffle masks, and the are not auto-CSE’d.

In the comments above, I was referring to LLVM IR - I agree that SelectionDAG has its own issues, and that the LLVM IR design doesn’t necessarily translate over.

> For scalable vectors in particular, currently, the only way to express a vector with where every element is a simple integer is using a splat shuffle expression, and those often fold into some instruction.  Continuing to use constant expressions for that will make that work more smoothly, I think.  That could be solved in other ways, though (more fun code for CodeGenPrepare!), so there isn't a hard requirement.

Oh, I see what you’re saying.  The issue here is that the vector length is modeled as a Constant.  Have I ever mentioned that that was always weird and non-obvious? :-)

Joking aside, there is a serious problem for getting the cost model correct.  You are (reasonably) wanting to have “something cheap” be modeled as a "Constant*”, however a shuffle as an ConstantExpr also admits the cases where an instruction needs to be dynamically computed.

This conflation is exactly the problem I have with the existing constant expressions design.  You cannot reasonably write a cost model for this in target independent code, and in some cases it helps but others it hurts.  It is hugely problematic to me that we represent things as a Constant* that actually emit code (this is why ‘constants that can trap’ exist for example).

In my opinion, you’d be much better off by making these be a separate intrinsic that has no constantexpr, and use the standard CGP (or GlobalIsel) tricks to duplicate it where necessary on your target.  The reason for this is that these will need to be executed on some targets, but can be folded on others.  Having them explicit still allows CSE and other transformations to work on them, which is important for canonicalization and also for targets where they are not folded.

> Some of this is what eventually led to the way we lower llvm.vscale: we have the intrinsic, but we have a hack in codegenprepare to convert it to a GEP constant expression.

Since you have the GEP trick that works with variable length vectors, why do you need llvm.vscale anymore at all?

>> An alternate design could look like four things (again, ignoring intrinsic vs
>> instruction):
>> 
>> “Data dependent shuffle” would allow runtime shuffle masks.
>>  -> “fixed mask shuffle” would require a statically determined shuffle mask.
>>     -> “well known target-independent shuffle” would support specific
>> shuffles like zip, even/odd splits, splat, etc.
>>         -> “target specific shuffles” would be any number of special things that
>> are shuffle like.
> 
> I'm not sure trying to form a strict hierarchy really works out.  The key aspect that breaks this is the use of "undef" in shuffle masks.  So there are actually multiple fixed shuffles of the same width which might count as a "zip_lower", depending on the context.

How does it break it?  It seems reasonable to have an undef mask even for the ‘well known’ shuffles.

> I think the distinguishing factor here that means we want to integrate these shuffles into the shufflevector instruction, vs. other sorts of shuffle-ish operations, is that the shuffle pattern is known at compile-time.  

I can understand where you’re coming from, but I’m making an argument independent of variable length vectors.  I’m arguing that the design above is better *even for static vectors* because the original ShuffleVector bet didn’t work out for them either.

> Given that, optimizations can reason about the value of various lanes, not just a black box.  

I don’t see how the above argues for black boxes.  Target specific intrinsics are often black boxes for sure, but I don’t see how "well known target-independent shuffles” would be black boxes that prevent understanding lane mappings.

> We can force any sort of representation to "work" with enough refactoring and helper methods,

Yes, agreed on that!

>> Represent even fixed length shuffles with an explicit representation seems
>> like it would make pattern matching the specific cases easier, makes the IR
>> easier to read, and provides a single canonical representation for each mask.
>> The generic pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could
>> paper over the representational differences as well.
> 
> The other aspect here is the use of "undef" in fixed shuffles.  Frontends usually don't do this, but some of our optimizations for shuffles are built around replacing unused results of shuffles with "undef".  We don't want to forbid transforming "<0, 4, 1, 5>" -> "<0, undef, 1, 5>" because we decided the former was the named shuffle "zip", and we want to allow transforms/analysis for zip-like shuffles on "<0, undef, 1, 5>".  So I think it ends up being more straightforward to have a helper for "is this a zip", as opposed to a helper to transforming a "zip" to a fixed shuffle mask.

If you take a step back and look at the structure of the problem we’re working with, the existing ShuffleVector design is very problematic IMO.  Among other things, knowing that shuffle output elements are undef but not knowing that about calls and loads of vectors is pretty weird.  Why don’t we have something like the !range <http://llvm.org/docs/LangRef.html#range-metadata> metadata that applies to load/call/shuffle to indicate that the result vector elements have some unknown elements?  This would be far more general and correct.

In a world where we have two different “shuffle with a constant mask” and “shuffle with well known algorithms” operations, we would not implement undef output elements with “-1 in the shuffle mask”.  Instead, we would have a design like:

1) Shuffle masks would always have elements that range from 0 to #elements-1, and the verifier would enforce this. The accessors would return ArrayRef<unsigned> instead of ArrayRef<int>

2) We would have a unified way (like the !range attribute) to model this, and it would apply to both shuffles (plus loads and calls).

3) Because this applies to calls, instcombine could infer this property for target specific intrinsics, the result could be propagated across function results using IPA etc.

> Independent of anything we do with scalable vectors, it might make sense in the IR printer to add some sort of note for known shuffle patterns, so developers don't have to figure out the meaning of the numerical sequences in IR dumps.

Sure.  +1

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200207/72763c2a/attachment.html>