[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Thu May 21 13:51:01 PDT 2020

Chris, As I have been following this thread, it looks like you could get 
close to a win by keeping VectorType with your additional changes. 
Whereas a firm position against VectorType tends to make a win in doubt. 
Your changes could sit there for some time.

The question does not appear to be that your changes are not ideal in an 
eventual sense, but that going from where we are now to that eventual 
condition needs to be done carefully, done incrementally. The solution 
appears to be how to get agreement on an incremental path. How to 
consolidate what gain you can with a first step and then seeing what can 
be done next.

Neil Nelson

On 5/21/20 2:01 PM, Chris Tetreault via llvm-dev wrote:
>
> Hi John,
>
>    I’d like to address some points in your message.
>
> > Practically speaking, this is going to break every out-of-tree 
> frontend, backend, or optimization pass that supports SIMD types.
>
> My understanding is that the policy in LLVM development is that we do 
> not let considerations for downstream and out-of-tree codebases affect 
> the pace of development. The C++ API is explicitly unstable. I 
> maintain a downstream fork of LLVM myself, so I know the pain that 
> this is causing because I get to fix all the issues in my internal 
> codebase. However, the fix for downstream codebases is very simple: 
> Just find all the places where it says VectorType, and change it to 
> say FixedVectorType.
>
> > … by having the VectorType type semantically repurposed out from 
> under them.
>
> The documented semantics of VectorType prior to my RFC were that it is 
> a generalization of all vector types. The VectorType contains an 
> ElementCount, which is a pair of (bool, unsigned). If the bool is 
> true, then the return value of getNumElements() is the minimum number 
> of vector elements. If the bool is false, then it is the actual number 
> of elements. My RFC has not changed these semantics. It will 
> eventually delete a function that has been pervasively misused 
> throughout the codebase, but the semantics remain the same. You are 
> proposing a semantic change to VectorType to have it specifically be a 
> fixed width vector.
>
> > … a particular largely-vendor-specific extension …
>
> All SIMD vectors are vendor specific extensions. Just because most of 
> the most popular architectures have them does not make this not true. 
> AArch64 and RISC-V have scalable vectors, so it is not just one 
> architecture. It is the responsibility of all developers to ensure 
> that they use the types correctly. It would be nice if the obvious 
> thing to do is the correct thing to do.
>
> > … it’s much better for code that does support both to explicitly opt 
> in by checking for and handling the more general type …
>
> This is how it will work. I am in the process of fixing up call sites 
> that make fixed width assumptions so that they use FixedVectorType.
>
>    I think that it is important to ensure that things have clear 
> sensible names, and to clean up historical baggage when the 
> opportunity presents. The advantage of an unstable API is that you are 
> not prevented from changing this sort of thing by external entities. 
> Now is the time to fix the names of the vector types; a new type of 
> vector exists, and we have the choice to change the names to reflect 
> the new reality or to accumulate some technical debt. If we wait to 
> address this issue later, there will just be more code that needs to 
> be addressed. The refactor is fairly easy right now because pretty 
> much everything is making fixed width assumptions.  The changes are 
> all fairly mechanical. If we wait until scalable vectors are well 
> supported throughout the codebase, the change will just be that much 
> harder.
>
> Thanks,
>
>    Christopher Tetreault
>
> *From:* John McCall <rjmccall at apple.com>
> *Sent:* Thursday, May 21, 2020 11:15 AM
> *To:* Chris Tetreault <ctetreau at quicinc.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of 
> VectorType in the IR
>
> On 9 Mar 2020, at 15:05, Chris Tetreault via llvm-dev wrote:
>
>     Hi,
>     I am helping with the effort to implement scalable vectors in the
>     codebase in order to add support for generating SVE code in the
>     Arm backend. I would like to propose a refactor of the Type class
>     hierarchy in order to eliminate issues related to the misuse of
>     SequentialType::getNumElements(). I would like to introduce a new
>     class FixedVectorType that inherits from SequentialType and
>     VectorType. VectorType would no longer inherit from
>     SequentialType, instead directly inheriting from Type. After this
>     change, it will be statically impossible to accidentally call
>     SequentialType::getNumElements() via a VectorType pointer.
>
> I’m sorry that I missed this thread when you posted it. I’m very much 
> in favor of changing the type hierarchy to statically distinguish 
> fixed from scalable vector types, but I think that making VectorType 
> the common base type is unnecessarily disruptive. Practically 
> speaking, this is going to break every out-of-tree frontend, backend, 
> or optimization pass that supports SIMD types. Relatively little LLVM 
> code will just naturally support scalable vector types without any 
> adjustment. Following the principle of iterative development, as well 
> as just good conservative coding practice, it’s much better for code 
> that does support both to explicitly opt in by checking for and 
> handling the more general type, rather than being implicitly 
> “volunteered” to support both by having the |VectorType|type 
> semantically repurposed out from under them.
>
> I understand the argument that |VectorType|is a better name for the 
> abstract base type, but in this case I don’t think that consideration 
> justifies the disruption for the vast majority of LLVM developers. 
> There are plenty of names you could give the abstract base type that 
> adequately express that’s a more general type, and the historical 
> baggage of |VectorType|being slightly misleadingly named if you’re 
> aware of a particular largely-vendor-specific extension does not seem 
> overbearing.
>
> John.
>
>     Background:
>     Recently, scalable vectors have been introduced into the codebase.
>     Previously, vectors have been written <n x ty> in IR, where n is a
>     fixed number of elements known at compile time, and ty is some
>     type. Scalable vectors are written <vscale x n x ty> where vscale
>     is a runtime constant value. A new function has been added to
>     VectorType (defined in llvm/IR/DerivedTypes.h), getElementCount(),
>     that returns an ElementCount, which is defined as such in
>     llvm/Support/TypeSize.h:
>     class ElementCount {
>     public:
>     unsigned Min;
>     bool Scalable;
>     ...
>     }
>     Min is the minimum number of elements in the vector (the "n" in
>     <vscale x n x ty>), and Scalable is true if the vector is scalable
>     (true for <vscale x n x ty>, false for <n x ty>) The idea is that
>     if a vector is not scalable, then Min is exactly equal to the
>     number of vector elements, but if the vector is scalable, then the
>     number of vector elements is equal to some runtime-constant
>     multiple of Min. The key takeaway here is that scalable vectors
>     and fixed length vectors need to be treated differently by the
>     compiler. For a fixed length vector, it is valid to iterate over
>     the vector elements, but this is impossible for a scalable vector.
>     Discussion:
>     The trouble is that all instances of VectorType have
>     getNumElements() inherited from SequentialType. Prior to the
>     introduction of scalable vectors, this function was guaranteed to
>     return the number of elements in a vector or array. Today, there
>     is a comment that documents the fact that this returns only the
>     minimum number of elements for scalable vectors, however there
>     exists a ton of code in the codebase that is now misusing
>     getNumElements(). Some examples:
>     Auto *V = VectorType::get(Ty, SomeOtherVec->getNumElements());
>     This code was previously perfectly fine but is incorrect for
>     scalable vectors. When scalable vectors were introduced
>     VectorType::get() was refactored to take a bool to tell if the
>     vector is scalable. This bool has a default value of false. In
>     this example, get() is returning a non-scalable vector even if
>     SomeOtherVec was scalable. This will manifest later in some
>     unrelated code as a type mismatch between a scalable and fixed
>     length vector.
>     for (unsigned I = 0; I < SomeVec->getNumElements(); ++I) { ... }
>     Previously, since there was no notion of scalable vectors, this
>     was perfectly reasonable code. However, for scalable vectors, this
>     is always a bug.
>     With vigilance in code review, and good test coverage we will
>     eventually find and squash most of these bugs. Unfortunately, code
>     review is hard, and test coverage isn't perfect. Bugs will
>     continue to slip through as long as it's easier to do the wrong thing.
>     One other factor to consider, is that there is a great deal of
>     code which deals exclusively with fixed length vectors. Any
>     backend for which there are no scalable vectors should not need to
>     care about their existence. Even in Arm, if Neon code is being
>     generated, then the vectors will never be scalable. In this code,
>     the current status quo is perfectly fine, and any code related to
>     checking if the vector is scalable is just noise.
>     Proposal:
>     In order to support users who only need fixed width vectors, and
>     to ensure that nobody can accidentally call getNumElements() on a
>     scalable vector, I am proposing the introduction of a new
>     FixedVectorType which inherits from both VectorType and
>     SequentialType. In turn, VectorType will no longer inherit from
>     SequentialType. An example of what this will look like, with some
>     misc. functions omitted for clarity:
>     class VectorType : public Type {
>     public:
>     static VectorType *get(Type *ElementType, ElementCount EC);
>
>     Type *getElementType() const;
>     ElementCount getElementCount() const;
>     bool isScalable() const;
>     };
>
>     class FixedVectorType : public VectorType, public SequentialType {
>     public:
>     static FixedVectorType *get(Type *ElementType, unsigned NumElts);
>     };
>
>     class SequentialType : public CompositeType {
>     public:
>     uint64_t getNumElements() const { return NumElements; }
>     };
>     In this proposed architecture, VectorType does not have a
>     getNumElements() function because it does not inherit from
>     SequentialType. In generic code, users will call VectorType::get()
>     to obtain a new instance of VectorType just as they always have.
>     VectorType implements the safe subset of functionality of fixed
>     and scalable vectors that is suitable for use anywhere. If the
>     user passes false to the scalable parameter of get(), they will
>     get an instance of FixedVectorType back. Users can then inspect
>     its type and cast it to FixedVectorType using the usual
>     mechanisms. In code that deals exclusively in fixed length
>     vectors, the user can call FixedVectorType::get() to directly get
>     an instance of FixedVectorType, and their code can remain largely
>     unchanged from how it was prior to the introduction of scalable
>     vectors. At this time, there exists no use case that is only valid
>     for scalable vectors, so no ScalableVectorType is being added.
>     With this change, in generic code it is now impossible to
>     accidentally call getNumElements() on a scalable vector. If a user
>     tries to pass a scalable vector to a function that expects a fixed
>     length vector, they will encounter a compilation failure at the
>     site of the bug, rather than a runtime error in some unrelated
>     code. If a user attempts to cast a scalable vector to
>     FixedVectorType, the cast will fail at the call site. This will
>     make it easier to track down all the places that are currently
>     incorrect, and will prevent future developers from introducing
>     bugs by misusing getNumElements().
>     Outstanding design choice:
>     One issue with this architecture as proposed is the fact that
>     SequentialType (by way of CompositeType) inherits from Type. This
>     introduces a diamond inheritance in FixedVectorType.
>     Unfortunately, llvm::cast uses a c-style cast internally, so we
>     cannot use virtual inheritance to resolve this issue. Thus, we
>     have a few options:
>
>     1. Break CompositeType's inheritance on Type and introduce
>     functions to convert from a Type to a CompositeType and vice
>     versa. The conversion from CompositeType is always safe because
>     all instances of CompositeType (StructType, ArrayType, and
>     FixedVectorType) are instances of Type. A CompositeType can be
>     cast to the most derived class, then back to Type. The other way
>     is not always safe, so a function will need to be added to check
>     if a Type is an instance of CompositeType. This change is not that
>     big, and I have a prototype implementation up at
>     https://reviews.llvm.org/D75486 ([SVE] Make CompositeType not
>     inherit from Type)
>     * Pros: this approach would result in minimal changes to the
>     codebase. If the llvm casts can be made to work for the conversion
>     functions, then it would touch very few files.
>     * Cons: There are those who think that CompositeType adds little
>     value and should be removed. Now would be an ideal time to do
>     this. Additionally, the conversion functions would be more
>     complicated if we left CompositeType in.
>     2. Remove CompositeType and break SequentialType's inheritance of
>     Type. Add functions to convert a SequentialType to and from Type.
>     The conversion functions would work the same as those in option 1
>     above. Currently, there exists only one class that derives
>     directly from CompositeType: StructType. The functionality of
>     CompositeType can be directly moved into StructType, and APIs that
>     use CompositeType can directly use Type and cast appropriately. We
>     feel that this would be a fairly simple change, and we have a
>     prototype implementation up at https://reviews.llvm.org/D75660
>     (Remove CompositeType class)
>     * Pros: Removing CompositeType would simplify the type hierarchy.
>     Leaving SequentialType in would simplify some code and be more
>     typesafe than having a getSequentialNumElements on Type.
>     * Cons: The value of SequentialType has also been called into
>     question. If we wanted to remove it, now would be a good time.
>     Conversion functions add complexity to the design. Introduces
>     additional casting from Type.
>     3. Remove CompositeType and SequentialType. Roll the functions
>     directly into the most derived classes. A helper function can be
>     added to Type to handle choosing from FixedVectorType and
>     ArrayType and calling getNumElements():
>     static unsigned getSequentialNumElements() {
>     assert(isSequentialType()); // This already exists and does the
>     // right thing
>     if (auto *AT = dyn_cast<ArrayType>(this))
>     return AT->getNumElements();
>     return cast<FixedVectorType>(this)->getNumElements();
>     }
>     A prototype implementation of this strategy can be found at
>     https://reviews.llvm.org/D75661 (Remove SequentialType from the
>     type heirarchy.)
>
>     * Pros: By removing the multiple inheritance completely, we
>     greatly simplify the design and eliminate the need for any
>     conversion functions. The value of CompositeType and
>     SequentialType has been called into question, and removing them
>     now might be of benefit to the codebase
>     * Cons: getSequentialNumElements() has similar issues to those
>     that we are trying to solve in the first place and potentially
>     subverts the whole design. Omitting getSequentialNumElements()
>     would add lots of code duplication. Introduces additional casting
>     from Type.
>     I believe that all three of these options are reasonable. My
>     personal preference is currently option 2. I think that option 3's
>     getSequentialNumElements() subverts the design because every Type
>     has getSequentialNumElements(), it is tempting to just call it.
>     However, the cast will fail at the call site in debug, and in
>     release it will return a garbage value rather than a value that
>     works most of the time. For option 1, the existence of
>     CompositeType complicates the conversion logic for little benefit.
>     Conclusion:
>     Thank you for your time in reviewing this RFC. Your feedback on my
>     work is greatly appreciated.
>
>     Thank you,
>     Christopher Tetreault
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200521/f74e55cb/attachment.html>