[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Thu Mar 12 09:50:17 PDT 2020

Sander,

   Thank you for your reply. Allow me to address some of your points:

* Regarding the conversion functions

   We discussed it internally and our conclusion was that my CompositeType::get() and CompositeType::is() might be unpalatable to the community. We think it might be possible to specialize the casting templates such that cast(), dyn_cast(), and isa() work. Code like

  if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
                          llvm::CompositeType::get(OrigTy, false)))

... represents a more egregious case of this. But if I can get cast working, this will become

if (auto *STy = dyn_cast<llvm::SequentialType>(OrigTy))

... which is much nicer. If we can make this work, then the conversions will be just as safe as they ever were. Unfortunately, accomplishing this requires writing some pretty painful template code, and it's not really documented: the cast documentation points to clang::Decl and clang::DeclContext as an example to emulate, but provides no further guidance. I suppose this might be a good exercise. Alternatively, I could add a SequentialType::get() and SequentialType::is(), and bypass the dyn_cast_or_null() call in your example. I just didn't want to write a bunch of boilerplate for potentially throw-away prototype code.
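To make the "specialize the casting templates" idea a little more concrete, here is a minimal sketch under loudly stated assumptions: it uses a hypothetical toy hierarchy (not LLVM's real classes) in which SequentialType is a mixin that no longer derives from Type, and a hypothetical dynCastToSequential() in place of a real dyn_cast specialization. It shows only the core step any such specialization would need: dispatch on the type ID, downcast to the most derived class, then let the compiler perform the pointer-adjusting upcast to SequentialType. LLVM's actual machinery in llvm/Support/Casting.h is far more template-heavy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model: SequentialType is a mixin that does NOT derive
// from Type, so a plain isa/dyn_cast on a Type* cannot reach it directly.
struct Type {
  enum TypeID { IntegerTyID, ArrayTyID, FixedVectorTyID, ScalableVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID getTypeID() const { return ID; }

private:
  TypeID ID;
};

struct SequentialType {
  explicit SequentialType(uint64_t N) : NumElements(N) {}
  uint64_t getNumElements() const { return NumElements; }

private:
  uint64_t NumElements;
};

// Simplification: FixedVectorType derives from Type directly here, not
// from a VectorType class.
struct ArrayType : Type, SequentialType {
  explicit ArrayType(uint64_t N) : Type(ArrayTyID), SequentialType(N) {}
};

struct FixedVectorType : Type, SequentialType {
  explicit FixedVectorType(uint64_t N)
      : Type(FixedVectorTyID), SequentialType(N) {}
};

// Hypothetical helper (not an LLVM API): Type* -> SequentialType*.
// Downcast to the most derived class first; the implicit conversion in the
// return statement then adjusts the pointer to the SequentialType subobject.
SequentialType *dynCastToSequential(Type *T) {
  if (!T)
    return nullptr;
  switch (T->getTypeID()) {
  case Type::ArrayTyID:
    return static_cast<ArrayType *>(T);
  case Type::FixedVectorTyID:
    return static_cast<FixedVectorType *>(T);
  default:
    return nullptr; // not sequential (this includes scalable vectors)
  }
}
```

Going through the most derived class matters because, with multiple inheritance, the SequentialType subobject generally sits at a different address than the Type subobject; a blind reinterpretation of the Type* would point at the wrong bytes.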

* Regarding just breaking FixedVectorType away from SequentialType, but leaving ArrayType a subclass

    I don't think this is a good option. We would still have to rewrite all code that is generic over FixedVectorType and ArrayType, so we gain nothing; the amount of work is likely the same, and we keep the drawbacks that you mentioned.

* Regarding using option 3 without getSequentialNumElements()

   This will result in a bunch of code that looks like this:

if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
   doSomething(ArrTy->getNumElements(), Foo);
else
   doSomething(cast<FixedVectorType>(Ty)->getNumElements(), Foo);

   I count 8 places in https://reviews.llvm.org/D75661 where we call getSequentialNumElements(). 8 isn't _that many_ places, but it's enough to be annoying, and both branches at each of those sites would do literally the same thing; it just screams code duplication.

   I think I may have been a bit melodramatic in claiming that it "subverts the design." Realistically, getSequentialNumElements() only ever casts to ArrayType or FixedVectorType, never to VectorType, so calling it on a scalable vector will assert (with assertions enabled) or return a garbage value at runtime. It also only calls getNumElements(), so there is no path by which it could work on a scalable vector. Since cast<>() is implemented with a C-style cast, the result on a scalable vector is effectively a reinterpretation of the object's memory, and it may happen that the data layouts of FixedVectorType and VectorType put VectorType's ElementCount::Min and FixedVectorType::NumElements at the same offset, so a release build could appear to work. I don't think we should defensively handle this situation: either we accept that this UB exists, or we reject the idea of getSequentialNumElements(). However, I assume enough people develop with assertions enabled that this won't be an issue in practice.

   My personal preference is that we keep getSequentialNumElements() if we choose to go with option 3.
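The trade-off can be sketched in a toy model. This is a hypothetical hierarchy written for illustration, not LLVM's real classes: with SequentialType gone, ArrayType and FixedVectorType each carry their own getNumElements(), and a helper modeled on the proposed Type::getSequentialNumElements() keeps the ArrayType/FixedVectorType branch in one place instead of repeating it at every call site. As in the discussion above, the helper cannot do anything sensible for a scalable vector.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy hierarchy for option 3 (not LLVM's real classes).
class Type {
public:
  enum TypeID { IntegerTyID, ArrayTyID, FixedVectorTyID, ScalableVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID getTypeID() const { return ID; }
  bool isSequentialType() const {
    return ID == ArrayTyID || ID == FixedVectorTyID;
  }
  // Modeled on the proposed Type::getSequentialNumElements() helper.
  uint64_t getSequentialNumElements() const;

private:
  TypeID ID;
};

class ArrayType : public Type {
public:
  explicit ArrayType(uint64_t N) : Type(ArrayTyID), NumElements(N) {}
  uint64_t getNumElements() const { return NumElements; }

private:
  uint64_t NumElements;
};

class FixedVectorType : public Type {
public:
  explicit FixedVectorType(unsigned N)
      : Type(FixedVectorTyID), NumElements(N) {}
  unsigned getNumElements() const { return NumElements; }

private:
  unsigned NumElements;
};

// Without the helper, each call site repeats a branch like this one.
uint64_t numElementsAtCallSite(const Type *Ty) {
  if (Ty->getTypeID() == Type::ArrayTyID)
    return static_cast<const ArrayType *>(Ty)->getNumElements();
  // Mirrors cast<FixedVectorType>(Ty): asserts when assertions are enabled.
  assert(Ty->getTypeID() == Type::FixedVectorTyID && "expected fixed vector");
  return static_cast<const FixedVectorType *>(Ty)->getNumElements();
}

// With the helper, the same branch lives in one place. On a scalable vector
// it asserts when assertions are enabled; with NDEBUG the downcast is UB.
uint64_t Type::getSequentialNumElements() const {
  assert(isSequentialType() && "not an array or fixed length vector");
  if (ID == ArrayTyID)
    return static_cast<const ArrayType *>(this)->getNumElements();
  return static_cast<const FixedVectorType *>(this)->getNumElements();
}
```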

Thanks,
   Christopher Tetreault

-----Original Message-----
From: Sander De Smalen <Sander.DeSmalen at arm.com>
Sent: Wednesday, March 11, 2020 2:44 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Hi Chris,

Thanks for writing this up! I strongly support the proposal to add a FixedVectorType class to distinguish that type from the more generic (possibly scalable) VectorType. By having the code-base operate on 'FixedVectorType' rather than 'VectorType', we can gradually work to upgrade the code-base to support scalable vectors. This avoids bugs and it seems right conceptually.

On these three options, my first thought was "can we start by breaking FixedVectorType away from SequentialType" (adding a separate 'getNumElements()' method to FixedVectorType), until I figured this wouldn't be that much different from D75661. SequentialType will at that point be a pointless layer on top of ArrayType, so they could be squashed. It would however leave ArrayType and StructType as two independent types under CompositeType (is this an option 4?)

I'd be in favour of removing SequentialType and CompositeType altogether. They seem little used in practice and the places where they are used seem like they can be relatively easily updated to distinguish ArrayType, StructType and FixedVectorType separately.

Even when it requires some code duplication, I prefer being more explicit in distinguishing these types (option 3), over adding conversion functions between Type and Composite/SequentialType (options 1 and 2), especially when the conversion may not always be safe. I'm concerned that requiring conversion functions makes the code less readable, like this example from D65486:

  if (auto *STy = dyn_cast_or_null<llvm::SequentialType>(
                          llvm::CompositeType::get(OrigTy, false)))

Here the use of CompositeType::get(Type*) in the context of dyn_cast_or_null<llvm::SequentialType> seems a bit obscure to me.

If we go with option 3, I'd suggest removing the 'Type::getSequentialNumElements()' interface entirely and replacing it with explicit `Type::getFixedVectorNumElements()` and `Type::getArrayNumElements()`, thus removing any methods that mimic the old design.
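A minimal sketch of what explicit accessors would look like, assuming a hypothetical toy hierarchy (not LLVM's real classes). The point of the design is that each call site names the kind of type it expects, so "this must be a fixed length vector" is visible in review rather than hidden behind a generic "sequential" accessor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy hierarchy (not LLVM's real classes) with explicit,
// per-type accessors instead of a generic getSequentialNumElements().
class Type {
public:
  enum TypeID { ArrayTyID, FixedVectorTyID, ScalableVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  bool isArrayTy() const { return ID == ArrayTyID; }
  bool isFixedVectorTy() const { return ID == FixedVectorTyID; }
  // The caller must state which kind of type it expects.
  uint64_t getArrayNumElements() const;
  unsigned getFixedVectorNumElements() const;

private:
  TypeID ID;
};

class ArrayType : public Type {
public:
  explicit ArrayType(uint64_t N) : Type(ArrayTyID), NumElements(N) {}
  uint64_t getNumElements() const { return NumElements; }

private:
  uint64_t NumElements;
};

class FixedVectorType : public Type {
public:
  explicit FixedVectorType(unsigned N)
      : Type(FixedVectorTyID), NumElements(N) {}
  unsigned getNumElements() const { return NumElements; }

private:
  unsigned NumElements;
};

uint64_t Type::getArrayNumElements() const {
  assert(isArrayTy() && "not an array type");
  return static_cast<const ArrayType *>(this)->getNumElements();
}

unsigned Type::getFixedVectorNumElements() const {
  assert(isFixedVectorTy() && "not a fixed length vector type");
  return static_cast<const FixedVectorType *>(this)->getNumElements();
}
```

Code that is genuinely generic over arrays and fixed vectors still has to branch, which is the duplication Chris objects to; the benefit is that there is no accessor whose name suggests it works on every vector.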

Thanks,

Sander

> On 9 Mar 2020, at 19:05, Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
>                 I am helping with the effort to implement scalable vectors in the codebase in order to add support for generating SVE code in the Arm backend. I would like to propose a refactor of the Type class hierarchy in order to eliminate issues related to the misuse of SequentialType::getNumElements(). I would like to introduce a new class FixedVectorType that inherits from SequentialType and VectorType. VectorType would no longer inherit from SequentialType, instead directly inheriting from Type. After this change, it will be statically impossible to accidentally call SequentialType::getNumElements() via a VectorType pointer.
>
> Background:
>
>                 Recently, scalable vectors have been introduced into the codebase. Previously, vectors were written <n x ty> in IR, where n is a fixed number of elements known at compile time, and ty is some type. Scalable vectors are written <vscale x n x ty>, where vscale is a runtime constant value. A new function, getElementCount(), has been added to VectorType (defined in llvm/IR/DerivedTypes.h); it returns an ElementCount, which is defined as follows in llvm/Support/TypeSize.h:
>
> class ElementCount {
> public:
>   unsigned Min;
>   bool Scalable;
>   …
> };
>
>                 Min is the minimum number of elements in the vector (the “n” in <vscale x n x ty>), and Scalable is true if the vector is scalable (true for <vscale x n x ty>, false for <n x ty>). The idea is that if a vector is not scalable, then Min is exactly equal to the number of vector elements, but if the vector is scalable, then the number of vector elements is equal to some runtime-constant multiple of Min. The key takeaway here is that scalable vectors and fixed length vectors need to be treated differently by the compiler. For a fixed length vector, it is valid to iterate over the vector elements, but this is impossible for a scalable vector, whose element count is not known at compile time.
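The semantics of ElementCount described above can be sketched with a simplified stand-in struct (an assumption for illustration, not llvm::ElementCount itself) and two hypothetical helpers: one computes the actual lane count once a runtime vscale is known, the other reports whether the count is a compile-time constant, which is the condition for writing a loop over the elements.

```cpp
#include <cassert>

// Simplified stand-in for the ElementCount described above.
struct ElementCount {
  unsigned Min;  // the "n" in <n x ty> or <vscale x n x ty>
  bool Scalable; // true only for <vscale x n x ty>
};

// Hypothetical helper: the actual number of lanes for a given runtime
// vscale. For a fixed length vector, vscale is irrelevant.
unsigned getRuntimeNumElements(ElementCount EC, unsigned VScale) {
  return EC.Scalable ? EC.Min * VScale : EC.Min;
}

// Hypothetical helper: true only when the exact lane count is known at
// compile time, i.e. when iterating over the elements is meaningful.
bool isKnownAtCompileTime(ElementCount EC) { return !EC.Scalable; }
```

For example, <4 x i32> always has 4 lanes, while <vscale x 4 x i32> has 4, 8, 12, ... lanes depending on the hardware.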
>
> Discussion:
>
> The trouble is that all instances of VectorType have getNumElements() inherited from SequentialType. Prior to the introduction of scalable vectors, this function was guaranteed to return the number of elements in a vector or array. Today, a comment documents that it returns only the minimum number of elements for scalable vectors; however, a great deal of code in the codebase now misuses getNumElements(). Some examples:
>
>   auto *V = VectorType::get(Ty, SomeOtherVec->getNumElements());
>
>                 This code was previously perfectly fine but is incorrect for scalable vectors. When scalable vectors were introduced, VectorType::get() was refactored to take a bool indicating whether the vector is scalable. This bool has a default value of false. In this example, get() returns a non-scalable vector even if SomeOtherVec was scalable. This will manifest later, in some unrelated code, as a type mismatch between a scalable and a fixed length vector.
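The failure mode can be reproduced in a small model. Everything below is hypothetical: VecTy tracks only a vector's shape, and the two getVectorType() overloads stand in for the VectorType::get() overloads (one taking a count plus a defaulted bool, one taking a whole ElementCount). Passing only getNumElements() silently drops the Scalable flag; passing the whole ElementCount preserves it.

```cpp
#include <cassert>

// Simplified stand-in for ElementCount.
struct ElementCount {
  unsigned Min;
  bool Scalable;
};

// Hypothetical stand-in for a vector type; only the shape is tracked.
struct VecTy {
  ElementCount EC;
  // The trap: returns Min and says nothing about Scalable.
  unsigned getNumElements() const { return EC.Min; }
};

// Stand-ins for the two get() overloads discussed in the text.
VecTy getVectorType(unsigned NumElts, bool Scalable = false) {
  return VecTy{{NumElts, Scalable}};
}
VecTy getVectorType(ElementCount EC) { return VecTy{EC}; }
```

In the real API, the analogous fix is to forward SomeOtherVec's getElementCount() instead of its getNumElements().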
>
>   for (unsigned I = 0; I < SomeVec->getNumElements(); ++I) { … }
>
>                 Previously, since there was no notion of scalable vectors, this was perfectly reasonable code. However, for scalable vectors, this is always a bug.
>
>                 With vigilance in code review and good test coverage, we will eventually find and squash most of these bugs. Unfortunately, code review is hard, and test coverage isn’t perfect. Bugs will continue to slip through as long as it’s easier to do the wrong thing.
>
>                 One other factor to consider is that a great deal of code deals exclusively with fixed length vectors. Any backend that has no scalable vectors should not need to care about their existence. Even in the Arm backend, when Neon code is being generated, the vectors will never be scalable. In such code, the current status quo is perfectly fine, and any code that checks whether the vector is scalable is just noise.
>
> Proposal:
>
>                 In order to support users who only need fixed width vectors, and to ensure that nobody can accidentally call getNumElements() on a scalable vector, I am proposing the introduction of a new FixedVectorType which inherits from both VectorType and SequentialType. In turn, VectorType will no longer inherit from SequentialType. An example of what this will look like, with some misc. functions omitted for clarity:
>
> class SequentialType : public CompositeType {
> public:
>   uint64_t getNumElements() const { return NumElements; }
> };
>
> class VectorType : public Type {
> public:
>   static VectorType *get(Type *ElementType, ElementCount EC);
>
>   Type *getElementType() const;
>   ElementCount getElementCount() const;
>   bool isScalable() const;
> };
>
> class FixedVectorType : public VectorType, public SequentialType {
> public:
>   static FixedVectorType *get(Type *ElementType, unsigned NumElts);
> };
>
>                 In this proposed architecture, VectorType does not have a getNumElements() function because it does not inherit from SequentialType. In generic code, users will call VectorType::get() to obtain a new instance of VectorType just as they always have. VectorType implements the safe subset of functionality of fixed and scalable vectors that is suitable for use anywhere. If the user passes false to the scalable parameter of get(), they will get an instance of FixedVectorType back. Users can then inspect its type and cast it to FixedVectorType using the usual mechanisms. In code that deals exclusively in fixed length vectors, the user can call FixedVectorType::get() to directly get an instance of FixedVectorType, and their code can remain largely unchanged from how it was prior to the introduction of scalable vectors. At this time, there exists no use case that is only valid for scalable vectors, so no ScalableVectorType is being added.
>
>                 With this change, in generic code it is now impossible to accidentally call getNumElements() on a scalable vector. If a user tries to pass a scalable vector to a function that expects a fixed length vector, they will encounter a compilation failure at the site of the bug, rather than a runtime error in some unrelated code. If a user attempts to cast a scalable vector to FixedVectorType, the cast will fail at the call site. This will make it easier to track down all the places that are currently incorrect, and will prevent future developers from introducing bugs by misusing getNumElements().
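A minimal sketch of the proposed split, assuming a hypothetical toy hierarchy that uses the classof-based RTTI style LLVM relies on, with hand-rolled isa/dyn_cast templates standing in for llvm/Support/Casting.h. SequentialType is deliberately omitted here to sidestep the diamond problem discussed below. The property being illustrated: VectorType has no getNumElements(), so the exact count is only reachable after a successful dyn_cast to FixedVectorType, and that cast returns null for a scalable vector.

```cpp
#include <cassert>

// Hypothetical toy model (not LLVM's real classes).
struct Type {
  enum TypeID { IntegerTyID, FixedVectorTyID, ScalableVectorTyID };
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID getTypeID() const { return ID; }

private:
  TypeID ID;
};

// Minimal isa/dyn_cast in the spirit of llvm/Support/Casting.h.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}
template <typename To, typename From> To *dyn_cast(From *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

struct VectorType : Type {
  VectorType(TypeID ID, unsigned N) : Type(ID), MinElts(N) {}
  // Deliberately no getNumElements(): only the scalable-safe subset.
  unsigned getMinNumElements() const { return MinElts; }
  bool isScalable() const { return getTypeID() == ScalableVectorTyID; }
  static bool classof(const Type *T) {
    return T->getTypeID() == FixedVectorTyID ||
           T->getTypeID() == ScalableVectorTyID;
  }

private:
  unsigned MinElts;
};

struct FixedVectorType : VectorType {
  explicit FixedVectorType(unsigned N) : VectorType(FixedVectorTyID, N) {}
  // The exact element count exists only on the fixed length class.
  unsigned getNumElements() const { return getMinNumElements(); }
  static bool classof(const Type *T) {
    return T->getTypeID() == FixedVectorTyID;
  }
};

// Hypothetical helper: exact lane count, or 0 if T is not a fixed vector.
unsigned countLanesIfFixed(Type *T) {
  if (auto *FVT = dyn_cast<FixedVectorType>(T))
    return FVT->getNumElements();
  return 0;
}
```

In this model, writing getNumElements() on a plain VectorType is a compile error rather than a latent runtime bug, which is the guarantee the proposal is after.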
>
> Outstanding design choice:
>
>                 One issue with this architecture as proposed is the fact that SequentialType (by way of CompositeType) inherits from Type. This introduces a diamond inheritance in FixedVectorType. Unfortunately, llvm::cast uses a C-style cast internally, so we cannot use virtual inheritance to resolve this issue. Thus, we have three options:
>
> 1. Break CompositeType’s inheritance from Type and introduce functions to convert from a Type to a CompositeType and vice versa. The conversion from CompositeType is always safe because all instances of CompositeType (StructType, ArrayType, and FixedVectorType) are instances of Type: a CompositeType can be cast to the most derived class, then back to Type. The other direction is not always safe, so a function will need to be added to check whether a Type is an instance of CompositeType. This change is not that big, and I have a prototype implementation up at https://reviews.llvm.org/D75486 ([SVE] Make CompositeType not inherit from Type)
>    • Pros: This approach would result in minimal changes to the codebase. If the LLVM casts can be made to work for the conversion functions, then it would touch very few files.
>    • Cons: There are those who think that CompositeType adds little value and should be removed. Now would be an ideal time to do this. Additionally, the conversion functions would be more complicated if we left CompositeType in.
> 2. Remove CompositeType and break SequentialType’s inheritance from Type. Add functions to convert a SequentialType to and from Type. The conversion functions would work the same as those in option 1 above. Currently, only one class derives directly from CompositeType: StructType. The functionality of CompositeType can be moved directly into StructType, and APIs that use CompositeType can use Type directly and cast appropriately. We feel that this would be a fairly simple change, and we have a prototype implementation up at https://reviews.llvm.org/D75660 (Remove CompositeType class)
>    • Pros: Removing CompositeType would simplify the type hierarchy. Leaving SequentialType in would simplify some code and be more type-safe than having a getSequentialNumElements() on Type.
>    • Cons: The value of SequentialType has also been called into question. If we wanted to remove it, now would be a good time. Conversion functions add complexity to the design. Introduces additional casting from Type.
> 3. Remove CompositeType and SequentialType. Roll the functions directly into the most derived classes. A helper function can be added to Type to handle choosing between FixedVectorType and ArrayType and calling getNumElements():
>
>      // A non-static const member of Type, so that `this` is available.
>      unsigned getSequentialNumElements() const {
>        assert(isSequentialType()); // This already exists and does the
>                                    // right thing
>        if (auto *AT = dyn_cast<ArrayType>(this))
>          return AT->getNumElements();
>        return cast<FixedVectorType>(this)->getNumElements();
>      }
>
>    A prototype implementation of this strategy can be found at https://reviews.llvm.org/D75661 (Remove SequentialType from the type heirarchy.)
>
>    • Pros: By removing the multiple inheritance completely, we greatly simplify the design and eliminate the need for any conversion functions. The value of CompositeType and SequentialType has been called into question, and removing them now might benefit the codebase.
>    • Cons: getSequentialNumElements() has issues similar to those that we are trying to solve in the first place and potentially subverts the whole design. Omitting getSequentialNumElements() would add lots of code duplication. Introduces additional casting from Type.
> I believe that all three of these options are reasonable. My personal preference is currently option 2. I think that option 3’s getSequentialNumElements() subverts the design: because every Type has getSequentialNumElements(), it is tempting to just call it. When it is called on a scalable vector, the cast will fail at the call site in builds with assertions enabled, and in release builds it will return a garbage value rather than a value that works most of the time. For option 1, the existence of CompositeType complicates the conversion logic for little benefit.
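The diamond that these options work around can be reproduced in plain C++. In this hypothetical toy model (not LLVM's code), both VectorType and SequentialType derive non-virtually from Type, so a FixedVectorType contains two distinct Type subobjects. Converting a FixedVectorType* to Type* is ambiguous unless a path is chosen, and a cast that treats a Type* as if it pointed at the whole object is only correct for one of those paths. Virtual inheritance would merge the two subobjects, but, as noted above, a C-style cast cannot correctly downcast from a virtual base.

```cpp
#include <cassert>

// Hypothetical toy model of the diamond (not LLVM's real classes).
struct Type {
  int TypeID; // non-empty, so each Type subobject has its own storage
};
struct VectorType : Type {};
struct SequentialType : Type {
  unsigned long long NumElements;
};
struct FixedVectorType : VectorType, SequentialType {};

// Non-virtual inheritance yields two separate Type subobjects, so which
// Type* you get depends on the path taken. (A direct
// static_cast<const Type *>(FV) would not compile: it is ambiguous.)
const Type *viaVector(const FixedVectorType *FV) {
  return static_cast<const VectorType *>(FV);
}
const Type *viaSequential(const FixedVectorType *FV) {
  return static_cast<const SequentialType *>(FV);
}
```

Two distinct subobjects of the same type are guaranteed to have different addresses, so code receiving one of these Type pointers cannot recover the other without knowing which path produced it.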
>
> Conclusion:
>
>                 Thank you for your time in reviewing this RFC. Your feedback on my work is greatly appreciated.
>
>
>
> Thank you,
>
>                 Christopher Tetreault
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev