[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Thu May 21 15:32:18 PDT 2020
On 21 May 2020, at 17:22, Chris Tetreault wrote:
> John,
>
>> This is not categorically true, no. When we make changes that require
>> large-scale updates for downstream codebases, we do so because
>> there’s a real expected benefit to it. For the most part, we do
>> make some effort to keep existing source interfaces stable.
> While I’m at a loss to find a documented policy, I recall this
> thread
> (http://lists.llvm.org/pipermail/llvm-dev/2020-February/139207.html)
> where this claim was made and not disputed. The expected real benefits
> to this change are: 1) The names now match the semantics 2) It is now
> statically impossible to accidentally get the fixed number of elements
> from a scalable vector and 3) It forces everybody to fix their broken
> code. If we provided stability guarantees to downstream and
> out-of-tree codebases, then I might not agree that 3 is a benefit, but
> my understanding is that we do not provide this guarantee.
>
>> … Probably 99% of the code using VectorType semantically requires
>> it to be a fixed-width vector.
>
> This code is all broken already. I don’t think supporting common
> misuse of APIs in a codebase that does not provide stability
> guarantees is something we should be doing.
>
>> The generalization of VectorType to scalable vector types was a
>> representational shortcut that should never have been allowed
>
> I agree. Unfortunately it happened, and our choices are to fix it or
> accept the technical debt forever.
Perhaps this is part of the root of our disagreement. I consider
scalable vector types to be an experimental/unstable feature, as many
features are when they’re first added to the compiler. I have much
lower standards for disrupting the early adopters of features like that;
if scalable vectors need to be split out of `VectorType`, we should just
do it.
Analogously, when we upstream the pointer-authentication feature, we
will be upstreaming a rather flawed representation that I definitely
expect us to revise after the fact. That will be problematic for early
adopters of LLVM’s pointer authentication support, but that’s
totally acceptable.
John.
>
>> the burden of generalization should usually fall on the people who
>> need to use the generalization, and otherwise we should aim to keep
>> APIs stable when there’s no harm to it.
>
> The burden of creating the generalization should be placed on those
> who need it, I agree. However, once the generalization is in place,
> the burden is on everybody to use it correctly. The reason I’ve
> undertaken this refactor is because I found myself playing
> whack-a-mole with people adding new broken code after the fact. The
> previous API was very easy to misuse, so I don’t blame those people.
>
>> Are you actually auditing and testing them all to work for scalable
>> vector types, or are you just fixing the obvious compile failures?
> Everywhere that VectorType::getNumElements() is being called, we are
> either changing the code to cast to FixedVectorType, or we are
> updating the code to handle scalable vectors correctly. If the call
> site does not have test coverage with scalable vectors, this test
> coverage is being added. Even for obviously correct transformations
> such as `VectorType::get(SomeTy, SomeVecTy->getNumElements())` ->
> `VectorType::get(SomeTy, SomeVecTy->getElementCount())`, I have been
> required in code review to provide test coverage. We are taking this
> seriously.
>
>> “Vector” has a traditional and dominant meaning as a fixed-width
>> SIMD type, and the fact that you’ve introduced a generalization
>> doesn’t change that. Clang supports multiple kinds of pointer, but
>> we still reserve clang::PointerType for C pointers instead of making
>> it an abstract superclass, thus letting our sense of logic introduce
>> a million bugs through accidental generalization throughout the
>> compiler.
> Various languages implemented on top of LLVM have various different
> pointer types. However, the LLVM IR language only has one pointer
> type. The LLVM IR language has two types of vectors, and I think
> it’s reasonable to model them as a class hierarchy in this manner.
> I’m not familiar with the clang::PointerType situation, so I cannot
> pass judgement on it.
>
> Thanks,
> Christopher Tetreault
>
> From: John McCall <rjmccall at apple.com>
> Sent: Thursday, May 21, 2020 1:47 PM
> To: Chris Tetreault <ctetreau at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of
> VectorType in the IR
>
>
> On 21 May 2020, at 16:01, Chris Tetreault wrote:
>
> Hi John,
>
> I’d like to address some points in your message.
>
> Practically speaking, this is going to break every out-of-tree
> frontend, backend, or optimization pass that supports SIMD types.
>
> My understanding is that the policy in LLVM development is that we do
> not let considerations for downstream and out-of-tree codebases affect
> the pace of development.
>
> This is not categorically true, no. When we make changes that require
> large-scale updates for downstream codebases, we do so because
> there’s a real expected benefit to it. For the most part, we do make
> some effort to keep existing source interfaces stable.
>
> The C++ API is explicitly unstable. I maintain a downstream fork of
> LLVM myself, so I know the pain that this is causing because I get to
> fix all the issues in my internal codebase. However, the fix for
> downstream codebases is very simple: Just find all the places where it
> says VectorType, and change it to say FixedVectorType.
>
> … by having the VectorType type semantically repurposed out from
> under them.
>
> The documented semantics of VectorType prior to my RFC were that it is
> a generalization of all vector types. The VectorType contains an
> ElementCount, which is a pair of (bool, unsigned). If the bool is
> true, then the return value of getNumElements() is the minimum number
> of vector elements. If the bool is false, then it is the actual number
> of elements. My RFC has not changed these semantics. It will
> eventually delete a function that has been pervasively misused
> throughout the codebase, but the semantics remain the same. You are
> proposing a semantic change to VectorType to have it specifically be a
> fixed width vector.
>
> Probably 99% of the code using VectorType semantically requires it to
> be a fixed-width vector. The generalization of VectorType to scalable
> vector types was a representational shortcut that should never have
> been allowed; it should always have used a different type. Honestly,
> I’m not convinced your abstract base type is even going to be very
> useful in practice vs. just having a getVectorElementType() accessor
> that checks for both and otherwise returns null.
>
> … a particular largely-vendor-specific extension …
>
> All SIMD vectors are vendor specific extensions. Just because most of
> the most popular architectures have them does not make this not true.
> AArch64 and RISC-V have scalable vectors, so it is not just one
> architecture. It is the responsibility of all developers to ensure
> that they use the types correctly. It would be nice if the obvious
> thing to do is the correct thing to do.
>
> I’m not saying that we shouldn’t support scalable vector types.
> I’m saying that the burden of generalization should usually fall on
> the people who need to use the generalization, and otherwise we should
> aim to keep APIs stable when there’s no harm to it.
>
> … it’s much better for code that does support both to explicitly
> opt in by checking for and handling the more general type …
>
> This is how it will work. I am in the process of fixing up call sites
> that make fixed width assumptions so that they use FixedVectorType.
>
> Are you actually auditing and testing them all to work for scalable
> vector types, or are you just fixing the obvious compile failures?
> Because scalable vector types impose some major restrictions that
> aren’t imposed on normal vectors, and the static type system isn’t
> going to catch most of them.
>
> I think that it is important to ensure that things have clear sensible
> names, and to clean up historical baggage when the opportunity
> presents.
>
> “Vector” has a traditional and dominant meaning as a fixed-width
> SIMD type, and the fact that you’ve introduced a generalization
> doesn’t change that. Clang supports multiple kinds of pointer, but
> we still reserve clang::PointerType for C pointers instead of making
> it an abstract superclass, thus letting our sense of logic introduce a
> million bugs through accidental generalization throughout the
> compiler.
>
> You have resigned yourself to doing a lot of work in pursuit of
> something that I really don’t think is actually an improvement.
>
> John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200521/1b9bf827/attachment.html>
More information about the llvm-dev
mailing list