[llvm-dev] [RFC] Refactor class hierarchy of VectorType in the IR

Thu May 21 15:32:18 PDT 2020

On 21 May 2020, at 17:22, Chris Tetreault wrote:

> John,
>
>> This is not categorically true, no. When we make changes that require 
>> large-scale updates for downstream codebases, we do so because 
>> there’s a real expected benefit to it. For the most part, we do 
>> make some effort to keep existing source interfaces stable.
> While I’m at a loss to find a documented policy, I recall this 
> thread 
> (http://lists.llvm.org/pipermail/llvm-dev/2020-February/139207.html) 
> where this claim was made and not disputed. The expected real benefits 
> to this change are: 1) The names now match the semantics 2) It is now 
> statically impossible to accidentally get the fixed number of elements 
> from a scalable vector and 3) It forces everybody to fix their broken 
> code. If we provided stability guarantees to downstream and 
> out-of-tree codebases, then I might not agree that 3 is a benefit, but 
> my understanding is that we do not provide this guarantee.
>
>> … Probably 99% of the code using VectorType semantically requires 
>> it to be a fixed-width vector.
>
> This code is all broken already. I don’t think supporting common 
> misuse of APIs in a codebase that does not provide stability 
> guarantees is something we should be doing.
>
>> The generalization of VectorType to scalable vector types was a 
>> representational shortcut that should never have been allowed
>
> I agree. Unfortunately it happened, and our choices are to fix it or 
> accept the technical debt forever.

Perhaps this is part of the root of our disagreement.  I consider 
scalable vector types to be an experimental/unstable feature, as many 
features are when they’re first added to the compiler.  I have much 
lower standards for disrupting the early adopters of features like that; 
if scalable vectors need to be split out of `VectorType`, we should just 
do it.

Analogously, when we upstream the pointer-authentication feature, we 
will be upstreaming a rather flawed representation that I definitely 
expect us to revise after the fact.  That will be problematic for early 
adopters of LLVM’s pointer authentication support, but that’s 
totally acceptable.

John.

>
>> the burden of generalization should usually fall on the people who 
>> need to use the generalization, and otherwise we should aim to keep 
>> APIs stable when there’s no harm to it.
>
> The burden of creating the generalization should be placed on those 
> who need it, I agree. However, once the generalization is in place, 
> the burden is on everybody to use it correctly. The reason I’ve 
> undertaken this refactor is because I found myself playing 
> whack-a-mole with people adding new broken code after the fact. The 
> previous API was very easy to misuse, so I don’t blame those people.
>
>> Are you actually auditing and testing them all to work for scalable 
>> vector types, or are you just fixing the obvious compile failures?
> Everywhere that VectorType::getNumElements() is being called, we are 
> either changing the code to cast to FixedVectorType, or we are 
> updating the code to handle scalable vectors correctly. If the call 
> site does not have test coverage with scalable vectors, this test 
> coverage is being added. Even for obviously correct transformations 
> such as `VectorType::get(SomeTy, SomeVecTy->getNumElements())` -> 
> `VectorType::get(SomeTy, SomeVecTy->getElementCount())`, I have been 
> required in code review to provide test coverage. We are taking this 
> seriously.
>
>> “Vector” has a traditional and dominant meaning as a fixed-width 
>> SIMD type, and the fact that you’ve introduced a generalization 
>> doesn’t change that. Clang supports multiple kinds of pointer, but 
>> we still reserve clang::PointerType for C pointers instead of making 
>> it an abstract superclass, thus letting our sense of logic introduce 
>> a million bugs through accidental generalization throughout the 
>> compiler.
> Various languages implemented on top of LLVM have various different 
> pointer types. However, the LLVM IR language only has one pointer 
> type. The LLVM IR language has two types of vectors, and I think 
> it’s reasonable to model them as a class hierarchy in this manner. 
> I’m not familiar with the clang::PointerType situation, so I cannot 
> pass judgement on it.
>
> Thanks,
>    Christopher Tetreault
>
> From: John McCall <rjmccall at apple.com>
> Sent: Thursday, May 21, 2020 1:47 PM
> To: Chris Tetreault <ctetreau at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] [RFC] Refactor class hierarchy of 
> VectorType in the IR
>
>
> On 21 May 2020, at 16:01, Chris Tetreault wrote:
>
> Hi John,
>
> I’d like to address some points in your message.
>
> Practically speaking, this is going to break every out-of-tree 
> frontend, backend, or optimization pass that supports SIMD types.
>
> My understanding is that the policy in LLVM development is that we do 
> not let considerations for downstream and out-of-tree codebases affect 
> the pace of development.
>
> This is not categorically true, no. When we make changes that require 
> large-scale updates for downstream codebases, we do so because 
> there’s a real expected benefit to it. For the most part, we do make 
> some effort to keep existing source interfaces stable.
>
> The C++ API is explicitly unstable. I maintain a downstream fork of 
> LLVM myself, so I know the pain that this is causing because I get to 
> fix all the issues in my internal codebase. However, the fix for 
> downstream codebases is very simple: Just find all the places where it 
> says VectorType, and change it to say FixedVectorType.
>
> … by having the VectorType type semantically repurposed out from 
> under them.
>
> The documented semantics of VectorType prior to my RFC were that it is 
> a generalization of all vector types. The VectorType contains an 
> ElementCount, which is a pair of (bool, unsigned). If the bool is 
> true, then the return value of getNumElements() is the minimum number 
> of vector elements. If the bool is false, then it is the actual number 
> of elements. My RFC has not changed these semantics. It will 
> eventually delete a function that has been pervasively misused 
> throughout the codebase, but the semantics remain the same. You are 
> proposing a semantic change to VectorType to have it specifically be a 
> fixed width vector.
>
> Probably 99% of the code using VectorType semantically requires it to 
> be a fixed-width vector. The generalization of VectorType to scalable 
> vector types was a representational shortcut that should never have 
> been allowed; it should always have used a different type. Honestly, 
> I’m not convinced your abstract base type is even going to be very 
> useful in practice vs. just having a getVectorElementType() accessor 
> that checks for both and otherwise returns null.
>
> … a particular largely-vendor-specific extension …
>
> All SIMD vectors are vendor specific extensions. Just because most of 
> the most popular architectures have them does not make this not true. 
> AArch64 and RISC-V have scalable vectors, so it is not just one 
> architecture. It is the responsibility of all developers to ensure 
> that they use the types correctly. It would be nice if the obvious 
> thing to do is the correct thing to do.
>
> I’m not saying that we shouldn’t support scalable vector types. 
> I’m saying that the burden of generalization should usually fall on 
> the people who need to use the generalization, and otherwise we should 
> aim to keep APIs stable when there’s no harm to it.
>
> … it’s much better for code that does support both to explicitly 
> opt in by checking for and handling the more general type …
>
> This is how it will work. I am in the process of fixing up call sites 
> that make fixed width assumptions so that they use FixedVectorType.
>
> Are you actually auditing and testing them all to work for scalable 
> vector types, or are you just fixing the obvious compile failures? 
> Because scalable vector types impose some major restrictions that 
> aren’t imposed on normal vectors, and the static type system isn’t 
> going to catch most of them.
>
> I think that it is important to ensure that things have clear sensible 
> names, and to clean up historical baggage when the opportunity 
> presents.
>
> “Vector” has a traditional and dominant meaning as a fixed-width 
> SIMD type, and the fact that you’ve introduced a generalization 
> doesn’t change that. Clang supports multiple kinds of pointer, but 
> we still reserve clang::PointerType for C pointers instead of making 
> it an abstract superclass, thus letting our sense of logic introduce a 
> million bugs through accidental generalization throughout the 
> compiler.
>
> You have resigned yourself to doing a lot of work in pursuit of 
> something that I really don’t think is actually an improvement.
>
> John.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200521/1b9bf827/attachment.html>