[PATCH] D135472: [ODRHash] Hash attributes on declarations.

Fri Oct 14 06:38:46 PDT 2022

aaron.ballman added a comment.

In D135472#3856596 <https://reviews.llvm.org/D135472#3856596>, @vsapsai wrote:

> Given all the discussion about which attributes should be added to ODR hash, I think it is useful at this point to have TableGen infrastructure to get this information from Attr.td. So I'll work on that.

Thank you, I think that's a good approach for the moment. Once you have the basic framework in place, we can opt in the uncontroversial attributes (like, I think anything inheriting from `TypeAttr` or `DeclOrTypeAttr` should be automatically opted in and any `Argument` whose `Fake` flag is set should not contribute to the ODR hash) and then we can do a follow-up to figure out how to distinguish "definitely an ODR violation" attributes from "not an ODR violation but still something we want to find a way to alert users about".

================
Comment at: clang/lib/AST/ODRHash.cpp:479-480
+
+  llvm::copy_if(D->attrs(), std::back_inserter(HashableAttrs),
+                [](const Attr *A) { return !A->isImplicit(); });
+}
----------------
ChuanqiXu wrote:
> aaron.ballman wrote:
> > ChuanqiXu wrote:
> > > aaron.ballman wrote:
> > > > ChuanqiXu wrote:
> > > > > vsapsai wrote:
> > > > > > vsapsai wrote:
> > > > > > > erichkeane wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > vsapsai wrote:
> > > > > > > > > > > I'm not sure `isImplicit` is the best indicator of attributes to check, so suggestions in this area are welcome. I think we can start strict and relax some of the checks if needed.
> > > > > > > > > > > 
> > > > > > > > > > > If people have strong opinions that some attributes shouldn't be ignored, we can add them to the tests to avoid regressions. Personally, I believe that alignment and packed attributes should never be silently ignored.
> > > > > > > > > > Agreed. I feel `isImplicit` is enough for now.
> > > > > > > > > The tricky part is -- sometimes certain attributes add additional implicit attributes and those implicit attributes matter (https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L9380). And some attributes seem to just do the wrong thing entirely: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L7344
> > > > > > > > > 
> > > > > > > > > So I think `isImplicit()` is a good approximation, but I'm more wondering what the principle is for whether an attribute should or should not be considered part of the ODR hash. Type attributes, attributes that impact object layout, etc all seem like they should almost certainly be part of ODR hashing. Others are a bit more questionable though.
> > > > > > > > > 
> > > > > > > > > I think this is something that may need a per-attribute flag in Attr.td so attributes can opt in or out of it because I'm not certain what ODR issues could stem from `[[maybe_unused]]` or `[[deprecated]]` disagreements across module boundaries.
> > > > > > > > I don't think 'isImplicit' is particularly good.  I think the idea of auto-adding 'type' attributes and having the 'rest' be analyzed to figure out which are important.
> > > > > > > > 
> > > > > > > > Alternatively, I wonder if we're better off just adding ALL attributes and seeing what the fallout is. We can later decide when we don't want them to be ODR significant (which, might be OTHERWISE meaningful later!).
> > > > > > > One criteria to decide which attributes should be hashed is if they affect IRGen. But that's not exhaustive and I'm not sure how practical it is.
> > > > > > > 
> > > > > > > The rule I'm trying to follow right now is if declarations with different attributes can be substituted instead of each other. For example, structs with different memory layout cannot be interchanged, so it is reasonable to reject them.
> > > > > > > 
> > > > > > > But maybe we should review attributes on case-by-case basis. For example, for `[[deprecated]]` I think the best for developers is not to complain about it but merge the attributes and then have regular diagnostic about using a deprecated entity.
> > > > > > One option was to hash `isa<InheritedAttr>` attributes as they are "sticky" and should be more significant, so shouldn't be ignored. But I don't think this argument is particularly convincing.
> > > > > > 
> > > > > > At this stage I think we can still afford to add all attributes and exclude some as-needed because modules adoption is still limited. Though I don't have strong feelings about this approach, just getting more restrictive later can be hard.
> > > > > It looks not a bad idea to add all attributes as experiments, @vsapsai how do you feel about this?
> > > > My intuition is that we're going to want this to be controlled from Attr.td on a case-by-case basis and to automatically generate the ODR hash code for attribute arguments.
> > > > 
> > > > We can either force every attribute to decide explicitly (this seems pretty disruptive as a final design but would be a really good way to ensure we audit all attributes if done as an experiment), or we can pick a default behavior to ODR hash/not. I suspect we're going to want to pick a default behavior based on whatever the audit tells us the most common situation is.
> > > > 
> > > > I think we're going to need something a bit more nuanced than "this attribute matters for ODR" because there are attribute arguments that don't always contribute to the entity (for example, we have "fake" arguments that are used to give the semantic attribute more information, there may also be cases where one argument matter and another argument does not such as `enable_if` where the condition matters greatly but the message doesn't matter much). So we might need a marking on the attribute level *and* on the parameter level to determine what all factors into attribute identity for generating the ODR hashing code. Hopefully we can avoid needing more granularity than that.
> > > I feel the default behavior would be to do ODR hashes. I just took a quick look in Attr.td. And I feel most of them affecting the code generation. For example, there're a lot of attributes which is hardware related.
> > > 
> > > > because there are attribute arguments that don't always contribute to the entity
> > > 
> > > When you're talking about "entities", I'm not sure if you're talking the "entities" in the language spec or a general abstract ones. I mean, for example, the `always_inline` attribute doesn't contribute to the declaration from the language perspective. But it'll be super odd if we lost it. So I feel it may be better to change the requirement to be "affecting code generation"
> > > 
> > > Also, to be honest, I am even not sure if we should take "affecting code generation " as the requirement. For example, for the `preferred_name` attribute, this is used for better printing names. It doesn't affect code generation nor the entity you mentioned. But if we drop it, then the printing name may not be what users want. I'm not sure if this is desired.
> > > I feel the default behavior would be to do ODR hashes. I just took a quick look in Attr.td. And I feel most of them affecting the code generation. For example, there're a lot of attributes which is hardware related.
> > 
> > Hmmm, I'm seeing a reasonably even split. We definitely have a ton of attributes like multiversioning, interrupts, calling conventions, availability, etc that I think all should be part of an ODR hash. We also have a ton of other ones that don't seem like they should matter for ODR hashing like hot/cold, consumed/returns retained, constructor/destructor, deprecated, nodiscard, maybe_unused, etc.
> > 
> > Then we have the really fun ones where I think we'll want them to be in the ODR hash but maybe we don't, like `[[clang::annotate]]`, restrict, etc.
> > 
> > So I think we really do need an actual audit of the attributes to decide what the default should be (if there should even be one).
> > 
> > > When you're talking about "entities", I'm not sure if you're talking the "entities" in the language spec or a general abstract ones. I mean, for example, the always_inline attribute doesn't contribute to the declaration from the language perspective. But it'll be super odd if we lost it. So I feel it may be better to change the requirement to be "affecting code generation"
> > 
> > Not in a language spec meaning -- I meant mostly that there's some notion of identify for the one definition rule and not all attribute arguments contribute to that identity the same way.
> > 
> > I don't know that "affecting code gen" really captures the whole of it. For example, whether a function is/is not marked hot or cold affects codegen, but really has nothing to do with the identity of the function (it's the same function whether the hint is there or not). `preferred_name` is another example -- do we really think a structure with that attribute is fundamentally a different thing than one without that attribute? I don't think so.
> > 
> > To me, it's more about what cross-TU behaviors you'll get if the attribute is only present on one side. Things like ABI tags, calling conventions, attributes that behave like qualifiers, etc all contribute to the identity of something where a mismatch will impact the correctness of the program. But things like optimization hints and diagnostic hints don't seem like they contribute to the identity of something because they don't impact correctness.
> I got your point. But from my developing experience, I want to say the optimization hints matters too. And I am wondering how about checking the ODR for all attributes. It would emit error if the attributes that affecting the correctness mismatches. And if the other attributes mismatches, a warning enough.  (If we worry about the compile time performance, I feel it won't take too long. And we can have an option to control it actually.)
> I got your point. But from my developing experience, I want to say the optimization hints matters too. 

That's why I'd like to have an idea about what our policy is. To me, ODR hashing should be restricted to finding ODR violations. It sounds like you'd like ODR hashing to also optionally find other kinds of differences that aren't necessarily ODR violations but still surprises nonetheless. I think that's reasonable, but I think we should ensure the diagnostics distinguish between "this is definitely UB" and "this might not have the optimization or diagnostic properties you expect but is still correct".

> And I am wondering how about checking the ODR for all attributes. It would emit error if the attributes that affecting the correctness mismatches. And if the other attributes mismatches, a warning enough. (If we worry about the compile time performance, I feel it won't take too long. And we can have an option to control it actually.)

I don't think we should ODR-check all attributes blindly. There's no ODR violation for one function having `[[nodiscard]]` in a TU and another function not having it. In fact, attributes are sometimes used additively so that you can put extra constraints locally that aren't there globally. e.g., I've seen real code doing:
```
// Header.h
int some_func();

// Source.cpp
#include "Header.h"

// We've had too many bugs from people ignoring the results,
// which really matters in this particular file. Redeclare the API
// with extra diagnostic checking for this TU.
[[nodiscard]] int some_func();

...
```
It shouldn't cause ODR violation diagnostics if `some_func()` is defined in another source file without the attribute.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135472/new/

https://reviews.llvm.org/D135472