[cfe-dev] [RFC] New attribute `annotate_type`
Aaron Ballman via cfe-dev
cfe-dev at lists.llvm.org
Wed Oct 13 08:46:43 PDT 2021
On Wed, Oct 13, 2021 at 9:19 AM Martin Brænne <mboehme at google.com> wrote:
>
>
>
> On Tue, 12 Oct 2021 at 22:16, Aaron Ballman <aaron at aaronballman.com> wrote:
>>
>> On Tue, Oct 12, 2021 at 7:49 AM Martin Brænne <mboehme at google.com> wrote:
>> >
>> >
>> >
>> > On Mon, 11 Oct 2021 at 18:54, Aaron Ballman <aaron at aaronballman.com> wrote:
>> >>
>> >> On Mon, Oct 11, 2021 at 10:28 AM Martin Brænne via cfe-dev
>> >> <cfe-dev at lists.llvm.org> wrote:
>> >> >
>> >> > I would like to propose a new attribute `annotate_type` that would be analogous to the existing `annotate` attribute but for use on types. As for `annotate`, the typical use case would be to add annotations to be used by static analysis tools.
>> >> >
>> >> > I have a draft patch implementing this attribute (https://reviews.llvm.org/D111548), but before it's reviewed I wanted to solicit some feedback more broadly on whether people feel this makes sense.
>> >>
>> >> Thanks for the proposal! One question I have is whether we need an
>> >> additional attribute for this instead of reworking [[clang::annotate]]
>> >> so that it applies to either a type or a declaration. Is a separate
>> >> attribute necessary because there may be unfortunate problems with
>> >> using __attribute__((annotate)) as the spelling?
>> >
>> >
>> > Yes, that's exactly the problem. Today, the GNU spelling can be applied both before and after the type name:
>> >
>> > __attribute__((annotate("foo"))) int i1;
>> > int __attribute__((annotate("foo"))) i2;
>> >
>> > We would need to reinterpret the second variant to refer (or "appertain") to the decl-specifier-seq, not to the declaration (which is, IIUC, what the C++ standard would prescribe if this was a C++ attribute). But as it's possible that this variant is already being used "in the wild" with the intent that it should refer to the declaration, I think this isn't something we can change.
>>
>> Agreed, thanks for the confirmation that this was the reason why we
>> need a second attribute.
>>
>> > Hence, the practical solution seems to be to introduce a separate attribute for types, for which it is then unambiguous what it should apply to.
>>
>> Another possible solution would be to not support
>> __attribute__((annotate)) as a type attribute (e.g., you can use
>> [[clang::annotate]] as a declaration or a type attribute, but we would
>> restrict __attribute__((annotate)) to only ever be a declaration
>> attribute.).
>
> I guess that would be an option too, though I think this would require some changes to Clang to allow C++ attributes to be used in this position:
>
> int [[clang::annotate]] foo;
Clang already supports type attributes in that position today. e.g.,
https://godbolt.org/z/rTxoWWfdb
The position Clang doesn't currently support is on decl specifiers.
e.g., https://godbolt.org/z/rG1aP7Tz5
> See also discussion below.
>
>> > As a side note, IIUC, Clang currently more generally lumps GNU attributes that occur in these two positions together, right? I'm looking at Parser::ParseSimpleDeclaration(), specifically this line:
>> >
>> > DS.takeAttributesFrom(Attrs);
>> >
>> > For C++ attributes, I believe this would not be correct, but that's not (currently) a concern because AFAICT Clang currently doesn't allow C++ attributes to occur after a decl-specifier-seq at all (because, I presume, Clang doesn't yet support any C++ attributes that can occur in this position).
>>
>> We currently don't support any attributes that appertain to the
>> declaration specifiers (in any spelling mode).
>
>
> But one of the enumerators in TypeAttrLocation is this:
>
> /// The attribute is in the decl-specifier-seq.
> TAL_DeclSpec,
>
> And it looks as if it is used?
Ah, yeah, this is confusing stuff. decl-specifier-seq can have
attributes (http://eel.is/c++draft/dcl.spec#nt:decl-specifier-seq) and
TAL_DeclSpec tracks that. However, we do not currently support
attributes on *all kinds of decl specifiers*. e.g., [[supported #1]]
static [[not supported #2]] int [[supported #3]] i [[supported #4]];
We support the ones that are in the declaration position (#1), the
type position (#2), and the identifier position (#4) but do not
support ones on storage class specifiers, function specifiers, or
other kinds of decl specifiers other than a defining-type-specifier.
e.g., https://godbolt.org/z/8sq5Wo8sT
>> GNU attributes will
>> "slide" to the type or the declaration based on the name of the
>> attribute when doing semantic processing for them (see the
>> moveAttrFromListToList() calls in SemaType.cpp for some examples of
>> this).
>
>
> Thanks, that's an interesting pointer.
>
> I wondering now though: What do attributes attach to when the "sliding" logic doesn't apply to them (because it doesn't know them)?
Heh, now you're seeing why GNU attributes were replaced by what was
standardized in C and C++. The answer is: we don't know what to do and
this can sometimes even impact our ability to parse properly (for
example, if it's a type attribute in C, it may be the difference
between ANSI "implicit int" being allowed or not). However, in Clang,
we do not retain unknown attributes in the AST, so we parse as best we
can and then drop what we parsed onto the floor if the attribute is
unknown.
> Maybe I should clarify my use case (which will probably benefit the whole discusssion). What I'm looking to be able to do is to attach attributes to different levels of a potentially multi-level pointer. For example:
>
> int __attribute__((annotate_type("A"))) * __attribute__((annotate_type("B"))) * __attribute__((annotate_type("C"))) pp;
>
> The "A" annotation is what I mean by being able to attach an attribute to the decl specifiers (maybe my terminology isn't accurate here?).
>
> One use case for this is being able to specify which of the levels a nullability annotation refers to and not just have it always implicitly refer to the outermost pointer, as I believe the existing nullability annotations do. (See the Clang fork here for an example that does something very similar: https://github.com/sampsyo/quala)
>
> Staying with the nullability example, it's important to be able to attach an annotation to the base type itself -- imagine the `int` in the example above was instead a `unique_ptr`.
>
> This does seem to be possible to some extent with Clang today because if I feed the example above to the code in my draft change, I get `AttributedTypeLoc`s for the types `int`, `int *`, and `int **`, with annotations "A", "B", and "C" respectively.
It is possible today, but I note that we might have a bug with the
[[]] implementation: https://godbolt.org/z/vr81KzG33 -- I'd have to
dig around more to see what's going on there.
>> > This also means that if we want to use the proposed `annotate_type` attribute in this position, we have to use the GNU spelling.
>>
>> I don't think that's correct. We certainly support type attributes
>> using a C++ spelling already.
>
> I think this only works though if the attribute is applied to a declarator, but not to the decl specifiers? Here's an excerpt from the test in my draft change (https://reviews.llvm.org/D111548):
>
> int __attribute__((annotate_type("bar"))) w;
> int [[clang::annotate_type("bar")]] w2; // expected-error {{'annotate_type' attribute cannot be applied to types}}
>
> int *__attribute__((annotate_type("bar"))) x;
> int *[[clang::annotate_type("bar")]] x2;
>
> The error seems to be coming from the following chunk of code in Parser::ParseDeclarationSpecifiers(), which is specific to C++ attributes:
>
> // Reject C++11 attributes that appertain to decl specifiers as
> // we don't support any C++11 attributes that appertain to decl
> // specifiers. This also conforms to what g++ 4.8 is doing.
> ProhibitCXX11Attributes(attrs, diag::err_attribute_not_type_attr);
Oh hey, this is the possible bug I just noted above. :-D I think we
should not be rejecting if the decl specifier is a defining type
specifier because that has utility, but I'd have to go stare at the
standard a bit more to be sure. Declarator parsing is not the easiest
thing to reason about.
>> > This is an unfortunate niggle, but teaching Clang to correctly process C++ attributes in this position seems like a non-trivial undertaking because it would require us to treat GNU spellings and C++ spellings differently in Parser::ParseSimpleDeclaration(), which seems like it would be a bit of a headache.
>>
>> I don't think we'll have to go down that route, hopefully.
>>
>> >> What kind of type semantic impacts does this attribute have on the
>> >> type identity for things like overloading, name mangling, type
>> >> conversion, etc?
>> >
>> > Good point. The intent is that the attribute should have no impact. Anything else would be pretty problematic from a portability point of view because other compilers will simply ignore the attribute -- so if it had an effect on the type system in Clang, that could cause other compilers to generate different code.
>>
>> Vendor-specific attributes are inherently not portable or safe to
>> ignore.
>
>
> But for C++ attributes at least, the standard says that compilers should do exactly this (i.e. ignore unknown attributes)?
Same for C. The standard says that vendor-specific attributes are
fully implementation-defined (so they're permitted to have whatever
semantics the implementation wants, including necessary side effects).
The standard also says that if the implementation doesn't know about
an attribute, the implementation should ignore the attribute. The
combination of these rules is what makes vendor-specific attributes
inherently nonportable or safe to ignore.
As an example: https://godbolt.org/z/vsv3Wo6En
>> Generally speaking, if a type attribute has no type semantics,
>> it probably shouldn't be a type attribute. However, because this
>> attribute doesn't really have any semantics beyond "keep this
>> information associated with the type", it's a bit more complex.
>> Consider:
>>
>> void func(int i);
>> void func(int [[clang::annotate("hoo boy")]] i);
>>
>> I can see an argument that this is not a valid overload set because
>> both functions accept an integer.
>
>
> I think nullability attributes are a good analogy here (and essentially "prior art"). Here's an example:
>
> void func(int* i) {}
> void func(int* _Nonnull i) {}
>
> (https://godbolt.org/z/sP6za4WEE)
>
> Clang doesn't consider these to be overloads and instead complains about a redefinition.
I think modeling on the nullability attributes may not be a bad idea,
but note that those are surfaced to the user as keywords and not
attributes at all. I believe that was done specifically because the
nullability attributes had type semantics, but I could be remembering
incorrectly.
> I can certainly see how for different use cases you _would_ want an attribute to create a different type -- but I would say those would then require a second, separate attribute.
>
>>
>> However:
>>
>> void func(int i) [[clang::annotate("ABI=v1")]];
>> void func(int i) [[clang::annotate("ABI=v2")]];
>>
>> I can see an argument that this is a valid overload set because
>> something else (the linker, whatever) is using that annotated
>> information to provide additional semantics beyond what the compiler
>> cares about.
>
>
> As a data point that my be useful, I looked at what Clang does when I move those attributes to a position where they are declaration attributes (so that I can run the current Clang on them), i.e.:
>
> [[clang::annotate("ABI=v1")]] void func(int i) {}
> [[clang::annotate("ABI=v2")]] void func(int i) {}
>
> (https://godbolt.org/z/fo5vrso6d)
>
> Again, Clang doesn't consider these to be overloads and instead complains about a redefinition.
FWIW, that's because declaration attributes do not impact the type of
the function. We do have a declaration attribute that says "this
declaration is allowed to be overloaded" in C, but that's subtly a
different thing.
> So it seems that the existing semantics of the `annotate` attribute are similar to what I'm proposing for the `annotate_type` attribute.
>
>> >> Also, one problem here is -- [[clang::annotate]] isn't just used for
>> >> the static analyzer, it's also something that can be used by backend
>> >> tools (for example, when using attribute plugins).
>> >
>> >
>> > It's good that you bring this up because I'm actually also thinking about extending ParsedAttrInfo to handle type attributes (with a new handleTypeAttribute() function as an analog to handleDeclAttribute()). The proposed annotate_type attribute would then also be a natural candidate for an attribute that the new handleTypeAttribute() could add to types.
>>
>> Oh, neat! Unifying type/stmt/decl attributes more is definitely a great goal.
>>
>> > Is this the concern that you had here or were you thinking about something different?
>>
>> Not really, it was more along the lines of lowering to LLVM IR (below)
>> so that the backend can also take advantage of this.
>>
>> >> Do we need to
>> >> consider what information should be lowered to LLVM IR when this new
>> >> attribute appears on a type?
>> >>
>> >> Thanks!
>> >
>> > My intent was that this should have no effect on code generation.
>> >
>> > Lowering annotations on declarations to LLVM IR has some obvious use cases, but for annotations on types it's less obvious how it would be useful to lower these to IR (and also less obvious to me how one would represent these). (Did you have anything specific in mind here?)
>>
>> LLVM currently has the ability to describe some types (e.g.,
>> `%struct.X = type { i32 }`), so I was envisioning attaching this
>> attribute to the type (either as an LLVM attribute or metadata, I have
>> no idea what the distinction really is between them). LLVM currently
>> supports plugins as does Clang, and Clang's support for attributes in
>> plugins still strongly encourages users to use existing attributes to
>> surface functionality, so I can imagine users wanting to use the
>> annotate type attribute in the same way they already use the annotate
>> declaration attribute for this sort of collusion. However, I don't
>> think the codegen needs to happen up front -- mostly trying to
>> understand the shape of the design still.
>
>
> Makes sense.
>
> Sorry that this has become such a long email, but I hope I've been able to give you a clearer picture of what I'm trying to achieve.
No worries, I appreciate the detailed discussion of the design!
I think we should double-check that I'm correct about the bug with
Clang accepting:
static int [[]] i;
but rejecting:
int [[]] i;
If we're agreed that this is a bug, then it seems like once we fix
that bug, you could support [[clang::annotate]] in type positions
(with no type semantics) as well as declarations, but only if you were
willing to define __attribute__((annotate)) as applying to
declarations when there's an ambiguity between type vs decl. Would
that be a palatable approach for you?
~Aaron
>
> Cheers,
>
> Martin
>
>>
>>
>> ~Aaron
>>
>> > This could actually make another case for why we have two different attributes, because that would make the distinction clearer that we're doing code generation for one but not the other.
>> >
>> > Thanks,
>> >
>> > Martin
>> >
>> >>
>> >> ~Aaron
>> >>
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Martin
>> >> >
>> >> > --
>> >> >
>> >> > Martin Brænne
>> >> >
>> >> > Software Engineer
>> >> >
>> >> > mboehme at google.com
>> >> >
>> >> >
>> >> > Google Germany GmbH
>> >> > Erika-Mann-Straße 33
>> >> > 80363 München
>> >> >
>> >> > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> >> >
>> >> > Registergericht und -nummer: Hamburg, HRB 86891
>> >> >
>> >> > Sitz der Gesellschaft: Hamburg
>> >> >
>> >> >
>> >> > Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind, leiten Sie diese bitte nicht weiter, informieren Sie den Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
>> >> >
>> >> >
>> >> >
>> >> > This e-mail is confidential. If you are not the right addressee please do not forward it, please inform the sender, and please erase this e-mail including any attachments. Thanks.
>> >> >
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > cfe-dev at lists.llvm.org
>> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >
>> >
>> >
>> > --
>> >
>> > Martin Brænne
>> >
>> > Software Engineer
>> >
>> > mboehme at google.com
>> >
>> >
>> > Google Germany GmbH
>> > Erika-Mann-Straße 33
>> > 80363 München
>> >
>> > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> >
>> > Registergericht und -nummer: Hamburg, HRB 86891
>> >
>> > Sitz der Gesellschaft: Hamburg
>> >
>> >
>> > Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind, leiten Sie diese bitte nicht weiter, informieren Sie den Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
>> >
>> >
>> >
>> > This e-mail is confidential. If you are not the right addressee please do not forward it, please inform the sender, and please erase this e-mail including any attachments. Thanks.
>
>
>
> --
>
> Martin Brænne
>
> Software Engineer
>
> mboehme at google.com
>
>
> Google Germany GmbH
> Erika-Mann-Straße 33
> 80363 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>
> Registergericht und -nummer: Hamburg, HRB 86891
>
> Sitz der Gesellschaft: Hamburg
>
>
> Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind, leiten Sie diese bitte nicht weiter, informieren Sie den Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
>
>
>
> This e-mail is confidential. If you are not the right addressee please do not forward it, please inform the sender, and please erase this e-mail including any attachments. Thanks.
More information about the cfe-dev
mailing list