[PATCH] D135557: Add needsImplicitDefaultConstructor and friends
Aaron Ballman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Oct 20 08:29:17 PDT 2022
aaron.ballman added inline comments.
================
Comment at: clang/bindings/python/clang/cindex.py:1530
+
+ def record_needs_implicit_default_constructor(self):
+ """Returns True if the cursor refers to a C++ record declaration
----------------
dblaikie wrote:
> aaron.ballman wrote:
> > anderslanglands wrote:
> > > dblaikie wrote:
> > > > aaron.ballman wrote:
> > > > > dblaikie wrote:
> > > > > > anderslanglands wrote:
> > > > > > > dblaikie wrote:
> > > > > > > > royjacobson wrote:
> > > > > > > > > anderslanglands wrote:
> > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > anderslanglands wrote:
> > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > > royjacobson wrote:
> > > > > > > > > > > > > > > > > anderslanglands wrote:
> > > > > > > > > > > > > > > > > > royjacobson wrote:
> > > > > > > > > > > > > > > > > > > anderslanglands wrote:
> > > > > > > > > > > > > > > > > > > > royjacobson wrote:
> > > > > > > > > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > > > > > > > > > I don't think we should expose any of the "needs" functions like this -- those are internal implementation details of the class and I don't think we want to calcify that into something we have to support forever. As we add members to a class, we recalculate whether the added member causes us to delete defaulted special members (among other things), and the "needs" functions are basically used when the class is completed to handle lazily created special members. I'm pretty sure that lazy creation is not mandated by the standard, which is why I think the "needs" functions are more of an implementation detail.
> > > > > > > > > > > > > > > > > > > > > > CC @erichkeane and @royjacobson as folks who have been in this same area of the compiler to see if they agree or disagree with my assessment there.
> > > > > > > > > > > > > > > > > > > > > I think so. The 'needs_*' functions query `DeclaredSpecialMembers` and I'm pretty sure it's modified when we add the implicit definitions in the class completion code. So this looks a bit suspicious. Is this API //meant// to be used with incomplete classes?
> > > > > > > > > > > > > > > > > > > > > For complete classes I think looking up the default/move/copy constructor and calling `isImplicit()` is the way to do it.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > About the 'is deleted' API - can't the same be done for those functions as well so we have a smaller API?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > If this //is// meant to be used with incomplete classes for efficiency that would be another thing, I guess.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > So the intended use case here is I'm using libclang to parse an existing C++ libray's headers and generate a C interface to it. To do that I need to know if I need to generate default constructors etc, which the needs* methods do for me (I believe). The alternative is I have to check manually whether all the constructors/assignment operators exist, then implement the implicit declaration rules myself correctly for each version of the standard, which I'd rather avoid.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Would putting a note in the doc comment about the behaviour differing when the class is being constructed as originally suggested work for everyone?
> > > > > > > > > > > > > > > > > > > Why is the `__is_default_constructible` builtin type trait not enough? Do you have different behavior for user provided and implicit default constructors?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can I evaluate that from libclang somewhow? I can't modify the C++ libraries I'm wrapping.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Basically, given:
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > struct Foo { /* ... */ };
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I want to generate:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > typedef struct Foo_t;
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Foo_t* Foo_ctor();
> > > > > > > > > > > > > > > > > > Foo_t* Foo_copy_ctor(Foo_t*);
> > > > > > > > > > > > > > > > > > /* etc... */
> > > > > > > > > > > > > > > > > > Foo_dtor(Foo_t*);
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In order to know which ones to generate for an arbitrary struct that may or may not have any combination of ctor/assignments defined, I need to know which ones exist and follow the implicit generation rules for the ones that don't. I can do this myself with a whole bunch of version-dependent logic, but I'd rather just rely on libclang since it already knows all this much better than I do.
> > > > > > > > > > > > > > > > > I looked a bit, and it seems they aren't, and that generally libclang doesn't really know about Sema, so exporting the type traits is not that easy :/
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm not sure what's the best way forward here, but I don't like the idea of exporting those half baked internal API calls when there are actual standardized and implemented type traits that perform the same goal.
> > > > > > > > > > > > > > > > CCing folks who may have more historical memory of the C APIs and whether they're expected to operate on a completed AST or are expected to work on an AST as it is under construction. My unverified belief is that these APIs are expected to work on a completed AST.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @echristo @dblaikie @rjmccall @rsmith
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm also not certain of what the best path forward is here. I'm not comfortable exposing the needs* functions because they really are implementation details and I don't want to promise we'll support that API forever. But at the same time, the use case is reasonably compelling on the assumption you need to inspect the AST nodes as they're still under construction instead of inspecting them once the AST is completed. If the AST is fully constructed, then we should have already added the AST nodes for any special member functions that needed to be generated implicitly, so as Roy mentioned, you should be able to find the special member function you're after and check `isImplicit()` on it.
> > > > > > > > > > > > > > > Not sure I'm quite following - it doesn't look (admittedly, sorry, at a somewhat superficial look at the discussion here) like this is necessarily about incomplete AST - could parse the header and stop. That's a complete AST, yeah? And then it might be OK/reasonable to ask "could this type be default constructed" (even if the implicit ctor has been implicitly instantiated/there was no use in the source code that's been parsed)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is that correct?
> > > > > > > > > > > > > > I am just parsing the headers of a library using `clang_parseTranslationUnit()` then using `clang_visitChildren()` to inspect the AST. Doing this I do NOT see any implicitly generated methods, hence why I need these functions. It sounds like you expect those methods to be in the AST already? Which suggests I'm doing something wrong in my parsing (missing another function call/option or something)?
> > > > > > > > > > > > > Ah, no, I wouldn't expect them to be in the AST unless there was code that used them in the header.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But I'm saying/trying to say is this isn't a "AST nodes still under construction" that @aaron.ballman is describing, so far as I can see - you can completely parse the header, have a complete AST then reasonably want to ask "could I default construct an object like this" - even if the implicit default ctor hasn't been instantiated because none of the parsed code asked that question.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not sure what the API to do this should look like, but it seems like a pretty reasonable use case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not sure about whether they cross some threshold where they're too complex/nuanced to go in the C API or not - maybe that's a question.
> > > > > > > > > > > > >> I am just parsing the headers of a library using clang_parseTranslationUnit() then using clang_visitChildren() to inspect the AST. Doing this I do NOT see any implicitly generated methods, hence why I need these functions. It sounds like you expect those methods to be in the AST already? Which suggests I'm doing something wrong in my parsing (missing another function call/option or something)?
> > > > > > > > > > > > > Ah, no, I wouldn't expect them to be in the AST unless there was code that used them in the header.
> > > > > > > > > > > >
> > > > > > > > > > > > That's part of the "this is an implementation detail" I was talking about. *Today* we don't generate the AST nodes for those functions unless we have to. Nothing says we won't find a reason we need to always generate those AST nodes, which makes the `needs*` functions useless. I suppose in that situation, the breakage for the C APIs is mostly that the exposed `needs*` functions start trivially returning `false` though, so maybe it's not as bad as it could be...
> > > > > > > > > > > >
> > > > > > > > > > > > > But I'm saying/trying to say is this isn't a "AST nodes still under construction" that @aaron.ballman is describing, so far as I can see - you can completely parse the header, have a complete AST then reasonably want to ask "could I default construct an object like this" - even if the implicit default ctor hasn't been instantiated because none of the parsed code asked that question.
> > > > > > > > > > > >
> > > > > > > > > > > > Yeah, the situation I mentioned earlier was the validity of the calls when the class has not been fully constructed in the AST yet. That's not the case here, which is great.
> > > > > > > > > > > >
> > > > > > > > > > > > > Not sure what the API to do this should look like, but it seems like a pretty reasonable use case.
> > > > > > > > > > > >
> > > > > > > > > > > > Agreed that the use case is reasonable.
> > > > > > > > > > > >
> > > > > > > > > > > > > Not sure about whether they cross some threshold where they're too complex/nuanced to go in the C API or not - maybe that's a question.
> > > > > > > > > > > >
> > > > > > > > > > > > Mostly, I think we try to expose APIs that we think we can support long-term based on what needs folks have. Given that there's a need here, and the use case seems reasonable, it seems to be something we should consider supporting.
> > > > > > > > > > > >
> > > > > > > > > > > > I suppose there's another way we could view this need though -- some folks need those special member functions even if Clang doesn't think they're necessary to generate. Not only is this use case one such time, but running AST matchers over the AST (like in clang-query or clang-tidy) may also have a similar expectation of finding all the special members. So maybe what we need is some flag to tell Clang "force the generation of those special member functions" so that we don't have to expose a `needs` function for them (which helps for the C API users but doesn't help folks like consumers of AST matchers). (Note, I don't yet know how good or bad of an idea this is.)
> > > > > > > > > > > Yeah - if someone is interested in doing the work, I'd be curious how some equivalent operations work in the AST matchers? I'd assume there's some way to query if something is copy constructible - and maybe that's more likely to be the query the user wants, rather than the "needs" operations?
> > > > > > > > > > >
> > > > > > > > > > > (like, if we did add the implicit copy constructors into the AST proactively, I don't think I'd want these queries to return "false" - I think likely the intended query is "is this thing copy constructible" (or similar) less about whether the operation is or isn't present in the AST)
> > > > > > > > > > In my case it's "do I need to generate a copy ctor for this type?". @aaron.ballman 's suggestion of a way to force the implicits to be generated in the AST would work just fine for me.
> > > > > > > > > > That's part of the "this is an implementation detail" I was talking about. *Today* we don't generate the AST nodes for those functions unless we have to. Nothing says we won't find a reason we need to always generate those AST nodes, which makes the `needs*` functions useless. I suppose in that situation, the breakage for the C APIs is mostly that the exposed `needs*` functions start trivially returning `false` though, so maybe it's not as bad as it could be...
> > > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > >
> > > > > > > > > > Yeah, the situation I mentioned earlier was the validity of the calls when the class has not been fully constructed in the AST yet. That's not the case here, which is great.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is not even about future proofing - this is already bad API. Simply adding
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > void f() {
> > > > > > > > > Test t;
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > in the test in this PR is changing the printed line from
> > > > > > > > > `ClassDecl=Test:3:7 (Definition) (needs ctor) (needs cctor) (needs mctor) (needs cassign) (needs massign) (needs dtor) Extent=[3:1 - 17:2]`
> > > > > > > > > to
> > > > > > > > > `ClassDecl=Test:3:7 (Definition) (needs cassign) (needs massign) (needs dtor) Extent=[3:1 - 17:2]`
> > > > > > > > >
> > > > > > > > > I don't think making functions in libclang conditional on whether somewhere in the headers types are actually used or not is likely to provide value. It's impossible to enforce non-use of a type if it's definition is available and it's very unnatural to C++ to rely on it.
> > > > > > > > >
> > > > > > > > > I'm now also pessimistic about the possibility of implementing **correct** versions of those `std::is...` type traits without Sema. Default constructors might be template functions that are SFINAE-disabled, for example. This isn't very exotic - the default constructors of pair, optional, etc.. are all implemented like this. The other type traits that we'd want to expose are also pretty similar.
> > > > > > > > >
> > > > > > > > > A solution might be useful even if it doesn't handle all cases correctly, of course. But IMHO in this case an approach with only AST would be too partial to justify its shortcomings.
> > > > > > > > >
> > > > > > > > > In my case it's "do I need to generate a copy ctor for this type?". @aaron.ballman 's suggestion of a way to force the implicits to be generated in the AST would work just fine for me.
> > > > > > > >
> > > > > > > > But by "generate" you mean "generate a wrapper for this operation", yeah?
> > > > > > > >
> > > > > > > > If you could query the type for "is this type copy constructible", "is this type copy assignable", etc, be adequate for your needs?
> > > > > > > > > In my case it's "do I need to generate a copy ctor for this type?". @aaron.ballman 's suggestion of a way to force the implicits to be generated in the AST would work just fine for me.
> > > > > > > >
> > > > > > > > But by "generate" you mean "generate a wrapper for this operation", yeah?
> > > > > > > >
> > > > > > > > If you could query the type for "is this type copy constructible", "is this type copy assignable", etc, be adequate for your needs?
> > > > > > >
> > > > > > > Thinking about it some more, yes I think it probably would. I would have to do some minor book-keeping to track whether there was one already declared on the class or not, but that's a lot simpler than reimplementing the implicit rules. I guess I would need `isDefaultConstructible`, `isCopyConstructible`, `isMoveConstructible`, `isCopyAssignable`, and `isMoveAssignable`
> > > > > > @royjacobson it's also going to be pretty problematic to instantiate those templates as members too.
> > > > > >
> > > > > > The AST is pretty accurately reflective of the compiler's understanding of the type at the time - adding extra instantiations isn't necessarily an improvement in fidelity. I'd argue it's a loss in fidelity because these things weren't instantiated by the original code.
> > > > > >
> > > > > > (types won't be entirely consistent/stable - member function templates are a great example - you can add code that'll cause new instantiations, and there's no way to fully enumerate all possible instantiations. So it's not a goal for a type description to be stable regardless of the code that uses the type)
> > > > > To my thinking, there's a difference between "is this copy constructible" and "does this have a copy constructor (even implicitly)". My understanding of what @anderslanglands is trying to do is to find classes that have a special member function so that the wrapper can expose the same functionality. That's not "is this copy constructible", that's "does this have a copy constructor (even implicitly)."
> > > > >
> > > > > The C++ standard specifies when classes get implicit special member functions and our AST does not reflect that accurately unless the class is being used: https://godbolt.org/z/13h3T3dPq. Both have definition data that accurately reflects whether the class could get those special members (and that definition data is an internal implementation detail), but only one of those classes have AST nodes for the special members, which means trying to run an AST matcher query for every class with a constructor gives unexpected results: https://godbolt.org/z/PhMY57anM
> > > > Perhaps a simple test: @anderslanglands, if the class had a ctor template that could be used for copying the type - would that meet your requirements? Would you want to write a wrapper that invoked/instantiated that template to perform a copy? Or would you want your wrapper not to expose copy construction?
> > > >
> > > > If it does, then the property of interest is "is this type copy constructible" and not "does this type have a copy constructor", I think?
> > > >
> > > > All that said, I don't outright object to the implicit special members being eagerly generated if it doesn't cost much compile time/memory usage/etc, but I'm certainly a little hesitant. Totally your call, @aaron.ballman
> > > @dblaikie what do you mean by "ctor template" in this context?
> > >
> > > note: for "copy ctor" here and in what follows, read "all the potentially implicitly defined functions"
> > >
> > > What I'm trying to do is this. I'm essentially transpiling C++ libraries to C (which are then wrapped in Rust, but it could in theory be any other language). I do this by iterating over the AST and extracting all the methods (and all methods from bases), then transforming them into C equivalents. Thus if a class I'm interested in defines a copy ctor itself, my regular AST extraction will generate a C version of that and all is right with the world.
> > >
> > > The tricky bit comes when a class *doesnt* define a copy ctor. I then have to figure out if it's not defined because:
> > > 1) The library author didn't bother and is relying on it being implicitly defined
> > > or
> > > 2) The class *cannot* have a copy ctor because of the implicit rules
> > >
> > > In the case of 1) I currently then need to detect this and generate the C version of the copy ctor which calls the (implicitly defined) C++ copy ctor (and of course I'm generating the library that calls this *after* I've extracted the AST, which means I can't rely on this call to define that ctor in the AST).
> > >
> > > In the case of 2) I need to detect that and then do nothing (and possibly some other stuff later on if I decide to represent C++ objects as bags of bytes that I pass around but that's another story).
> > >
> > > So there a few ways of achieving what I need:
> > >
> > > a) I just implement the implicit rules manually, detect what is and isn't there and decide whether to create the copy ctor myself. This is what I did initially, but of course I screwed it up first time which led to me thinking "surely there's a better way".
> > >
> > > b) libclang exposes the `needs*` functions in this patch. I just call those during extraction to decide what I do and do not need to implement manually on the C side. Works perfectly for me, but obviously feels wrong to a few people here.
> > >
> > > c) libclang exposes `isCopyConstructible()` etc. This also works for me, just requires a little more book-keeping my end, i.e. is `isCopyConstructible()` is true AND the class (or its bases) does not have an explicit copy constructor, I need to generate one.
> > >
> > > d) There's some way to force clang to instantiate all the implicit methods in the AST. This would make my life very easy, as I wouldn't have to think about any of this, they'd just be treated like regular methods and everything would just be extracted without any special handling.
> > >
> > > So I'd be fine with anything that's not a) :)
> > > All that said, I don't outright object to the implicit special members being eagerly generated if it doesn't cost much compile time/memory usage/etc, but I'm certainly a little hesitant. Totally your call, @aaron.ballman
> >
> > Well, I'm hoping we can find a way to avoid doing that in the general case while still giving users a way to opt into that behavior if they need it for the C APIs or AST matching. I think the overhead would be problematic for general compilation (and it would likely impact things like template instantiation depths due to memory overhead pressure), but allowing folks to opt in gives them the power to decide which properties of the compilation are more important to them.
> > @dblaikie what do you mean by "ctor template" in this context?
>
> ```
> struct t1 {
> template<typename T>
> t1(const T&) { ... }
> };
> ```
> But actually I got that wrong - the compiler won't consider a template for use when copy constructing.
>
> It does for default constructing, though ( https://godbolt.org/z/hax4jhbPY ):
> ```
> #include <iostream>
> struct t1 {
> template<typename ...Ts>
> t1(Ts ...args) {
> std::cout << "Called the variadic template ctor\n";
> }
> };
> int main() {
> t1 v2;
> }
> static_assert(std::is_default_constructible_v<t1>, "");
> ```
> This type doesn't have a default constructor, but it is default constructible.
>
> So maybe that's a place to ask the question: In this case, which property are you interested in? "has a default constructor" or "is default constructible"? (would you want to write a wrapper around an instantiation of the ctor template taking no args?)
>
> Answering that question might suggest the design for the other implicit ctors/operations, even if they don't have quite the same problems.
>
>
> > Well, I'm hoping we can find a way to avoid doing that in the general case while still giving users a way to opt into that behavior if they need it for the C APIs or AST matching.
>
> Ah, I was figuring not wanting to diverge/provide toggles if we can help it - like unused warnings, opt-in features may go undertested (especially for these sort of operations being a bit subtle in some ways). Like we've discussed (at least I remember it coming up many years ago) trying to unify more of the static analyzer and IRGen to avoid divergence there - adding more divergence between static analysis and IRGen/etc seems liable to create more risk of bugs/inconsistencies.
>> Well, I'm hoping we can find a way to avoid doing that in the general case while still giving users a way to opt into that behavior if they need it for the C APIs or AST matching.
> Ah, I was figuring not wanting to diverge/provide toggles if we can help it - like unused warnings, opt-in features may go undertested (especially for these sort of operations being a bit subtle in some ways). Like we've discussed (at least I remember it coming up many years ago) trying to unify more of the static analyzer and IRGen to avoid divergence there - adding more divergence between static analysis and IRGen/etc seems liable to create more risk of bugs/inconsistencies.
I definitely see the appeal to not having a toggle. I was envisioning was more "generating an AST for matching over and C indexing API to create an AST will set this flag" rather than "user-facing way to decide on the behavior in arbitrary situations", but that does make testing more complicated because the C index code wouldn't be generating the same AST as a "normal" compilation.
Hmmm. This might be one of those times when we have to say "sorry, this is what the AST looks like and these APIs operate on the AST" without making changes. That would be pretty unfortunate because I think the use case from @anderslanglands is reasonable, but it also doesn't seem reasonable enough to warrant impacting the performance of all C++ compilations with Clang.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135557/new/
https://reviews.llvm.org/D135557
More information about the cfe-commits
mailing list