[PATCH] D134453: Introduce the `AlwaysIncludeTypeForNonTypeTemplateArgument` into printing policy

Nenad Mikša via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Sep 28 03:03:26 PDT 2022


DoDoENT added inline comments.


================
Comment at: clang/include/clang/AST/PrettyPrinter.h:307
+  /// decltype(s) will be printed as "S<Point{1,2}>" if enabled and as "S<{1,2}>" if disabled,
+  /// regardless if PrintCanonicalTypes is enabled.
+  unsigned AlwaysIncludeTypeForNonTypeTemplateArgument : 1;
----------------
dblaikie wrote:
> DoDoENT wrote:
> > dblaikie wrote:
> > > DoDoENT wrote:
> > > > DoDoENT wrote:
> > > > > dblaikie wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > DoDoENT wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > dblaikie wrote:
> > > > > > > > > > DoDoENT wrote:
> > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > What does `PrintCanonicalTypes` have to do with this? Does it overlap with this functionality in some way, but doesn't provide the functionality you want in particular?
> > > > > > > > > > > Thank you for the question. If you set the `PrintCanonicalTypes` to `false`, the `S<Point{1, 2}>` would be printed as `S<Point{1, 2}>` even without this patch. However, if you set it to `true`, it will be printed as `S<{1, 2}>`.
> > > > > > > > > > > 
> > > > > > > > > > > I don't fully understand why it does that, but it's quite annoying.
> > > > > > > > > > > 
> > > > > > > > > > > For a better example, please take a look at the `TemplateIdWithComplexFullTypeNTTP` unit tests that I've added: if `PrintCanonicalTypes` is set to `true`, the original print output of type is `NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>`, and if set to `false` (which is default), the output is `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` - so the NTTP type is neither fully written nor fully omitted, which is weird.
> > > > > > > > > > > 
> > > > > > > > > > > As I said, I don't really understand the idea behind `PrintCanonicalTypes`, but when my new `AlwaysIncludeTypeForNonTypeTemplateArgument` is enabled, you will get the full type printed, regardless of value of `PrintCanonicalTypes` setting.
> > > > > > > > > > > 
> > > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.
> > > > > > > > > > 
> > > > > > > > > > @aaron.ballman D55552 for context here... 
> > > > > > > > > > 
> > > > > > > > > > Hmm, actually, just adding the top level `Height{{0}}, Width{{0}}, Channels{{0}}` is sufficient to make this code compile (whereas with the `{{{0}}}` it doesn't form a valid identifier.
> > > > > > > > > > 
> > > > > > > > > > So what's your use case for needing more explicitness than that top level? 
> > > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.
> > > > > > > > > >
> > > > > > > > > > @aaron.ballman D55552 for context here...
> > > > > > > > > 
> > > > > > > > > I looked over D55552 again and haven't spotted anything with it that seems amiss; the change there is to grab the canonical type before trying to print it which is all the more I'd expect `PrintCanonicalTypes` to impact.
> > > > > > > > > 
> > > > > > > > > This looks like the behavior you'd get when you desugar the type. Check out the AST dump for `s`: https://godbolt.org/z/vxh5j6qWr
> > > > > > > > > ```
> > > > > > > > > `-VarDecl <line:3:1, col:20> col:20 s 'S<Point{1, 2}>':'S<{1, 2}>' callinit
> > > > > > > > > ```
> > > > > > > > > We generate that type information at https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/TextNodeDumper.cpp#L663 for doing the AST dump, note how the second type printed is the desugared type and that matches what we're seeing from the pretty printer.
> > > > > > > > > So what's your use case for needing more explicitness than that top level?
> > > > > > > > 
> > > > > > > > As I described in the [github issue](https://github.com/llvm/llvm-project/issues/57562), I'm trying to write a clang-based tool that will have different behavior if the printed `{{{0}}}` is actually `Width` than if its `Height` or anything else.
> > > > > > > > 
> > > > > > > > You can see the the issue in the AST dump for `bla`: https://godbolt.org/z/fMr4f13o3
> > > > > > > > 
> > > > > > > > The type is
> > > > > > > > ```
> > > > > > > > `-VarDecl <line:20:1, col:21> col:21 bla 'NDArray<float, W>':'NDArray<float, {{{0}}}>' callinit
> > > > > > > >   `-CXXConstructExpr <col:21> 'NDArray<float, W>':'NDArray<float, {{{0}}}>' 'void () noexcept'
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > so it's unknown whether `{{{0}}}` represent the `Width` or `Height`. My patch makes it work exactly like GCC (see the comparison of error message between [clang 15 and GCC 12.1](https://godbolt.org/z/WenWe8caf).
> > > > > > > > 
> > > > > > > > > Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.
> > > > > > > > 
> > > > > > > > This was also my first thought and the first version of my patch (before even submitting it here) was to actually change the behavior of `PrintCanonicalTypes`. However, that change made some tests fail, as I described in the patch description:
> > > > > > > > 
> > > > > > > > - CodeGenCXX/debug-info-template.cpp
> > > > > > > > - SemaCXX/constexpr-printing.cpp
> > > > > > > > - SemaCXX/cxx2a-nttp-printing.cpp
> > > > > > > > - SemaTemplate/temp_arg_string_printing.cpp
> > > > > > > > 
> > > > > > > > Of course, it's possible to simply update the tests, but I actually don't fully understand what is the goal of `PrintCanonicalTypes` and whether its current behavior is actually desired or not, so I played it safe and introduced a new policy that is disabled by default until I get more feedback from more experienced LLVM developers.
> > > > > > > > 
> > > > > > > > The patch does solve my problem and since I'm building LLVM from source anyway, I can have it enabled in my fork.
> > > > > > > > 
> > > > > > > > I just want to see if it would also be beneficial to be introduced into the upstream LLVM.
> > > > > > > > Of course, it's possible to simply update the tests, but I actually don't fully understand what is the goal of PrintCanonicalTypes and whether its current behavior is actually desired or not, so I played it safe and introduced a new policy that is disabled by default until I get more feedback from more experienced LLVM developers.
> > > > > > > 
> > > > > > > `PrintCanonicalTypes` is what controls output like whether we print a typedef name (not the canonical type) or the final underlying type all typedefs involved (the canonical type). It won't have an impact on things like whether we print the name of a structure or not.
> > > > > > > 
> > > > > > > After looking into this, I think we do want changes here. I'm not 100% convinced we need a new policy member. I tried out printing the type names unconditionally and the results were a bit of a mixed bag (some diagnostics got more clear, other diagnostics didn't become more clear but also didn't become more confusing):
> > > > > > > ```
> > > > > > > error: 'note' diagnostics expected but not seen:
> > > > > > >   File F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp Line 189: in instantiation of template class 'Diags::X<{1, 2}>' requested here
> > > > > > > error: 'note' diagnostics seen but not expected:
> > > > > > >   File F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp Line 189: in instantiation of template class 'Diags::X<Diags::A{1, 2}>' requested here
> > > > > > > ```
> > > > > > > seems like a clarifying change, while:
> > > > > > > ```
> > > > > > > error: 'error' diagnostics expected but not seen:
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 19: no member named 'display' in 'ASCII<{"this nontype template argument is [...]"}>'
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 25: no member named 'display' in 'ASCII<{{119, 97, 105, 116, 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 33: no member named 'display' in 'ASCII<{"what??!"}>'
> > > > > > > error: 'error' diagnostics seen but not expected:
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 19: no member named 'display' in 'ASCII<Str<43>{"this nontype template argument is [...]"}>'
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 25: no member named 'display' in 'ASCII<Str<14>{{119, 97, 105, 116, 32, 97, 32, 115, 27, 99, ...}}>'
> > > > > > >   File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 33: no member named 'display' in 'ASCII<Str<8>{"what??!"}>'
> > > > > > > ```
> > > > > > > seems like it's neither here nor there.
> > > > > > > 
> > > > > > > @dblaikie, do you have feelings on how to go with this?
> > > > > > Yeah, I'm inclined to think that `Diags::X<{1, 2}>` is just too simplified - it's unambiguous if the parameter isn't `auto`, but isn't valid syntax (so the language still expects a type name there) & so maybe we should do the same in diagnostics?
> > > > > > 
> > > > > > @aaron.ballman - though I'm still confused by the behavior-change when `PrintCanonicalTypes = false` that causes the top level names to be printed? Maybe that's just a weird case/red herring/bug in `PrintCanonicalTypes = true` that skips those top level names and we could print them unconditionally?
> > > > > > 
> > > > > > @DoDoENT - I was/am curious if/why you need more explicitness than would be provided by `PrintCanonicalTypes = false` - if the output was `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` would that be sufficient for your needs? (I think in that case the name would be valid to use in code, which I think is a reasonable basis to make the decision - but I'm not sure how to justify adding all the intermediate names too)
> > > > > > 
> > > > > > `PrintCanonicalTypes` is what controls output like whether we print a typedef name (not the canonical type) or the final underlying type all typedefs involved (the canonical type). It won't have an impact on things like whether we print the name of a structure or not.
> > > > > 
> > > > > Thank you for the explanation.
> > > > > 
> > > > > > I was/am curious if/why you need more explicitness than would be provided by `PrintCanonicalTypes = false` - if the output was `NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>` would that be sufficient for your needs? 
> > > > > 
> > > > > It would for now (at least until I actually start needing to get the types of inner braces), however, in a real-world scenario I actually get output `NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>`, regardless of the value of `PrintCanonicalTypes`. I've tried to create a minimum reproducer for that case, but, unfortunately, all simple cases resolve to actually display the names `Height`, `Width`, etc. The closes thing to the real-world scenario is the diagnostic message shown [here](https://godbolt.org/z/WenWe8caf).
> > > > > 
> > > > > However, in my real code, where there are multiple levels of type deduction happening, I always get the output without a type name, regardless of the value of `PrintCanonicalTypes`. This is why I took some time to debug what is actually happening within clang and created this patch which now works for me in all cases.
> > > > > Maybe that's just a weird case/red herring/bug in `PrintCanonicalTypes = true` that skips those top level names and we could print them unconditionally?
> > > > 
> > > > From my debugging sessions, I saw that the difference happens during printing whether the type in question is deduced type or known type. If it's a known type, then in both cases the type gets printed (in a clang-based tool, not in the diagnostic as shown on Godbolt). However, if it's a deduced type, then the type gets or doesn't get printed depending on whether the canonical type is requested. If canonical type is requested from a deduced type, in AST it's actually represented as an "expression statement", not a "type", and then the type printer tries to print it as an "expression statement". This then leads to printing code in APValue and TemplateBase thinking that it's actually printing a normal C++ expression (as in the case of a clang-tidy or similar refactor tool), and not as a non-type template parameter, yielding printout with omitted types, as that is generally expected when writing code. 
> > > > 
> > > > However, if you are using the printer to get a full type name in a string for different purposes, as I do, then you get incomplete information. So, there may be cases when you actually need/want full-type info and cases when you don't want it. Therefore, I think some new policy config should be in place to let people choose the behavior.
> > > > It would for now (at least until I actually start needing to get the types of inner braces), 
> > > 
> > > This ^ is what I was getting at - do you have a use/need for the explicit nested type names? Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?
> > > 
> > > Because that presents some tension between the needs of diagnostic users (where I think adding this extra info that can be inferred from context (by both the user and the C++ language/compiler) and would potentially make diagnostics noisier/less helpful) and your use case (which I'm trying to understand).
> > > 
> > > If there are multiple levels of deduction/ambiguity, I'm all for having more explicit names as necessary to remove the ambiguity, so maybe small examples of that would be helpful too, to make sure the solution is as deep as it needs to be.
> > > 
> > > > however, in a real-world scenario I actually get output `NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>`
> > > 
> > > Right, I'm happy to/leaning towards/trying to understand if we can consider that a bug in whatever way it shows up.
> > > 
> > > Hmm, seems this might be related to the FIXME here: https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424 - which, right, is the code you're changing in this patch - but I think the FIXME hints at this not being something we necessarily want to have as opt-in, but as something that should be fixed in this code possibly without the need for a new printing policy.
> > > 
> > > Seems reasonable that this produces at least unambiguous names (so it should print out the name if it's in a deduced context) and I think it's probably good if it prints out valid names (so even includes the name in non-deduced contexts, because C++ requires that too when writing the name) to simplify things for users of diagnostics, and in your case.
> > >  Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?
> > 
> > I'm writing a clang-based tool for internal use in our company that will need to parse some c++ code from a machine-generated function, representing the execution of computation graph (like those used in machine learning inference), then extract the exact types of all intermediate results and feed that into an algorithm that will optimize and possibly reorder the execution of graph nodes, depending on the types of the intermediate results (after optimization the types may change, but we know exactly how). Since we are encoding a lot of compile-time information into the type names of the intermediates, we need full type names here. As I'm a rookie with LLVM and don't know better, I implemented the FrontendAction and RecursiveVisitor to visit variables representing the variables I'm interested in and used `getAsString` to obtain type names. Unfortunately, I was not getting full type names and therefore I investigated further and that resulted with this patch.
> > 
> > > Because that presents some tension between the needs of diagnostic users (where I think adding this extra info that can be inferred from context (by both the user and the C++ language/compiler) and would potentially make diagnostics noisier/less helpful) and your use case (which I'm trying to understand).
> > 
> > In that case, it makes sense to introduce a new policy. Maybe only a better name for it should be invented...
> > 
> > > Hmm, seems this might be related to the FIXME here: https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424 - which, right, is the code you're changing in this patch - but I think the FIXME hints at this not being something we necessarily want to have as opt-in, but as something that should be fixed in this code possibly without the need for a new printing policy.
> > > 
> > > Seems reasonable that this produces at least unambiguous names (so it should print out the name if it's in a deduced context) and I think it's probably good if it prints out valid names (so even includes the name in non-deduced contexts, because C++ requires that too when writing the name) to simplify things for users of diagnostics, and in your case.
> > 
> > If only a change in `TemplateBase.cpp` is applied, but not also in `APValue.cpp`, then at least I get consistently printed out `Height{{{0}}}`, regardless of the value of `PrintCanonicalTypes`, so change in this line may indeed be a bugfix.
> > 
> > However, a change in `APValue.cpp` indeed adds noise, as you call it, to the diagnostic messages, as it ensures that the full type name is written. However, the same noise level is always generated by GCC and, in my opinion, although noisier, the GCC's error message is more thorough. But I'm OK with keeping that change behind a new policy.
> > > Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?
> > 
> > I'm writing a clang-based tool for internal use in our company that will need to parse some c++ code from a machine-generated function, representing the execution of computation graph (like those used in machine learning inference), then extract the exact types of all intermediate results and feed that into an algorithm that will optimize and possibly reorder the execution of graph nodes, depending on the types of the intermediate results (after optimization the types may change, but we know exactly how). Since we are encoding a lot of compile-time information into the type names of the intermediates, we need full type names here. As I'm a rookie with LLVM and don't know better, I implemented the FrontendAction and RecursiveVisitor to visit variables representing the variables I'm interested in and used getAsString to obtain type names. Unfortunately, I was not getting full type names and therefore I investigated further and that resulted with this patch.
> 
> But if the type as a string is correct as far as C++ is concerned, is that not enough information (necessarily - the C++ compiler can determine all the type information from the type as written with only the top level types in the NTTP names)? If it's that the tool is trying to get the type information purely from the string representation, maybe the tool should be doing things differently - relying on Clang's AST/type APIs, rather than on inspecting the resulting string?
> If it's that the tool is trying to get the type information purely from the string representation, maybe the tool should be doing things differently - relying on Clang's AST/type APIs, rather than on inspecting the resulting string?

I completely agree, however, the documentation on type API is not very good, and couldn't find a better way to get a string representation of the type using this API, besides manually traversing the AST and concatenating bits and pieces into a string which is essentially the same thing that TypePrinter does...

On the other hand, inventing new data structures to hold the type information or directly saving pointers to parts of the AST does not sound appealing to me, at least not for the current level of complexity I'm aiming for.

> But if the type as a string is correct as far as C++ is concerned, is that not enough information (necessarily - the C++ compiler can determine all the type information from the type as written with only the top level types in the NTTP names)?

I think that the essential problem here is the //**as far as C++ is concerned**//? What about the developer? Developers may have more benefit from getting a full type name in their diagnostic than just something that is correct as far as C++ is concerned.

Consider [this error message, which is completely useless](https://godbolt.org/z/c6W6oMYrh) and compare it to the much more informative errors by [GCC](https://godbolt.org/z/Ge89TePjq) and [MSVC](https://godbolt.org/z/aKr3T7nfz) that immediately tell you what's wrong with your code.

So, why shouldn't Clang behave the same way as well?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134453/new/

https://reviews.llvm.org/D134453



More information about the cfe-commits mailing list