[cfe-dev] __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Tue Sep 25 12:46:19 PDT 2018

Answering my own question:

    Is there a single 'canonical' init-list for a LiteralType?
 => No.

If LiteralType was further required to be structured-bindable,
and to have a corresponding constructor for the bound types,
then the compiler could select that as the canonical constructor:

Given a value v of type T
   auto [a,b,c...]{v}  structured binding
   T(a,b,c...)         canonical construction, reconstructs v

This provides a unique default choice for pretty-print output.
The 'reconstructing' constructor is likely to optimise nicely.

A requirement to be structured-bindable seems natural
(despite structured-binding not being constexpr yet).

Aggregates satisfy the reconstructing constructor requirement.
It feels more onerous to require general LiteralType classes to
provide such a constructor.

On Mon, Sep 24, 2018 at 3:11 PM will wray <wjwray at gmail.com> wrote:

> Thanks again Richard for a second more detailed response setting out the
> broader view
> (I had failed to fully interpret / misinterpreted your first response as
> mild pushback
>  apologies for that; I should've asked for the clarification)
> (also, I shouldn't have argued 'correctness' here; that was disingenuous).
>
> So, in the broader view, the question is:
>  > How best to pretty print general non-type template args.
>
> Now, as non-types may be subsumed by LiteralType literals in C++20
> we should look ahead to pretty printing of such general literal types
> as the broader context for discussion or decisions on changes.
>
>   Non-type template args
>   Until C++17:
>     - an integral type;
>     - an enumeration type;
>     - std::nullptr_t; (since C++11)
>     - a pointer type (to object or to function);
>     - a pointer to member type (to member object or to member function);
>     - lvalue reference type (to object or to function);
>   For C++20:
>     - A type that satisfies LiteralType, has strong structural equality ...
>
> Pretty-print design goals:
>   General: minimise special-casing
>   Round-trip: the printed value can be plugged back into code
>   Low-verbosity: avoid unnecessary info or repetition
>   more?
>
> The general literal type can be one of the C++17 built-ins
> as above (all scalar types, apart from lvalue reference I think)
> or an aggregate or class that satisfies the LiteralType requirements.
>
> So, in general, we will now have to deal with non-scalar types
> implying multiple initializers and init-lists.
>
> Question: is there a single 'canonical' init-list for a LiteralType?
>
> This is a place where the proposal to use parens for aggregates will help,
> as special-casing initialization syntax, for brace or paren, is awkward:
> P0960 Allow initializing aggregates from a parenthesized list of values
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0960r1.html
> If this proposal is accepted then that would suggest to prefer paren-init.
>
> Literals of the existing types remain awkward non-types -
> 'function-style' paren-init doesn't work directly for multi-token types
> (like 'long long') or for compound types (pointer/ref).
> An added decltype can solve that. Sticking with C-style cast is simpler.
>
> Here's a suggestion (assuming P0960 for non-scalars):
>
>   Scalar:
>     Known type T; print as 'value'
>     Deduced type; print as '(T)value'
>   Non-scalar:
>     Known type T; print as '(init...)'
>     Deduced type; print as 'T(init...)'
>
> 'Known type' means non-deduced or T implicit in value
> (for values of scalar type like enumerator or pointer-to-member).
> (lvalue reference remains a special case).
>
> In practice, I don't know if compilers can easily distinguish
> whether a value's type is known or deduced, at the leaf-level.
> If not then this change will be more involved than just updating
> low-level printing functions.
>
> Now let's focus back on integral and enumeration types.
>
> Integral / enum types
> ---------------------
> Clang, MSVC and GCC all output plain digits for Integral.
> Clang, MSVC and GCC* all output enum id for enumerators.
> Clang, MSVC output plain digits for non-enumerated values,
> while GCC outputs a C-style cast for non-enumerated values.
>
> *GCC enum: I submitted a bug report with a patch and tests:
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87364
> (comments, subscribers or reviews will be much appreciated).
> With this fix GCC now prints enum id for enumerators and
> continues to print non-enumerated values by C-style cast.
> I plan to further investigate the Integral output for GCC.
>
> I submitted an MSVC request for Integral and enum types:
>
> https://developercommunity.visualstudio.com/content/problem/339663/improve-pretty-print-of-integral-non-type-template.html
>
> How best to print the actual integral values?
>   MSVC currently prints as '0xhhh' where h are hex digits,
>   GCC and Clang print as 'ddd' where d are decimal digits.
> Decimal is less verbose, until the millions, but can be slower.
> Hex simplifies round-trip as it is easy to generate and parse.
> Hex is already best for reporting addresses (pointer/lval-ref).
>
> It'd be good to get some consensus for Integral/enum output.
> Ideally we can see a coordinated change in Clang, GCC & MSVC.
>
> If the suggestion here looks acceptable then I can attempt
> to update Clang printing of Integral and non-enumerated enums.
>
>
>
>
> On Tue, Sep 11, 2018 at 7:56 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On Mon, 10 Sep 2018 at 09:47, will wray via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Thanks Richard.
>>>
>>> Ok; ignore Question 2 - you are absolutely right (what I was thinking).
>>> Printing different labels for the same type defeats the point of this
>>> request.
>>>
>>> OTOH, Question 1 is a request for Clang to print different output for
>>> different types.
>>>
>>
>> I think you might be under the impression that I said that we shouldn't
>> address this, which was very much not my intent. What I tried to say was
>> that we *should* emit a cast in the case of a deduced template parameter
>> type, when the type of the argument expression we'd otherwise print is
>> different from that of the parameter. That should be sufficient to ensure
>> that Clang prints different outputs for different types. Note that
>> special-casing enums is the wrong way to fix that problem, as the problem
>> is not specific to enums. Consider, for example:
>>
>> template<auto> struct X {};
>> X<(short)0> xa;
>> X<(long)0> xb;
>>
>> The types of xa and xb both pretty-print as X<0> currently.
>>
>> There are (at least) two natural ways to fix this: by always printing a
>> cast when the type of the argument isn't the same as that of the parameter
>> (which is the unnecessary verbosity that I was concerned with, and is what
>> we'd get if we generalized the proposed approach for enumerations to
>> address the whole problem), or by doing something smarter that depends on
>> whether the template parameter was deduced (which is what I'm suggesting is
>> the right way to address this).
>>
>> The linked bug report is an example of where it is important to
>>> distinguish types.
>>>
>> Here's a summary of suggested output vs current output for Clang, GCC,
>>> MSVC.
>>> https://godbolt.org/z/QEae7F
>>>
>>> Plain, unscoped enum
>>>    enum e { a, b, c=b };
>>>    auto_name<a, e(1), c, e(3)>
>>>
>>>   Suggested: a, b, b, (e)3
>>>       Clang: a, b, b, 3
>>>         GCC: (e)0, (e)1, (e)1, (e)3
>>>        MSCV: 0, 1, 1, 3
>>>
>>> Scoped / opaque enum:
>>>    enum class E { a, b, c=b };
>>>    auto_name<E::a, E{1}, E::c, E{3}>
>>>
>>>   Suggested:  E::a, E::b, E::b, (E)3
>>>       Clang:  E::a, E::b, E::b, 3
>>>         GCC:  (E)0, (E)1, (E)1, (E)3
>>>        MSCV: 0, 1, 1, 3
>>>
>>> None of these are ideal -
>>> - Clang loses type info for non-enumerated values.
>>> - GCC does not distinguish enumerated / non-enumerated.
>>> - MSVC loses type all info, does not distinguish.
>>>
>>> Interestingly, it turns out that GCC's behaviour is a bug!
>>> (The *intent* of gcc code is to print the suggested output.
>>>  I intend to submit a bug report and a patch.)
>>>
>>> You are concerned that this suggested change would
>>>
>>>> make the template argument string more verbose without (necessarily)
>>>> providing any more information.
>>>
>>>
>>> Yet the information is necessary in order to distinguish different types.
>>> This change is needed for correctness.
>>>
>>
>> For what it's worth, I don't see a correctness concern here. Anything
>> assuming our pretty-printed type names are unique is wrong. The bug you
>> referenced is a bug caused by a tool getting this wrong and using the
>> pretty type name string inappropriately, and the contributors on that bug
>> report seem happy to acknowledge that.
>>
>> That said, as a quality of implementation matter, we should try to ensure
>> our pretty-printed type names are unique anyway wherever possible.
>>
>> Verbosity is a secondary concern.
>>> Clang's current behaviour here could be treated as a bug
>>> (it has been described as "seriously suboptimal" by a gcc dev).
>>>
>>> For Clang, the increased verbosity is only for non-enumerated values.
>>> Enum-templates would be expected to mostly use enumerated values.
>>> Also, verbosity becomes more similar for enumerated / non-enumerated
>>> the number of characters in the identifier label vs the number of digits.
>>>
>>> I'm now preferring C-style cast notation over C++ function-style
>>> as it simplifies output and parsing code.
>>>
>>> Please reflect on this and consider - is this suggested change
>>> acceptable?
>>>
>>
>> As noted above, the problem you described, while valid, does not justify
>> the solution you're suggesting. However, for enums specifically, I think
>> there's another argument to consider (one that I think you've not made
>> yet): passing a plain integer as a template argument to an enumeration
>> template parameter won't compile, so our pretty-printed type is not valid
>> C++. It's reasonable to consider how valuable it is to fix that, compared
>> with the value of having a shorter pretty-printed type name, but as you
>> point out, we're already paying the cost of including the enum type name in
>> the template argument in the case where an scoped enumeration is named, and
>> an integer cast to enum type would be of comparable length. So I think the
>> change you suggested is a good idea for that reason.
>>
>> On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>>> On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> A request to change Clang's __PRETTY_FUNCTION__ output to fix an issue.
>>>>>
>>>>> The code used in pretty function is also used to generate debug info
>>>>> in
>>>>> DWARF etc
>>>>> .
>>>>> (I believe). See, for instance the comment on this bug report:
>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21492#c1
>>>>>
>>>>>> Clang shows these arguments with the same DW_AT_name, X<1>, but
>>>>>> they're distinct types
>>>>>
>>>>>
>>>>> https://godbolt.org/z/0Ik0Hl
>>>>> Pretty function output, for auto... args
>>>>> (variadic just so that outputs display together):
>>>>>
>>>>> template <auto... V>
>>>>> constexpr char const* auto_name()
>>>>> {
>>>>> return __PRETTY_FUNCTION__;
>>>>> }
>>>>>
>>>>> enum e { a, b, c=b };
>>>>> enum class E { a, b, c=b };
>>>>>
>>>>> char const* nameU() { return auto_name<a, b, c, e(3)>(); }
>>>>> char const* nameS() { return auto_name<E::a, E::b, E::c, E{3}>(); }
>>>>>
>>>>>
>>>>> Gives output with these strings embedded;
>>>>>
>>>>> nameU:
>>>>> <a, b, b, 3>
>>>>>
>>>>> nameS:
>>>>> <E::a, E::b, E::b, 3>
>>>>>
>>>>>
>>>>> The first two enumerated values, a & b, are returned with no loss of
>>>>> type info as a, b.
>>>>> The next enumerated value, c, is a duplicate label for b's value, and
>>>>> is reported as b.
>>>>> The final value is not enumerated - there is no enum label with value
>>>>> 3 -
>>>>> it is reported with an unnecessary loss of type info
>>>>> as plain digits 3.
>>>>>
>>>>> The non-enumerated value should be reported with a cast or constructor:
>>>>>
>>>>> unscoped e: 3 -> (e)3 or e(3)
>>>>>
>>>>>   scoped   E: 3 -> (E)3 or E(3) or E{3}
>>>>>
>>>>>  (init-list E{3} only for scoped enum and only since c++17)
>>>>>
>>>>>
>>>>> Question 1:
>>>>>
>>>>> Will it be acceptable to change the output for non-enumerated values
>>>>> as so?
>>>>>
>>>>
>>>> In the case where the template parameter involves a deduced type
>>>> (`auto` or `decltype(auto)` or eventually a deduced template specialization
>>>> type), it would make a lot of sense to include the actual type in the
>>>> printed template argument if it differs from the type of the formatted
>>>> argument string. But I think we should leave out the type in all other
>>>> cases, as it would make the template argument string more verbose without
>>>> (necessarily) providing any more information.
>>>>
>>>> This will retain type info and allow round-trip without loss.
>>>>> Either C-style cast (e)3  or  function-cast / type-constructor e(3) -
>>>>> seem ok
>>>>> (gcc uses C-style casts and Clang uses this same style in source
>>>>> labels).
>>>>>
>>>>> For the duplicate-valued labels case there is no loss of type info,
>>>>> but there is a loss of information about the label used for the auto
>>>>> arg.
>>>>> Presumably, the code works from the supplied underlying-type value
>>>>> and picks the first label with the given value. I'd guess that Clang
>>>>> knows
>>>>> the label used in a direct instantiation and could return it in that
>>>>> case.
>>>>> (c.f. named member-pointer args have to retain the id for reporting).
>>>>>
>>>>
>>>> In the case of duplicated enumerators, there simply is no distinction
>>>> between 'b' and 'c' -- they are both merely names for the value 1 of type
>>>> 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore
>>>> must return the same string.
>>>>
>>>>
>>>>> Question 2:
>>>>>
>>>>> Is it possible / acceptable to retain the label in the duplicate enum
>>>>> value case?
>>>>>
>>>>
>>>> No, per the above, we cannot and must not do that.
>>>>
>>>> Can anyone point me to the relevant code?
>>>>>
>>>>
>>>> This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The
>>>> "else" case at the bottom would need to print out additional text
>>>> representing the cast when needed.
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180925/b6eb609c/attachment.html>