[cfe-dev] __PRETTY_FUNCTION__ output loses enum type info for non-enumerated values

Mon Sep 24 12:11:51 PDT 2018

Thanks again Richard for a second more detailed response setting out the
broader view
(I had failed to fully interpret / misinterpreted your first response as
mild pushback
 apologies for that; I should've asked for the clarification)
(also, I shouldn't have argued 'correctness' here; that was disingenuous).

So, in the broader view, the question is:
 > How best to pretty print general non-type template args.

Now, as non-types may be subsumed by LiteralType literals in C++20
we should look ahead to pretty printing of such general literal types
as the broader context for discussion or decisions on changes.

  Non-type template args
  Until C++17:
    - an integral type;
    - an enumeration type;
    - std::nullptr_t; (since C++11)
    - a pointer type (to object or to function);
    - a pointer to member type (to member object or to member function);
    - lvalue reference type (to object or to function);
  For C++20:
    - A type that satisfies LiteralType, has strong structural equality ...

Pretty-print design goals:
  General: minimise special-casing
  Round-trip: the printed value can be plugged back into code
  Low-verbosity: avoid unnecessary info or repetition
  more?

The general literal type can be one of the C++17 built-ins
as above (all scalar types, apart from lvalue reference I think)
or an aggregate or class that satisfies the LiteralType requirements.

So, in general, we will now have to deal with non-scalar types
implying multiple initializers and init-lists.

Question: is there a single 'canonical' init-list for a LiteralType?

This is a place where the proposal to use parens for aggregates will help,
as special-casing initialization syntax, for brace or paren, is awkward:
P0960 Allow initializing aggregates from a parenthesized list of values
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0960r1.html
If this proposal is accepted then that would suggest to prefer paren-init.

Literals of the existing types remain awkward non-types -
'function-style' paren-init doesn't work directly for multi-token types
(like 'long long') or for compound types (pointer/ref).
An added decltype can solve that. Sticking with C-style cast is simpler.

Here's a suggestion (assuming P0960 for non-scalars):

  Scalar:
    Known type T; print as 'value'
    Deduced type; print as '(T)value'
  Non-scalar:
    Known type T; print as '(init...)'
    Deduced type; print as 'T(init...)'

'Known type' means non-deduced or T implicit in value
(for values of scalar type like enumerator or pointer-to-member).
(lvalue reference remains a special case).

In practice, I don't know if compilers can easily distinguish
whether a value's type is known or deduced, at the leaf-level.
If not then this change will be more involved than just updating
low-level printing functions.

Now let's focus back on integral and enumeration types.

Integral / enum types
---------------------
Clang, MSVC and GCC all output plain digits for Integral.
Clang, MSVC and GCC* all output enum id for enumerators.
Clang, MSVC output plain digits for non-enumerated values,
while GCC outputs a C-style cast for non-enumerated values.

*GCC enum: I submitted a bug report with a patch and tests:
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87364
(comments, subscribers or reviews will be much appreciated).
With this fix GCC now prints enum id for enumerators and
continues to print non-enumerated values by C-style cast.
I plan to further investigate the Integral output for GCC.

I submitted an MSVC request for Integral and enum types:
https://developercommunity.visualstudio.com/content/problem/339663/improve-pretty-print-of-integral-non-type-template.html

How best to print the actual integral values?
  MSVC currently prints as '0xhhh' where h are hex digits,
  GCC and Clang print as 'ddd' where d are decimal digits.
Decimal is less verbose, until the millions, but can be slower.
Hex simplifies round-trip as it is easy to generate and parse.
Hex is already best for reporting addresses (pointer/lval-ref).

It'd be good to get some consensus for Integral/enum output.
Ideally we can see a coordinated change in Clang, GCC & MSVC.

If the suggestion here looks acceptable then I can attempt
to update Clang printing of Integral and non-enumerated enums.

On Tue, Sep 11, 2018 at 7:56 PM Richard Smith <richard at metafoo.co.uk> wrote:

> On Mon, 10 Sep 2018 at 09:47, will wray via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Thanks Richard.
>>
>> Ok; ignore Question 2 - you are absolutely right (what I was thinking).
>> Printing different labels for the same type defeats the point of this
>> request.
>>
>> OTOH, Question 1 is a request for Clang to print different output for
>> different types.
>>
>
> I think you might be under the impression that I said that we shouldn't
> address this, which was very much not my intent. What I tried to say was
> that we *should* emit a cast in the case of a deduced template parameter
> type, when the type of the argument expression we'd otherwise print is
> different from that of the parameter. That should be sufficient to ensure
> that Clang prints different outputs for different types. Note that
> special-casing enums is the wrong way to fix that problem, as the problem
> is not specific to enums. Consider, for example:
>
> template<auto> struct X {};
> X<(short)0> xa;
> X<(long)0> xb;
>
> The types of xa and xb both pretty-print as X<0> currently.
>
> There are (at least) two natural ways to fix this: by always printing a
> cast when the type of the argument isn't the same as that of the parameter
> (which is the unnecessary verbosity that I was concerned with, and is what
> we'd get if we generalized the proposed approach for enumerations to
> address the whole problem), or by doing something smarter that depends on
> whether the template parameter was deduced (which is what I'm suggesting is
> the right way to address this).
>
> The linked bug report is an example of where it is important to
>> distinguish types.
>>
> Here's a summary of suggested output vs current output for Clang, GCC,
>> MSVC.
>> https://godbolt.org/z/QEae7F
>>
>> Plain, unscoped enum
>>    enum e { a, b, c=b };
>>    auto_name<a, e(1), c, e(3)>
>>
>>   Suggested: a, b, b, (e)3
>>       Clang: a, b, b, 3
>>         GCC: (e)0, (e)1, (e)1, (e)3
>>        MSCV: 0, 1, 1, 3
>>
>> Scoped / opaque enum:
>>    enum class E { a, b, c=b };
>>    auto_name<E::a, E{1}, E::c, E{3}>
>>
>>   Suggested:  E::a, E::b, E::b, (E)3
>>       Clang:  E::a, E::b, E::b, 3
>>         GCC:  (E)0, (E)1, (E)1, (E)3
>>        MSCV: 0, 1, 1, 3
>>
>> None of these are ideal -
>> - Clang loses type info for non-enumerated values.
>> - GCC does not distinguish enumerated / non-enumerated.
>> - MSVC loses type all info, does not distinguish.
>>
>> Interestingly, it turns out that GCC's behaviour is a bug!
>> (The *intent* of gcc code is to print the suggested output.
>>  I intend to submit a bug report and a patch.)
>>
>> You are concerned that this suggested change would
>>
>>> make the template argument string more verbose without (necessarily)
>>> providing any more information.
>>
>>
>> Yet the information is necessary in order to distinguish different types.
>> This change is needed for correctness.
>>
>
> For what it's worth, I don't see a correctness concern here. Anything
> assuming our pretty-printed type names are unique is wrong. The bug you
> referenced is a bug caused by a tool getting this wrong and using the
> pretty type name string inappropriately, and the contributors on that bug
> report seem happy to acknowledge that.
>
> That said, as a quality of implementation matter, we should try to ensure
> our pretty-printed type names are unique anyway wherever possible.
>
> Verbosity is a secondary concern.
>> Clang's current behaviour here could be treated as a bug
>> (it has been described as "seriously suboptimal" by a gcc dev).
>>
>> For Clang, the increased verbosity is only for non-enumerated values.
>> Enum-templates would be expected to mostly use enumerated values.
>> Also, verbosity becomes more similar for enumerated / non-enumerated
>> the number of characters in the identifier label vs the number of digits.
>>
>> I'm now preferring C-style cast notation over C++ function-style
>> as it simplifies output and parsing code.
>>
>> Please reflect on this and consider - is this suggested change acceptable?
>>
>
> As noted above, the problem you described, while valid, does not justify
> the solution you're suggesting. However, for enums specifically, I think
> there's another argument to consider (one that I think you've not made
> yet): passing a plain integer as a template argument to an enumeration
> template parameter won't compile, so our pretty-printed type is not valid
> C++. It's reasonable to consider how valuable it is to fix that, compared
> with the value of having a shorter pretty-printed type name, but as you
> point out, we're already paying the cost of including the enum type name in
> the template argument in the case where an scoped enumeration is named, and
> an integer cast to enum type would be of comparable length. So I think the
> change you suggested is a good idea for that reason.
>
> On Tue, Sep 4, 2018 at 4:30 PM Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>> On Tue, 4 Sep 2018 at 09:09, will wray via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> A request to change Clang's __PRETTY_FUNCTION__ output to fix an issue.
>>>>
>>>> The code used in pretty function is also used to generate debug info in
>>>> DWARF etc
>>>> .
>>>> (I believe). See, for instance the comment on this bug report:
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21492#c1
>>>>
>>>>> Clang shows these arguments with the same DW_AT_name, X<1>, but
>>>>> they're distinct types
>>>>
>>>>
>>>> https://godbolt.org/z/0Ik0Hl
>>>> Pretty function output, for auto... args
>>>> (variadic just so that outputs display together):
>>>>
>>>> template <auto... V>
>>>> constexpr char const* auto_name()
>>>> {
>>>> return __PRETTY_FUNCTION__;
>>>> }
>>>>
>>>> enum e { a, b, c=b };
>>>> enum class E { a, b, c=b };
>>>>
>>>> char const* nameU() { return auto_name<a, b, c, e(3)>(); }
>>>> char const* nameS() { return auto_name<E::a, E::b, E::c, E{3}>(); }
>>>>
>>>>
>>>> Gives output with these strings embedded;
>>>>
>>>> nameU:
>>>> <a, b, b, 3>
>>>>
>>>> nameS:
>>>> <E::a, E::b, E::b, 3>
>>>>
>>>>
>>>> The first two enumerated values, a & b, are returned with no loss of
>>>> type info as a, b.
>>>> The next enumerated value, c, is a duplicate label for b's value, and
>>>> is reported as b.
>>>> The final value is not enumerated - there is no enum label with value 3
>>>> -
>>>> it is reported with an unnecessary loss of type info
>>>> as plain digits 3.
>>>>
>>>> The non-enumerated value should be reported with a cast or constructor:
>>>>
>>>> unscoped e: 3 -> (e)3 or e(3)
>>>>
>>>>   scoped   E: 3 -> (E)3 or E(3) or E{3}
>>>>
>>>>  (init-list E{3} only for scoped enum and only since c++17)
>>>>
>>>>
>>>> Question 1:
>>>>
>>>> Will it be acceptable to change the output for non-enumerated values as
>>>> so?
>>>>
>>>
>>> In the case where the template parameter involves a deduced type (`auto`
>>> or `decltype(auto)` or eventually a deduced template specialization type),
>>> it would make a lot of sense to include the actual type in the printed
>>> template argument if it differs from the type of the formatted argument
>>> string. But I think we should leave out the type in all other cases, as it
>>> would make the template argument string more verbose without (necessarily)
>>> providing any more information.
>>>
>>> This will retain type info and allow round-trip without loss.
>>>> Either C-style cast (e)3  or  function-cast / type-constructor e(3) -
>>>> seem ok
>>>> (gcc uses C-style casts and Clang uses this same style in source
>>>> labels).
>>>>
>>>> For the duplicate-valued labels case there is no loss of type info,
>>>> but there is a loss of information about the label used for the auto
>>>> arg.
>>>> Presumably, the code works from the supplied underlying-type value
>>>> and picks the first label with the given value. I'd guess that Clang
>>>> knows
>>>> the label used in a direct instantiation and could return it in that
>>>> case.
>>>> (c.f. named member-pointer args have to retain the id for reporting).
>>>>
>>>
>>> In the case of duplicated enumerators, there simply is no distinction
>>> between 'b' and 'c' -- they are both merely names for the value 1 of type
>>> 'e'. auto_name<b>() and auto_name<c>() are the same function and therefore
>>> must return the same string.
>>>
>>>
>>>> Question 2:
>>>>
>>>> Is it possible / acceptable to retain the label in the duplicate enum
>>>> value case?
>>>>
>>>
>>> No, per the above, we cannot and must not do that.
>>>
>>> Can anyone point me to the relevant code?
>>>>
>>>
>>> This is printIntegral, at the top of lib/AST/TemplateBase.cpp. The
>>> "else" case at the bottom would need to print out additional text
>>> representing the cast when needed.
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180924/6fe8a8fc/attachment.html>