[PATCH] D111283: [clang] template / auto deduction deduces common sugar

Tue Oct 12 08:57:02 PDT 2021

mizvekov added a comment.

In D111283#3056748 <https://reviews.llvm.org/D111283#3056748>, @rsmith wrote:

> Do you have examples showing what this does in practice for errors in real-world code? I'm concerned that stripping back to the common type sugar may produce an unintuitive result in some cases, although it certainly does seem at least theoretically nice to use a commutative and associative function to combine deductions like this. The kind of thing I'd be worried about would be (eg) a container where the `iterator` and `const_iterator` are the same, but where the common sugar of `T::iterator` and `T::const_iterator` are some internal type that the user has never heard of. In general it seems like this sugar stripping could result in types that are distant from the user's code. As an extreme example, in the case where one of the types is canonical and the other is not, it seems like stripping all the way back to the canonical type would be worse than ignoring the canonical version of the type and keeping the sugared version.

I don't have handy access to very large code bases where this could be deployed experimentally, but this is something I can try to look into.

> Perhaps we could treat canonical types as a special case (effectively assuming they came from taking a type and canonicalizing it at some point, rather than from a type the user wrote) and when combining two types, use the non-canonical type if one of them is canonical, and otherwise use the common sugar. That should still be commutative and associative and generally well-behaved, but won't degrade badly when given a canonical type as input. I think that still leaves the possibility of a surprising outcome when combining types that both desugar to some internal type, though I'm not sure what we can do about that other than make an arbitrary choice of which type sugar we like more.

The thing I would be concerned with this special case is that it would also give surprising results, ie it could deduce that Socrates is a Dog.
I guess there are two use cases scenarios here, one where we have this transparent hierarchy, and this algorithm gives results that make intuitive sense,
and the other where we have some typedef which we want to be opaque in order to not expose internal details.

So exploring your example, suppose we try to deduce from an iterator and a const_iterator.
We have some options here:

- We deduce as either iterator or const_iterator. If there is an IS-A relationship between them, and we pick the right choice, then we pack up and go home, job well done. If there is no such relationship, neither answer seems right.
- We deduce the canonical type, which might be something like `struct vendor::detail::Foo *`. This exposes internal details, but at least it has some vocabulary information, so you know this is a pointer to an object of some internal class. It's not good from a user friendliness PoV, but it's good from a 'I want to debug this thing' perspective.
- We deduce to some type sugar which is meant to be internal, like `vendor::detail::iterator_t`. This is not very good, maybe it's worse from a user friendliness PoV than the bullet point above as we expose even more internal details. But maybe in some cases it's better as the vendor can also pick a name for the typedef which is more explanatory than the the canonical type, which will still be available in the 'aka', so from the debuggability perspective this also seems better.
- We create some new type sugar which wraps the information that some type was deduced from two other types. This might be too much information overload, and we still need to have an answer to the 'single step desugar' of it, so I am not exploring this much further for now ;)

The problem of hiding internal details in error messages is a general problem, that maybe this solution would make a bit worse in some cases,
but that also means that solutions to the general problem can also remedy the problems we cause here.
One such idea (not proposing formally, just giving an example) would be an attribute on typedefs which hides the underlying sugar,
making it AS-IF the typedef was written on the canonical type. But it would still not hide the canonical type which is also
implementation detail, so not a huge win.

Going back to the 'treat canonical types as not written' workaround, I think there are too many cases where we are doing the wrong thing here in clang.
Basically any type written bare without any name / keyword qualifiers will not be treated by some ElaboratedType like mechanism. I suppose that as we fix
those problems, the need for this workaround will sort of disappear. I am not too opposed to it, but I think it might be better to give less, but more correct information, than
to some times make wild guesses ;-P

> I've not yet done any detailed review of the common type sugar computation mechanism.

No worries, and thanks a ton for the feedback!!!

================
Comment at: clang/test/SemaCXX/sugared-auto.cpp:146
+    return a;
+  return N(); // expected-error {{but deduced as 'SARS (*)() throw(Man, Vibrio)' (aka 'void (*)() throw(Man, Vibrio)')}}
+}
----------------
rsmith wrote:
> rsmith wrote:
> > Why don't we get `Virus` as the deduced return type from line 143 here?
> Oh, never mind, we've not updated the conditional expression handling to use `getCommonSugar` yet. We probably should -- it currently has a very minimal form of the same thing; see `Sema::FindCompositePointerType`. That can presumably be changed to use `getCommonSugar` once it strips down to a common type. On around `SemaExprCXX.cpp:6870`:
> ```
> -  QualType Composite = Composite1;
> +  QualType Composite = Context.getCommonSugar(Composite1, Composite2);
> ```
Yeah, if you look into the next patch in the stack, which is still WIP, there is a change that fixes this aspect of this test case, but it's a different change than what you are proposing here. I will take a look again, but are you proposing that I add this sort of changes to this same patch, or keep things separate as I am trying to do?
Either answer is fine, less patches means less rebuild time for me :-)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111283/new/

https://reviews.llvm.org/D111283