[PATCH] D45680: [C++2a] Add operator<=> Rewriting - Early Attempt

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 17 02:57:06 PDT 2018


rsmith added a comment.

Definitely some interesting questions to take to CWG here. :)



================
Comment at: lib/Sema/SemaOverload.cpp:9218-9219
+    // --- F2 is a rewritten candidate ([over.match.oper]) and F1 is not.
+    if (Cand2.getRewrittenKind() && !Cand1.getRewrittenKind())
+      return true;
+    if (Cand1.getRewrittenKind() && Cand2.getRewrittenKind() &&
----------------
EricWF wrote:
> EricWF wrote:
> > EricWF wrote:
> > > rsmith wrote:
> > > > You also need to check the reverse condition and return false (the "or if not that" is ... somehow ... supposed to imply that).
> > > Hmm. So I'm wondering what is intended by the language `F1 and F2 are rewritten candidates, and F2 is a synthesized candidate with reversed order of parameters and F1 is not`. For example, what happens when the candidates are two distinct member functions, each with only one explicit parameter?
> > > 
> > > ```
> > > struct T;
> > > struct U { auto operator<=>(T); };
> > > struct T { auto operator<=>(U); };
> > > auto r = T{} < U{}; // Are the synthesized and rewritten overloads ambiguous? 
> > > ```
> > And what about 
> > 
> > ```
> > struct U {};
> > struct T { auto operator<=>(U); };
> > auto operator<=>(U, T);
> > auto r = T{} < U{};
> > ```
> Never mind. I found the language. The implicit object parameter isn't considered in this case.
The parameters in scope in this rule are the parameters for the purpose of overload resolution, which include the implicit object parameter for a member function.

In your first example, both candidates have implicit conversion sequences with Exact Match rank for both parameters (and none of the ordering special cases apply), so we fall down to this tie-breaker and it selects the `T::operator<=>(U)` candidate, unless I'm missing something.

Your second example appears to receive the same treatment.
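
To make that concrete, here is a rough sketch of the candidate set I have in mind for the first example (my reading of the rule, with `int` standing in for a comparison category type, since that's all the `< 0` part needs):

```
struct T;
struct U { int operator<=>(T); };
struct T { int operator<=>(U); };

// For T{} < U{}:
//   rewritten candidate:   (T{} <=> U{}) < 0   via T::operator<=>(U)
//   synthesized candidate: 0 < (U{} <=> T{})   via U::operator<=>(T), reversed
// Both parameters (including the implicit object parameter) use Exact Match
// conversion sequences, so the "F2 is synthesized and F1 is not" tie-breaker
// prefers T::operator<=>(U).
```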


================
Comment at: lib/Sema/SemaOverload.cpp:12537-12565
+/// Rewritten candidates have been added but not checked for validity. They
+/// could still be non-viable if:
+///  (A) The rewritten call (x <=> y) is a builtin, but it will be ill-formed
+///      when built (for example it has narrowing conversions).
+///  (B) The expression (x <=> y) @ 0 is ill-formed for the result of (x <=> y).
+///
+/// If either is the case, this function should be considered non-viable and
----------------
EricWF wrote:
> EricWF wrote:
> > rsmith wrote:
> > > Hmm, this is an interesting approach. It's quite a divergence from our usual approach, which is to perform the viability checks first, before partial ordering, even if we could eliminate some candidates based on partial ordering. And it's observable -- it will result in fewer template instantiations being performed, which will mean we do not diagnose some ill-formed (no diagnostic required) situations which would otherwise fail due to an error in a non-immediate context.
> > > 
> > > Can you say something about why you chose this approach instead of doing the lookup for operator `@` early and marking the candidate non-viable if the lookup / overload resolution for it fails?
> > > 
> > > (I'm also vaguely wondering whether we should be pushing back on the C++ committee for making viability of the rewritten candidate depend on validity of the `@` expression, not just the `<=>` expression.)
> > The main reason for choosing this approach was the current expense of validating the candidates eagerly when they're added. Currently builtin candidates need to be fully built before their viability is known. My initial attempt at this made the compiler just about hang.
> > 
> > If I understand what you mean by performing lookup/overload resolution early for operator `@`, and I probably don't, then it would require performing overload resolution separately for every rewritten candidate added, since each candidate could have a different return type.
> > 
> > I think rewritten builtins could be checked more eagerly by moving a bunch of the checking from `SemaExpr.cpp` to `SemaOverload.cpp`, and using it to eagerly check the bits we need, without actually building the expression the way calling into `SemaExpr.cpp` currently does.
> > 
> > As for the root question: should the validity of the `@` expression be considered during overload resolution? I think the answer is tricky. For example, the following case could easily occur and break existing code, especially as more users add defaulted comparison operators. In this case, the base class adds `<=>` without the derived class's knowledge.
> > 
> > ```
> > struct B {
> >   using CallbackTy = void(*)();
> >   CallbackTy CB = nullptr; // B carries around a function pointer.
> > #if __cplusplus > 201703L
> >   friend auto operator<=>(B const&, B const&) = default;
> > #else
> >   friend bool operator==(B, B);
> >   friend bool operator!=(B, B);
> > #endif
> > };
> > struct D : B {
> >   operator int() const; // D uses this conversion to provide comparisons.
> > };
> > D d;
> > // comparison ill-formed if the `@` expression isn't considered for viability.
> > auto r = d < d;
> > ```
> > However, allowing the `@` expression to affect viability leads to a different set of bugs. For example, this slicing comparison bug:
> > ```
> > struct B {
> >   int x;
> >   friend auto operator<=>(B, B) = default;
> > };
> > struct D : B {
> >   decltype(nullptr) y = {};
> >   friend auto operator<=>(D, D) = default;
> > };
> > D d;
> > // If the `@` expression is considered for viability, then `operator<=>(B, B)` will be used, effectively slicing the data used during the comparison. 
> > auto r = d < d; 
> > ```
> > 
> > What is clear is that it should go back to CWG for more discussion. 
> > 
> > It's quite a divergence from our usual approach, which is to perform the viability checks first, before partial ordering, even if we could eliminate some candidates based on partial ordering. And it's observable -- it will result in fewer template instantiations being performed, which will mean we do not diagnose some ill-formed (no diagnostic required) situations which would otherwise fail due to an error in a non-immediate context.
> 
> I should clarify that the same validity checks that are applied to rewritten candidates are also applied to non-rewritten candidates, such as checking conversions to parameter types. So I suspect that, for all intents and purposes, it should produce the same number of template instantiations as existing operator lookup does.
I'm not sure that it's true in general that we get the same amount of instantiation with this approach. For example, checking the `@` expression will trigger instantiation of the return type of the `operator<=>`, and may trigger instantiation of the `operator<=>` function itself in some cases. (I think this is a conforming technique; I think the more interesting question is whether it's a good thing. I believe GCC uses techniques like this, and it does create portability problems in practice for code written against GCC, for example -- but on the other hand, they presumably spend less time in overload resolution as a result.)
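
To illustrate the kind of extra instantiation I mean (a hypothetical sketch, not code from this patch):

```
template <typename T>
struct Box {
  T value;
  // Deduced return type: deciding whether (a <=> b) < 0 is well-formed
  // requires the return type, which requires instantiating this definition.
  friend auto operator<=>(const Box &a, const Box &b) {
    return a.value <=> b.value;
  }
};

struct NoSpaceship {};  // has no operator<=>

Box<NoSpaceship> b;
// Checking the `@` part of the rewritten candidate for `b < b` forces
// instantiation of operator<=> for Box<NoSpaceship>; the error occurs outside
// the immediate context, so it is a hard error rather than a non-viable
// candidate.
// auto r = b < b;
```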


================
Comment at: lib/Sema/SemaOverload.cpp:12570-12572
+  case OR_Deleted:
+    // FIXME(EricWF): If we've found a deleted rewritten operator, it's
+    // possible we should have never considered it a viable candidate.
----------------
EricWF wrote:
> rsmith wrote:
> > This is a really interesting situation. What does it even mean to consider the `(a <=> b) @ c` expression if lookup for `<=>` finds a deleted function? (It would be novel for the return type of a deleted function to be meaningful, and it would be novel to exclude an overload candidate because it's deleted.) And what if lookup for `@` finds a deleted function?
> > 
> > I think this is good reason to go back to CWG and suggest that we don't check the `@` expression at all until after overload resolution.
> > What does it even mean to consider the (a <=> b) @ c expression if lookup for <=> finds a deleted function?
> 
> Separate from allowing return types to affect overload resolution, I suspect selecting a deleted rewritten candidate means that we should have never considered the candidate in the first place. For example:
> 
> ```
> struct B { auto operator<=>(B) const = delete; };
> struct D : private B { operator int() const; };
> D{} < D{};
> ```
> 
> In the above case, the intent of the code seems clear. But because the compiler invented a `<=>` candidate with a better conversion ranking, our code is ill-formed. Therefore, it makes sense to prefer non-rewritten candidates over deleted rewritten ones.
> 
> That said, it certainly doesn't make sense to "make way" for a worse rewritten candidate when the best viable function is deleted. For example:
> 
> ```
> struct Any { Any(...); };
> auto operator<=>(Any, Any);
> struct T { friend auto operator<=>(T, T) = delete; };
> T{} < T{}; // selecting operator<=>(Any, Any) would be bad. 
> ```
I'm not sure I see clear intent in your first example. `B` is saying "any attempt to compare me is ill-formed". It's not clear to me what `D`'s intent is when privately inheriting from `B`. But if the `operator<=>` were not deleted, we would[1] have selected it (and such selection would even be valid if done in a context in which `B` is an accessible base class of `D`) -- so I think that program should be rejected.

 [1]: Well, here's the problem. We would have selected it if it returned a type that supports `< 0`. And that's not a meaningful question to ask. I suppose we could argue that if the "= delete" were simply removed, we'd have rejected it due to trying to use a function with a deduced return type before the return type is available.
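
A sketch of the situation the footnote describes (with the `= delete` removed and no definition supplied):

```
struct B {
  auto operator<=>(B) const;  // return type never deduced (no definition)
};
struct D : private B { operator int() const; };

// Forming the rewritten expression (a <=> b) < 0 for D{} < D{} needs the type
// of (a <=> b); using a function whose deduced return type is not yet
// available is ill-formed, so this would be rejected as well.
// auto r = D{} < D{};
```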


Repository:
  rC Clang

https://reviews.llvm.org/D45680




