[PATCH] D87561: [Sema] List conversion validate character array

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Sep 27 17:19:31 PDT 2020


rsmith accepted this revision.
rsmith added a comment.
This revision is now accepted and ready to land.

Thanks, only nits here. Please feel free to submit after addressing them, or request another round of review if you prefer.



================
Comment at: clang/lib/Sema/SemaInit.cpp:3115-3118
+bool Sema::IsStringInit(Expr *Init, const ArrayType *AT) {
+  return ::IsStringInit(Init, AT, Context) == SIF_None;
+}
+
----------------
Consider moving this up to around line 144 (just after the definition of `::IsStringInit`).


================
Comment at: clang/lib/Sema/SemaOverload.cpp:4988
+
+    if (const auto *AT = S.Context.getAsArrayType(ToType))
+      if (S.IsStringInit(From->getInit(0), AT)) {
----------------
Nit: the `if` body is long, please add braces.


================
Comment at: clang/test/CXX/drs/dr14xx.cpp:338-340
+    void f1(int);
+    void f1(std::initializer_list<long>);
+    void g1() { f1({42}); }
----------------
It would be useful to also test that the right overload is used here, and for `g2`.


================
Comment at: clang/test/CXX/drs/dr14xx.cpp:424-431
+    f({"abc"});
+    f({((("abc")))});
+    f({L"abc"});
+#if __cplusplus >= 202002L
+    f({u8"abc"});
+#endif
+    f({uR"(abc)"});
----------------
Please test that the correct overload is called in each of these cases (especially considering the unintuitive outcome of these calls).


================
Comment at: clang/test/CXX/drs/dr14xx.cpp:411-414
+  void f(const char[4]);
+  void f(const wchar_t[4]);
+  void f(const char16_t[4]);
+  void f(const char32_t[4]);
----------------
Mordante wrote:
> rsmith wrote:
> > rsmith wrote:
> > > Mordante wrote:
> > > > rsmith wrote:
> > > > > These should presumably be references to arrays, rather than arrays, or the parameter type is as if you wrote (for example) `void f(const char *)`, which shouldn't get the special treatment here.
> > > > > 
> > > > > [over.ics.list]p4 mentions this in its footnote:
> > > > > 
> > > > > "Otherwise, if the parameter type is a character array [Footnote: Since there are no parameters of array type, this will only occur as the referenced type of a reference parameter.] and the initializer list has a single element that is an appropriately-typed string-literal (9.4.3), the implicit conversion sequence is the identity conversion."
> > > > Ah I missed that footnote. I reread the standard and can you confirm some cases?
> > > > ```
> > > > namespace A { 
> > > >   void f(const char(&)[4]);
> > > >   void g() { f({"abc"}); }
> > > > }
> > > > namespace B { 
> > > >   void f(const char(&&)[4]);
> > > >   void g() { f({"abc"}); }
> > > > } 
> > > > ```
> > > > both should work and with P0388 the array without bounds will also work.
> > > > 
> > > > I ask since this is my interpretation of the standard, but it seems there's a difference between implementations and `void f(const char(&&)[4]);` fails for "abc" but works for "ab".
> > > > It seems ICC and consider "abc" an lvalue in this case and not when using "ab".
> > > > 
> > > > Here's a gotbolt link with the examples https://godbolt.org/z/r1TKfx
> > > That's a really interesting example :)
> > > 
> > > The initializer is a list containing an lvalue of type `const char[4]`. Per [dcl.init.list]/3.9 and /3.10, the behavior depends on whether the referenced type is reference-related to `const char[4]` -- if so, then the reference can only bind directly and a `&&` reference will be invalid, because it's binding an rvalue reference to an lvalue, and if not, then we create an array temporary and the `&&` binding is fine.
> > > 
> > > Per [dcl.init.ref]/4, `const char[???]` is reference-related to `const char[4]` if they are similar types, and per [conv.qual]/2, the types are similar if `???` is omitted or `4`, and not similar otherwise.
> > > 
> > > So:
> > > * `const char (&&r)[4] = {"abc"}` is ill-formed (types are the same, binds rvalue reference to lvalue)
> > > * `const char (&&)[] = {"abc"}` is ill-formed (types are similar, binds rvalue reference to lvalue)
> > > * `const char (&&r)[5] = {"abc"}` is OK (types are not similar, creates temporary)
> > > 
> > > That seems like a very surprising outcome to me. Perhaps we should probably ignore array bounds entirely when determining whether two types are reference-related. I'll take this to CWG for discussion.
> > I think that's only an answer to half of your question. The other half is that [over.ics.list]p4 does not apply (directly) to either of your testcases, because the "parameter type" is a reference type, not an array type. For:
> > 
> > ```
> > namespace A { 
> >   void f(const char(&)[4]);
> >   void g() { f({"abc"}); }
> > }
> > ```
> > 
> > ... we reach [over.ics.list]p9, which says to use the rules in [over.ics.ref]. Those rules say that if the reference binds directly to an argument expression (ignore the "expression" here; this is very old wording that predates braced initializers), then we form an identity conversion. So that's what happens in this case.
> > 
> > For
> > 
> > ```
> > namespace B { 
> >   void f(const char(&&)[4]);
> >   void g() { f({"abc"}); }
> > }
> > ```
> > 
> > the same thing happens, but now [over.ics.ref]p3 says "an implicit conversion sequence cannot be formed if it requires binding an lvalue reference other than a reference to a non-volatile const type to an rvalue or binding an rvalue reference to an lvalue other than a function lvalue", so the candidate is not viable.
> > 
> > If the array bound were omitted or were not 4, then the reference would not bind directly, and we would instead consider initializing a temporary. [over.ics.ref]p2 says "When a parameter of reference type is not bound directly to an argument expression, the conversion sequence is the one required to convert the argument expression to the referenced type according to 12.4.4.2.", which takes us back around to [over.ics.list] with the "parameter type" now being the array type. That's how we can reach [over.ics.list]p4 and consider string literal -> array initialization.
> > 
> > So I think that, according to the current rules, for
> > 
> > ```
> > void f(const char (&&)[4]); // #1
> > void f(const char (&&)[5]); // #2
> > ```
> > 
> > ... a call to `f({"abc"})` remarkably calls #2, because #1 is not viable (would bind rvalue reference to lvalue string literal), but #2 is, just like `const char (&&r)[4] = {"abc"};` is ill-formed but `const char (&&r)[5] = {"abc"};` is valid.
> > That's a really interesting example :)
> 
> Thanks :-)
> Also thanks for the detailed explanation of the rules applying to these cases.
> 
> > So:
> > * `const char (&&r)[4] = {"abc"}` is ill-formed (types are the same, binds rvalue reference to lvalue)
> > * `const char (&&)[] = {"abc"}` is ill-formed (types are similar, binds rvalue reference to lvalue)
> 
> Interesting, wasn't P0388 trying to solve the issues with array of unknown bound?
> Am I correct to assume it's intended that this code will become valid in the future?
> 
> > * `const char (&&r)[5] = {"abc"}` is OK (types are not similar, creates temporary)
> > 
> > That seems like a very surprising outcome to me. Perhaps we should probably ignore array bounds entirely when determining whether two types are reference-related. I'll take this to CWG for discussion.
> 
> I agree this seems unexpected behaviour, thanks for discussing it with CWG.
> 
> > So I think that, according to the current rules, for
> 
> > void f(const char (&&)[4]); // #1
> > void f(const char (&&)[5]); // #2
> 
> > ... a call to f({"abc"}) remarkably calls #2, because #1 is not viable (would bind rvalue reference to lvalue string literal), but #2 is, just like const char (&&r)[4] = {"abc"}; is ill-formed but const char (&&r)[5] = {"abc"}; is valid.
> 
> MSVC agrees with your observation, GCC picks #1 and fails to create the call, and ICC aborts with an ICE
> https://godbolt.org/z/s7nTPP
> 
> 
> 
> Am I correct to assume it's intended that this code will become valid in the future?

I think it's more likely that the corresponding cases with other array bounds will become ill-formed. String literals are lvalues, so binding an rvalue reference to a string literal should not be valid based on our normal principles; allowing the wrong-array-bound case is then pretty surprising emergent behavior of the form that we usually disallow (on the basis that it "looks sufficiently like" the intention was for the reference to bind directly, so it's an error if the reference can't bind directly).

I think the likely outcomes are either that we ignore array bounds when determining reference-relatedness, or that we always create a temporary when binding an rvalue-reference-to-char-array to a string literal, regardless of whether the bound matches. Or maybe that we set the array bound of the string literal to that of the reference-to-array, allowing the `const &` cases but not the `&&` cases with an incorrect bound. But I wouldn't want to guess which.


================
Comment at: clang/test/SemaCXX/overload-array-size.cpp:3-5
+// When the array size is 4 the call will attempt to bind an lvalue to an
+// rvalue and fail. Therfore #2 will be called. (rsmith will bring this issue
+// to CWG)
----------------



================
Comment at: clang/test/SemaCXX/overload-array-size.cpp:6-13
+void f(const char(&&)[4]); // #1
+void f(const char(&&)[5]); // #2
+void g() {
+  f({"abc"});
+  // CHECK: ExprWithCleanups {{.*}} <line:[[@LINE-1]]:{{.*}}> 'void'
+  // CHECK-NEXT: CallExpr
+  // CHECK-NEXT: ImplicitCastExpr {{.*}} 'void (*)(const char (&&)[5])'
----------------
It's more common to test such things via diagnostics rather than by AST dumps. (AST dump tests tend to be more fragile in practice.) You could mark the `[5]` case as `=delete` and check that you get the "call to a deleted function" diagnostic, or you could change the `[5]` case to return `int &` and check that you can initialize an `int &` from a call to it, or something like that.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87561/new/

https://reviews.llvm.org/D87561



More information about the cfe-commits mailing list