[PATCH] D65043: [Format] Add C++20 standard to style options

Wed Jul 24 10:49:22 PDT 2019

Quuxplusone added a comment.

In D65043#1599220 <https://reviews.llvm.org/D65043#1599220>, @sammccall wrote:

> Clang actually defaults to c++14 since clang 6. (CompilerInvocation.cpp near `CLANG_DEFAULT_STD_CXX`)

Today I learned. Thanks! :)

> It's important that clang-format works sensibly with minimal configuration on as much code as possible.
>  This means both:
> 
> - when running on pre-11 code that uses `vector<vector<int> >` consistently, it should preserve that c++03-compatible syntax
> - when running on post-11 code that uses `vector<vector<int>>` in some places, it should fix the formatting to use it everywhere
> 
> This behavior is an important default, and doesn't correspond to any one version of C++.

Ah, I see. That use-case makes sense to me. So, the observable behavior of what you describe, would that be the same behavior as "//first// scan the file and auto-detect its version, //then// format it exactly as if that version had been explicitly specified"? I don't see how to achieve the behavior you described unless you start by scanning the file for ad-hoc instances of "C++11isms" such as `vector<vector<int>>`. If you find at least one such instance, then you're in the "post-11" case; otherwise you're (maybe) in the "pre-11" case. But that scanning process has basically no overlap with the formatting process, right? You //might// want to add another scan for ad-hoc "C++2a-isms" such as `co_yield -1;` or maybe something due to `operator<=>` or maybe something due to P0634 "Down with `typename`" <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0634r3.html> or maybe something due to `constinit`. (I say "maybe" because all those examples seem extremely contrived to me.)
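
To make the "scan first" step concrete, here is a minimal sketch of what such a pre-pass might look like. Everything in it is hypothetical: `Detected`, `detectStandard`, and the regex are invented for illustration and are not clang-format's code. It works on raw text, so it is cruder than anything token-based would be (for example, it would also fire on `a < b >> c`).

```cpp
#include <regex>
#include <string>

// Hypothetical result type for the pre-pass; not a clang-format type.
enum class Detected { Pre11, Post11 };

// Hypothetical heuristic: a single `>>` that closes nested template
// arguments, as in `vector<vector<int>>`, counts as evidence of C++11
// or later. Absence of any such instance falls back to Pre11.
Detected detectStandard(const std::string &Source) {
  // Matches text such as "<int>>": '<', one simple argument, then ">>".
  static const std::regex NestedClose(
      R"(<\s*[A-Za-z_][A-Za-z0-9_:]*\s*>>)");
  return std::regex_search(Source, NestedClose) ? Detected::Post11
                                                : Detected::Pre11;
}
```

Under this sketch, `detectStandard("vector<vector<int>> v;")` yields `Detected::Post11`, while the C++03-compatible spelling `vector<vector<int> > v;` yields `Detected::Pre11`. The formatter would then run as if that standard had been specified explicitly.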

IMHO, AFAIK, C++2a has sufficient backward-compatibility in the parser, so that there does not //need// to be any difference between how clang-format formats C++11 code and how clang-format formats C++2a code. It is true that token-soups such as `f(co_yield - 1);` have different parse trees in '11 and '2a, but I don't think any sane user should complain if clang-format formats that soup as `f(co_yield -1)` even in C++11 mode.  But C++2a has a //very// large footprint, and I may well be failing to imagine something that a sane user //would// complain about.

The situation with `co_yield` seems analogous to how I imagine you must currently deal with fold-expressions in C++11/17 code. If you're in "Cpp11" mode, and you see `(xs - ... - 1)`, you don't reformat it to `(xs -... -1)` merely because C++11 lacks fold-expressions, do you?

================
Comment at: clang/docs/ClangFormatStyleOptions.rst:2227
+  * ``LS_Cpp20`` (in configuration: ``Cpp20``)
+    Use features of C++20 and C++2a (e.g.: treating ``co_yield`` as a keyword,
+    not an identifier, so ``co_yield++ i`` is formatted as ``co_yield ++i``).
----------------
modocache wrote:
> Quuxplusone wrote:
> > C++2a //will// be C++20, barring any radically unforeseen events. So saying "C++20 and C++2a" is redundant. Personally I would follow GCC/Clang's lead and say "C++2a" until the standard is actually out.
> Thank you, I'll do that. I wasn't sure about the naming conventions. While we're talking about this, mind if I ask: why did "C++1z" use "z" whereas "C++2a" use "a"? Once C++20 is actually out, is the next version "C++2b", or does "C++2a" start referring to C++23? Sorry for the rookie questions.
The convention started with C++0x, the standard that followed C++03. (As 2009 passed, the joke became that "0x" must stand for something hexadecimal.) Eventually, C++0x turned into C++11. So then the next version was C++1y (since "y" comes after "x"), which became C++14. The version after that was C++1z, which became C++17. The version after C++17 is called "C++2a" because from the beginning it was targeting a 2020 release date (hence the "2"), and because "a" comes after "z" if you're used to modular arithmetic. As a special mnemonic bonus, "C++1z" happened to be the last C++ release of the 2010s, and "C++2a" will be the first C++ release of the 2020s.
If the convention continues — and I see no reason it shouldn't — then C++2b will be '23 and C++2c will be '26. (That is, if WG21 keeps to its grueling three-year release cycle. I hope and pray that they won't, but, we'll see.)

================
Comment at: clang/lib/Format/Format.cpp:2373
   LangOpts.CPlusPlus17 = Style.Standard == FormatStyle::LS_Cpp03 ? 0 : 1;
-  LangOpts.CPlusPlus2a = Style.Standard == FormatStyle::LS_Cpp03 ? 0 : 1;
+  LangOpts.CPlusPlus2a = Style.Standard == FormatStyle::LS_Cpp20 ? 1 : 0;
   LangOpts.LineComment = 1;
----------------
Incidentally, these four lines seem like a great place to use `Style.Standard >= FormatStyle::LS_CppWhatever` (with a cast if necessary), unless that's against some style rule.
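
A rough sketch of the `>=` idea, using hypothetical stand-ins (`Standard`, `LangFlags`, `flagsFor`) rather than the real `FormatStyle` and `LangOptions` declarations. It assumes the enumerators are declared in ascending order of language version. It also assumes the C++11 and C++14 flags follow the same rule as the C++17 line in the diff, which is on for everything except C++03. A value like `Auto` would need separate handling and is left out here.

```cpp
// Hypothetical stand-ins; not clang's actual declarations.
// The enumerators must stay in ascending version order for >= to work.
enum class Standard { Cpp03, Cpp11, Cpp20 };

struct LangFlags {
  bool CPlusPlus11 = false;
  bool CPlusPlus14 = false;
  bool CPlusPlus17 = false;
  bool CPlusPlus2a = false;
};

// Each flag is on once the chosen standard is at least the threshold.
// Scoped enums of the same type compare directly, so no cast is needed.
LangFlags flagsFor(Standard S) {
  LangFlags F;
  F.CPlusPlus11 = S >= Standard::Cpp11;
  F.CPlusPlus14 = S >= Standard::Cpp11; // assumed: same rule as C++17 below
  F.CPlusPlus17 = S >= Standard::Cpp11; // mirrors "anything but Cpp03"
  F.CPlusPlus2a = S >= Standard::Cpp20;
  return F;
}
```

With this shape, adding a later standard means appending one enumerator and one comparison, instead of revisiting every `== ... ? 0 : 1` expression.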

================
Comment at: clang/unittests/Format/FormatTest.cpp:13815
+  verifyFormat("co_yield++ i;");
+  verifyFormat("co_yield ++i;", Cpp20);
+
----------------
If you're going to test C++11's behavior here, please use `co_yield - 1;` or something else that might reasonably appear in a C++11 program. `co_yield++ i;` is not valid C++ (unless `i` is a macro, I guess).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65043/new/

https://reviews.llvm.org/D65043