[PATCH] D75194: [clang-format] Do not merge very long C# automatic properties

Krasimir Georgiev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 3 12:37:17 PST 2020


krasimir added a comment.

Here are some empirical ideas about the approach: in clang-format, braces can be handled in two quite distinct ways, controlled by BraceBlockKind <https://github.com/llvm/llvm-project/blob/9c4afce7024aa1fe4efe2631240d1d766b4ea257/clang/lib/Format/FormatToken.h#L123>: BK_Block and BK_BracedInit.
BK_Block is for braces that open blocks, which are usually nested one level deeper and consist of a sequence of statements and other constructs.
BK_BracedInit is for initializer lists, dictionaries and other similar syntactic constructs that are more appropriate to keep together on one line.
The level of granularity of detailed formatting in clang-format is an unwrapped line, which is conceptually a sequence of tokens that "make sense together as a single line" (without going much into style details and ignoring column limits). Separate statements are for example put on separate unwrapped lines. The formatting information flowing between unwrapped lines includes just higher-level data such as the current nesting depth.
The brace detection is handled primarily in UnwrappedLineParser::calculateBraceTypes <https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L421>, which happens quite early during the parsing of the input sequence.

If an opening brace is marked BK_Block there, later the stuff between it and the matching closing brace is parsed as a block: multiple "statements" are put on their own unwrapped lines, inside the block.
If the brace is marked BK_BracedInit, the stuff following it is parsed more like dictionary-struct-array-literal stuff, and importantly is kept on the same unwrapped line as the surrounding code (as a braced list may occur as a subexpression of a larger expression, and the formatting of the larger expression may depend non-trivially on the formatting of the braced list).

For example, consider the following pseudo-C-family fragment:

  % cat test.cc                                          
  
  int f() {
    block_example {
      get;
      set;
    };
  
    int init_list_example({1, 2, {more}}, other);
  }

If we examine the parsed unwrapped lines (`clang-format -debug`), they look like:

  Line(0, FSC=0): int[T=81, OC=0] identifier[T=81, OC=4] l_paren[T=81, OC=5] r_paren[T=81, OC=6] l_brace[T=21, OC=8] 
  Line(1, FSC=0): identifier[T=81, OC=2] l_brace[T=21, OC=16] 
  Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
  Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
  Line(1, FSC=0): r_brace[T=81, OC=2] semi[T=81, OC=3] 
  Line(1, FSC=0): int[T=81, OC=2] identifier[T=81, OC=6] l_paren[T=81, OC=23] l_brace[T=81, OC=24] numeric_constant[T=81, OC=25] comma[T=81, OC=26] numeric_constant[T=81, OC=28] comma[T=81, OC=29] l_brace[T=81, OC=31] identifier[T=81, OC=32] r_brace[T=81, OC=36] r_brace[T=81, OC=37] comma[T=81, OC=38] identifier[T=81, OC=40] r_paren[T=81, OC=45] semi[T=81, OC=46] 
  Line(0, FSC=0): r_brace[T=81, OC=0] 
  Line(0, FSC=0): eof[T=81, OC=0] 

The contents of the block-like thing are put on separate unwrapped lines at level 2; the braced-list-like thing is kept on a single unwrapped line together with the surrounding code.
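
As a rough illustration of that difference, here is a toy model of the splitting. It is only a sketch and not clang-format's actual code: `BraceKind`, `ToyLine` and `splitIntoLines` are made-up names, tokens are plain strings, and the kind of every opening brace is assumed to be known in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these names are hypothetical and do not exist in clang-format.
enum class BraceKind { Block, BracedInit };

struct ToyLine {
  unsigned Level;
  std::vector<std::string> Tokens;
};

// Kinds holds one entry per "{" in Tokens, in order of appearance.
// Braces are assumed to be balanced.
std::vector<ToyLine> splitIntoLines(const std::vector<std::string> &Tokens,
                                    const std::vector<BraceKind> &Kinds) {
  std::vector<ToyLine> Lines;
  std::vector<BraceKind> Open; // kinds of the braces that are currently open
  ToyLine Current{0, {}};
  unsigned Level = 0;
  std::size_t NextKind = 0;

  // Ends the current line (if it has tokens) and starts one at the current level.
  auto Flush = [&] {
    if (!Current.Tokens.empty())
      Lines.push_back(Current);
    Current = ToyLine{Level, {}};
  };

  for (std::size_t I = 0; I < Tokens.size(); ++I) {
    const std::string &Tok = Tokens[I];
    if (Tok == "{") {
      assert(NextKind < Kinds.size() && "one kind per opening brace");
      Open.push_back(Kinds[NextKind++]);
      Current.Tokens.push_back(Tok);
      if (Open.back() == BraceKind::Block) {
        ++Level; // the block body goes one level deeper, on its own lines
        Flush();
      }
    } else if (Tok == "}") {
      assert(!Open.empty() && "unbalanced closing brace");
      BraceKind Kind = Open.back();
      Open.pop_back();
      if (Kind == BraceKind::BracedInit) {
        Current.Tokens.push_back(Tok); // stays on the surrounding line
        continue;
      }
      Flush(); // finish whatever was pending inside the block
      --Level;
      Current.Level = Level;
      Current.Tokens.push_back(Tok);
      if (I + 1 < Tokens.size() && Tokens[I + 1] == ";")
        Current.Tokens.push_back(Tokens[++I]); // keep "};" together
      Flush();
    } else {
      Current.Tokens.push_back(Tok);
      bool InBracedInit =
          !Open.empty() && Open.back() == BraceKind::BracedInit;
      if (Tok == ";" && !InBracedInit)
        Flush(); // a semicolon ends the line, except inside a braced list
    }
  }
  Flush();
  return Lines;
}
```

Fed the tokens of `public int Foo { get; set; }`, this model produces four lines when the brace is tagged `Block` and a single line when it is tagged `BracedInit`.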

Currently, C# auto property get/set "blocks" are parsed as BK_Block, hence the need to later merge lines using a heuristic.
If you could instead mark those as BK_BracedInit in `calculateBraceTypes`, that would keep them inline with the surrounding code and might produce good formatting of the whole line with a few tweaks.
There's another complication: syntactically in C++, a semicolon is very special. For example, C/C++ requires a semicolon after class declarations. There are a bunch of places in clang-format that use the presence of semicolons to determine "end of statement/end of line". So the semicolons in `{ get; set; }` might cause a bit of trouble and throw the parser off a bit. Fortunately, clang-format already has some code that deals with similar complications for JavaScript, where stuff like `X<{a: string; b: number}>` is correctly handled as a braced list, even in the presence of semicolons inside. You can look at how this is handled for inspiration (I'm not 100% sure these are the only places in code that contribute to the formatting of these constructs):

- in calculateBraceTypes here: https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L446
- and later in parseBracedList the semicolons are specially treated here: https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L1698
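
To make the idea of recognizing `{ get; set; }` a bit more concrete, here is a minimal sketch of a lookahead check over plain string tokens. It is purely an assumption about what such a heuristic could look like, not what `calculateBraceTypes` does: the function name, the string-based tokens, and the sets of accepted accessor and modifier keywords are all hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of clang-format.
// Returns true if Tokens[OpenIdx] is "{" and everything up to the next "}" is
// a non-empty run of `[modifiers] get|set ;` accessors without bodies.
bool looksLikeAutoPropertyAccessors(const std::vector<std::string> &Tokens,
                                    std::size_t OpenIdx) {
  // Assumed keyword sets, chosen for illustration only.
  static const std::set<std::string> Accessors = {"get", "set"};
  static const std::set<std::string> Modifiers = {"public", "private",
                                                  "protected", "internal"};
  if (OpenIdx >= Tokens.size() || Tokens[OpenIdx] != "{")
    return false;

  std::size_t I = OpenIdx + 1;
  bool SawAccessor = false;
  while (I < Tokens.size() && Tokens[I] != "}") {
    // Skip optional access modifiers such as `private` in `private set;`.
    while (I < Tokens.size() && Modifiers.count(Tokens[I]))
      ++I;
    if (I >= Tokens.size() || !Accessors.count(Tokens[I]))
      return false;
    ++I;
    // An accessor with a body (`get { ... }`) is a real block, so reject it.
    if (I >= Tokens.size() || Tokens[I] != ";")
      return false;
    ++I;
    SawAccessor = true;
  }
  // Require at least one accessor and an actual closing brace.
  return SawAccessor && I < Tokens.size();
}
```

A brace that passes a check like this would be a candidate for BK_BracedInit; anything else, including accessors with bodies, would stay BK_Block.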

I hope this is helpful, but please take it with a grain of salt as I'm not very familiar with those parts of clang-format and am mostly poking around it and looking at how similar constructs for other languages are handled.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75194/new/

https://reviews.llvm.org/D75194




