[PATCH] D100778: [clang-format] Prevent extraneous space insertion in bitshift operators

Mon Apr 26 02:33:35 PDT 2021

krasimir added inline comments.

================
Comment at: clang/lib/Format/TokenAnnotator.cpp:125
+                 CurrentToken->Next->getStartOfNonWhitespace().getLocWithOffset(
+                     -1)))
           return false;
----------------
penagos wrote:
> krasimir wrote:
> > penagos wrote:
> > > MyDeveloperDay wrote:
> > > > I don't really understand what we are saying here? 
> > > Effectively we are checking that, barring intervening whitespace, we are analyzing 2 consecutive '>' tokens. If so, we treat such sequence as a binary op in lieu of a closing template angle bracket. If there's another more straightforward way of accomplishing this check, I'm open to that, but this seemed to be the most straightforward way at the time.
> > I'm worried that this may regress template code. How does this account for cases where two consecutive `>`-s are really two closing template brackets, e.g.,
> > `std::vector<std::decay_t<int& >> v;`?
> > 
> > In particular, one added test case is ambiguous: `>>` could really be two closing template brackets:
> > https://godbolt.org/z/v19hj9vKn
> > 
> > I have to say that my general feeling about trying to disambiguate between bitshifts and template closers is: don't try too hard inside clang-format as the heuristics are generally quite brittle and make the code harder to maintain; in cases where clang-format wrongly detects bitshift as templates, users should add parens around the bitshift, which IMO improves readability.
> As this patch currently stands, it does not disambiguate between bitshift '>>' operators and 2 closing template brackets, so in your snippet, we would no longer insert a space between the '>' characters (despite arguably being the better formatting decision in this case).
> 
> I agree with your feeling that user guided disambiguation between bitshift operators and template closing brackets via parens is the ideal solution and also improves readability, but IMO the approach taken by clang-format to format the '>' token should be conservative in that any change made should be non-semantic altering, which is not presently the case. While the case you mentioned would regress, we would no longer potentially alter program semantics. Thinking about this more, would it make sense to modify the actual white-space change generation later on in the analysis to not break up >> sequences of characters in lieu of annotating the tokens differently as the proposed patch is currently doing?
I tried and can't make this misinterpret two consecutive template `>` as a bit shift, IMO because this check is guarded by the `Left->ParentBracket != tok::less` condition. Both `std::vector<std::decay_t<int&>> v;` and `test<test<a | b>> c;` below are handled correctly.
I'm less worried about regressions in common template cases now.
Thank you for pointing out altering program semantics, I agree.
Please add a comment about this tradeoff and and a bit of the reasoning behind it in code for future reference.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100778/new/

https://reviews.llvm.org/D100778