[PATCH] D33589: clang-format: consider not splitting tokens in optimization

Thu Jul 13 05:17:06 PDT 2017

Typz added inline comments.

================
Comment at: lib/Format/UnwrappedLineFormatter.cpp:723
       FormatDecision LastFormat = Node->State.NextToken->Decision;
       if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+        addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
----------------
djasper wrote:
> This is almost certainly a no-go. It turns the algorithm from exploring a state space with a branching factor of 2 into something with a branching factor of 4.
> 
> Even assuming we can perfectly reuse results from computations (or in other words hit the same state through different path using Dijkstra's algorithm), the additional bool in StateNode doubles the number of states we are going to visit. You don't even seem to make a distinction of whether the token under investigation can possibly be split. You do look at NoLineBreak(InOperand), but even if those are false, the vast majority of tokens will never be split.
> 
> However, even if you narrow that down, I am not convinced that fixing this inconsistency around the formatting penalties is worth the runtime performance hit. I am generally fine with changing the behavior in the way you are proposing, but only if it has zero (negative) effect on performance.
Making the distinction to skip some path is done at the beginning of addNextStateToQueue(), though indeed not very precisely at the moment.

I can improve the check (i.e. by copying all the early return conditions from the beginning of `ContinuationIndenter::breakProtrudingToken()`, which will greatly reduce the number of possible state, but stilll have a small impact.

The alternative would be modify ContinuationIndenter::breakProtrudingToken(), so that it computes the penalty for reflowing as well as not-reflowing, and updates the LineState with the best solution. It would certainly have an impact on performance, but would not increase the size of the state space.

Another issue with that approach is that the information is not passed from the DRYRUN phase to the actual rewriting phase. I could store this information in the LineState, to re-use it in the reconstruction phase, but this seems really wrong (and would work only in the exact case of the optimizing line formatter).

What do you think? Should I keep the same structure but reduce the number of explored state; or move the decision into ContinuationIndenter, possibly storing the result in LineState?

================
Comment at: lib/Format/UnwrappedLineFormatter.cpp:764
       return;
+    if (!BreakToken && !Indenter->canSplit(PreviousNode->State))
+      return;
----------------
djasper wrote:
> It's not clear to me why you'd return here.
This is needed to avoid trying to break when this is not supported: no point doing twice the same thing.

https://reviews.llvm.org/D33589