[PATCH] D44720: [clangd] Simplify fuzzy matcher (sequence alignment) by removing some condition checks.

Thu Mar 29 08:58:30 PDT 2018

MaskRay added inline comments.

================
Comment at: clangd/FuzzyMatch.cpp:230
 void FuzzyMatcher::buildGraph() {
+  Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};
   for (int W = 0; W < WordN; ++W) {
----------------
sammccall wrote:
> why this change?
> this has also been moved from the cheaper constructor to the more expensive per-match call. (also the diagonal assignment added in the next loop)
> 
> Also, shouldn't [0][0][Match] be AwfulScore?
> 
"The more expensive per-match call" is just two value assignments.

I have removed the expensive table initialization in the constructor.

[0][0][*] can be any value.

================
Comment at: clangd/FuzzyMatch.cpp:325
+  int W = PatN;
+  for (int I = PatN; ++I <= WordN; )
+    if (Scores[PatN][I][Match].Score > Scores[PatN][W][Match].Score)
----------------
sammccall wrote:
> nit: I -> P, move increment to the increment expression of the for loop?
I -> P.

> move increment to the increment expression of the for loop?

Not sure about the coding standard here, but if you insist I'll have to change it as you are the reviewer. If the loop variable was an iterator, `for (It I = std::next(...); I != E; ++I)` would be uglier than `for (It I = ...; ++I != E; )`

================
Comment at: clangd/FuzzyMatch.cpp:340
+  A[WordN] = Miss;
+  for (int I = WordN, P = PatN; I > 0; I--) {
+    if (I == W)
----------------
sammccall wrote:
> sammccall wrote:
> > sammccall wrote:
> > > W is the right name in this file for a variable iterating over word indices, please don't change this.
> > > The new variable above could be EndW or so?
> > As far as I can see, this loop is setting `A[W+1:...] = Miss` and populating `A[0...W]` with the exsting logic.
> > I think this would be clearer as two loops, currently there's a lot of conditionals around Last that obscure what's actually happening.
> You've shifted P (and the old W, new I) by 1. This does reduce the number of +1 and -1 in this function, but it's inconsistent with how these are used elsewhere: P should be the index into Pat of the character that we're considering.
I don't understand the rationale not to use the shifted indices. The code actually use `Scores[P][W][*]` to mean the optimal match of the first `P` characters of the pattern with the first `W` characters of the word, not the position of the character.

On the other hand, C++ reverse iterators use the shifted one for `for (I = rend(); I != rbegin(); ++I)`. The shifted one makes ending condition check easier.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44720