[PATCH] D44720: [clangd] Simplify fuzzy matcher (sequence alignment) by removing some condition checks.

Thu Mar 29 11:30:37 PDT 2018

MaskRay added inline comments.

================
Comment at: clangd/FuzzyMatch.cpp:340
+  A[WordN] = Miss;
+  for (int I = WordN, P = PatN; I > 0; I--) {
+    if (I == W)
----------------
sammccall wrote:
> MaskRay wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > sammccall wrote:
> > > > > W is the right name in this file for a variable iterating over word indices, please don't change this.
> > > > > The new variable above could be EndW or so?
> > > > As far as I can see, this loop is setting `A[W+1:...] = Miss` and populating `A[0...W]` with the exsting logic.
> > > > I think this would be clearer as two loops, currently there's a lot of conditionals around Last that obscure what's actually happening.
> > > You've shifted P (and the old W, new I) by 1. This does reduce the number of +1 and -1 in this function, but it's inconsistent with how these are used elsewhere: P should be the index into Pat of the character that we're considering.
> > I don't understand the rationale not to use the shifted indices. The code actually use `Scores[P][W][*]` to mean the optimal match of the first `P` characters of the pattern with the first `W` characters of the word, not the position of the character.
> > 
> > On the other hand, C++ reverse iterators use the shifted one for `for (I = rend(); I != rbegin(); ++I)`. The shifted one makes ending condition check easier.
> > I don't understand the rationale not to use the shifted indices
> The rationale is entirely consistency with the surrounding code. The consistency helps avoid off-by-one errors when similar loops have different conventions.
> 
> In this file, when looping over word or pattern dimensions, P and W respectively are used for loop variables, and can be interpreted as indices into Pat/Word.
> Here the interpretation would be "did we match or miss character Word[W]"
`Scores[P][W][*]` is interpreted as how good it is if we align the first `P` characters of the pattern with the first `W` characters of the word.

Note the code uses `number of characters` instead of the position.

Here the new interpretation would be "what we should do for the last character of the first W characters"

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44720