[PATCH] D44720: [clangd] Simplify fuzzy matcher (sequence alignment) by removing some condition checks.

Fangrui Song via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 29 11:25:21 PDT 2018


MaskRay added inline comments.


================
Comment at: clangd/FuzzyMatch.cpp:230
 void FuzzyMatcher::buildGraph() {
+  Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};
   for (int W = 0; W < WordN; ++W) {
----------------
MaskRay wrote:
> sammccall wrote:
> > MaskRay wrote:
> > > sammccall wrote:
> > > > why this change?
> > > > this has also been moved from the cheaper constructor to the more expensive per-match call. (also the diagonal assignment added in the next loop)
> > > > 
> > > > Also, shouldn't [0][0][Match] be AwfulScore?
> > > > 
> > > "The more expensive per-match call" is just two value assignments.
> > > 
> > > I have removed the expensive table initialization in the constructor.
> > > 
> > > [0][0][*] can be any value.
> > > "The more expensive per-match call" is just two value assignments.
> > Oops, sorry - by "more expensive" I mean "called thousands of times more often".
> > 
> > > I have removed the expensive table initialization in the constructor.
> > I don't want to be rude, but I asked why you changed this, and you didn't answer. Unless there's a strong reason, I'd prefer to revert this change, as I find this harder to reason about.
> > (Roughly: in the old version of the code, any data that didn't need to change for the life of the object was initialized in the constructor. That way I didn't need to worry what was performance-critical and what wasn't - match() only did what was strictly necessary).
> > 
> > > [0][0][*] can be any value.
> > Can you please explain why?
> > Oops, sorry - by "more expensive" I mean "called thousands of times more often".
> 
> It is true that `Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};` is the cost incurred for each word.
> 
> But **it is not full table initialization**, it is just two variable assignments. And we will assign to other values of the first row `Scores[0][*][*]` in the following loop. The old scatters the table construction to **two places**, the constructor and this dynamic programming site.
> [0][0][*] can be any value.

Can you please explain why?

`Scores[0][0][*]` is the initial value which will be propagated to all other values in the table.
The relative difference of pairwise values in the table is a constant whatever initial value is chosen.

If you ignore the max clamp you used later, the initial value does not matter.




Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44720





More information about the cfe-commits mailing list