[PATCH] D44720: [clangd] Simplify fuzzy matcher (sequence alignment) by removing some condition checks.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 3 12:02:49 PDT 2018


sammccall added a comment.

Thanks for your work on this patch!
I think there a several useful improvements here, which I'd love to see landed. Particularly the logic around matches that end early is much better.

There are also places that change existing design decisions in ways that don't seem to be clear improvements: doing extra work (albeit minor) at `match()` time, and using different naming conventions for indexes in `dumpLast`. I see you have strong feelings about these and I do think I understand your arguments, but don't agree as discussed above.

Where should we go from here? One option is to land the pieces we agree on. But it's your patch, and if you'd like an opinion from another clangd maintainer that's fine too; happy to help with that.



================
Comment at: clangd/FuzzyMatch.cpp:230
 void FuzzyMatcher::buildGraph() {
+  Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};
   for (int W = 0; W < WordN; ++W) {
----------------
MaskRay wrote:
> MaskRay wrote:
> > sammccall wrote:
> > > MaskRay wrote:
> > > > sammccall wrote:
> > > > > why this change?
> > > > > this has also been moved from the cheaper constructor to the more expensive per-match call. (also the diagonal assignment added in the next loop)
> > > > > 
> > > > > Also, shouldn't [0][0][Match] be AwfulScore?
> > > > > 
> > > > "The more expensive per-match call" is just two value assignments.
> > > > 
> > > > I have removed the expensive table initialization in the constructor.
> > > > 
> > > > [0][0][*] can be any value.
> > > > "The more expensive per-match call" is just two value assignments.
> > > Oops, sorry - by "more expensive" I mean "called thousands of times more often".
> > > 
> > > > I have removed the expensive table initialization in the constructor.
> > > I don't want to be rude, but I asked why you changed this, and you didn't answer. Unless there's a strong reason, I'd prefer to revert this change, as I find this harder to reason about.
> > > (Roughly: in the old version of the code, any data that didn't need to change for the life of the object was initialized in the constructor. That way I didn't need to worry what was performance-critical and what wasn't - match() only did what was strictly necessary).
> > > 
> > > > [0][0][*] can be any value.
> > > Can you please explain why?
> > > Oops, sorry - by "more expensive" I mean "called thousands of times more often".
> > 
> > It is true that `Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};` is the cost incurred for each word.
> > 
> > But **it is not full table initialization**, it is just two variable assignments. And we will assign to other values of the first row `Scores[0][*][*]` in the following loop. The old scatters the table construction to **two places**, the constructor and this dynamic programming site.
> > [0][0][*] can be any value.
> 
> Can you please explain why?
> 
> `Scores[0][0][*]` is the initial value which will be propagated to all other values in the table.
> The relative difference of pairwise values in the table is a constant whatever initial value is chosen.
> 
> If you ignore the max clamp you used later, the initial value does not matter.
> 
> 
Is it not possible that we'll choose a best path starting at Scores[0][0][Match]?
This is invalid, and previously that was signaled by giving that cell AwfulScore, which ensures any path that emerges from it isAwful.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44720





More information about the cfe-commits mailing list