[PATCH] D49591: [clangd] Introduce search Tokens and trigram generation algorithms

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 20 04:42:41 PDT 2018


sammccall added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:37
+
+  // Extract trigrams consisting of first characters of tokens sorted by token
+  // positions. Trigram generator is allowed to skip 1 word between each token.
----------------
sammccall wrote:
> ah, you're also handling this by cases here.
> 
> This doesn't seem necessary.
> In fact I think there's a fairly simple way to express this that works naturally with the segmentation structure produced by fuzzymatch (so doesn't need a version of segmentIdentifier to wrap it).
> But it's taking me too long to work out the details, will write a followup comment or discuss offline.
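The idea is to precompute the list of "legal next chars" after each char: the tail char immediately following it, the head of the next segment, and the head of the segment after that (i.e. skipping one segment).
This can in fact be done in one backwards pass. Then actually computing the trigram set is very easy.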

```
std::vector<Token> generateTrigrams(StringRef Text) {
// Segment the text.
vector<CharRole> Roles(Text.size());
calculateRoles(Text, makeMutableArrayRef(Roles));

// Calculate legal next characters after each character.
// Index 0 can never follow another character, so 0 is a safe "nothing" marker.
vector<std::array<unsigned, 3>> Next(Text.size()); // 0 entries mean "nothing"
unsigned NextTail = 0, NextSeg = 0, NextNextSeg = 0;
for (int I = Text.size() - 1; I >= 0; --I) {
  Next[I] = {NextTail, NextSeg, NextNextSeg};
  NextTail = Roles[I] == Tail ? I : 0;
  if (Roles[I] == Head) {
    NextNextSeg = NextSeg;
    NextSeg = I;
  }
}

// Compute trigrams. They can start at head or tail chars.
DenseSet<Token> Tokens;
for (int I = 0; I < Text.size(); ++I) {
  if (Roles[I] != Head && Roles[I] != Tail) continue;
  for (unsigned J : Next[I]) {
    if (!J) continue;
    for (unsigned K : Next[J]) {
      if (!K) continue;
      char Trigram[] = {Text[I], Text[J], Text[K]};
      Tokens.emplace(TRIGRAM, Trigram);
    }
  }
}

// Return deduplicated trigrams.
std::vector<Token> TokensV;
for (const auto& E : Tokens) TokensV.push_back(E);
return TokensV;
}
```
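
For anyone who wants to try the idea outside of clangd, here is a minimal standalone sketch of the same backwards-pass approach. It makes several assumptions: `Role` and `assignRoles` are hypothetical, much simplified stand-ins for `CharRole` and `calculateRoles` (heads are just the first alphanumeric char after a separator, or an uppercase letter after a lowercase one), and a `std::set<std::string>` replaces `DenseSet<Token>`. It is not the code from the patch.

```cpp
#include <array>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the roles produced by the fuzzy matcher.
enum class Role { Separator, Head, Tail };

// Simplified segmentation (an assumption, not clangd's calculateRoles):
// non-alphanumeric chars are separators; a head is an alphanumeric char at
// the start of the text or after a separator, or an uppercase letter that
// follows a lowercase letter. Every other alphanumeric char is a tail.
std::vector<Role> assignRoles(const std::string &Text) {
  std::vector<Role> Roles(Text.size(), Role::Separator);
  for (size_t I = 0; I < Text.size(); ++I) {
    const unsigned char C = static_cast<unsigned char>(Text[I]);
    if (!std::isalnum(C))
      continue;
    const unsigned char Prev =
        I == 0 ? '\0' : static_cast<unsigned char>(Text[I - 1]);
    const bool AfterSeparator = I == 0 || !std::isalnum(Prev);
    const bool CaseBoundary = std::isupper(C) && std::islower(Prev);
    Roles[I] = (AfterSeparator || CaseBoundary) ? Role::Head : Role::Tail;
  }
  return Roles;
}

std::set<std::string> generateTrigrams(const std::string &Text) {
  const std::vector<Role> Roles = assignRoles(Text);
  const int N = static_cast<int>(Text.size());

  // Next[I] = {tail right after I, next head, head after that}.
  // Index 0 can never follow another char, so 0 doubles as "nothing".
  std::vector<std::array<int, 3>> Next(N);
  int NextTail = 0, NextHead = 0, NextNextHead = 0;
  for (int I = N - 1; I >= 0; --I) {
    Next[I] = {NextTail, NextHead, NextNextHead};
    NextTail = Roles[I] == Role::Tail ? I : 0;
    if (Roles[I] == Role::Head) {
      NextNextHead = NextHead;
      NextHead = I;
    }
  }

  // Chain two "legal next char" steps from every head or tail char.
  // The set takes care of deduplication.
  std::set<std::string> Trigrams;
  for (int I = 0; I < N; ++I) {
    if (Roles[I] == Role::Separator)
      continue;
    for (int J : Next[I]) {
      if (!J)
        continue;
      for (int K : Next[J]) {
        if (!K)
          continue;
        Trigrams.insert(std::string{Text[I], Text[J], Text[K]});
      }
    }
  }
  return Trigrams;
}
```

Under these simplified roles, `generateTrigrams("a_b_c_d")` yields `abc`, `abd`, `acd` and `bcd`: each step may skip at most one segment, so there is no way to jump from `a` straight to `d`.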


https://reviews.llvm.org/D49591




