[PATCH] D106528: [clangd] Improve performance of dex by 45-60%

Thu Jul 22 07:11:06 PDT 2021

sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

Nice find!

================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:91
   /// sync.
   void sync() {
     ReachedEnd |= Children.front()->reachedEnd();
----------------
I stared at this for a while and this change makes me think about the whole algo differently.

Effectively, we're just iterating through the items C of front trying to find a valid sync point.
With the slight optimization that when some other child skips over C to X, we catch up the front to X.

This leads to two possible simplifications (strictly, unrelated to this patch and totally up to you):

1. The initial setup and outer loop has unclear bounds - it's basically a loop over the items of Children.front() but that's not clear.

Consider rather:

```
auto &Front = *Children.front(); // smallest
while ((ReachedEnd |= Front->reachedEnd())) {
  auto SyncID = Front.peek();
  bool ValidSync = true;
  for (auto &Child : Children {
    ...
    if (Child->peek() > SyncID) {
      Front->advanceTo(Child->peek()); // XXX
      ValidSync = false;
      break;
    }
    ...
  }
  if (ValidSync)
    return;
}
```

This seems basically equivalent and I find the flow control easier to reason about.

2. Is the "slight optimization" important? i.e. is `Front->advanceTo(Child->peek())` actually a significant win over `Front->advance()`? I think the flow control is even clearer in this version:

```
for (auto &Front = *Children.front() ;(ReachedEnd |= Front->reachedEnd()); Front.advance()) {
  auto SyncID = Front.peek();
  bool ValidSync = true;
  for (auto &Child : Children {
    ...
    if (Child->peek() > SyncID) {
      ValidSync = false;
      break;
    }
    ...
  }
  if (ValidSync)
    return;
}
```
Because now the outer loop is *entirely* just a loop over the items of Front.

Intuitively the average number that advanceTo() advances by here is between 1 and 2 (we know it's at least 1, but the remaining items are mostly from the longer iterator so we probably don't skip over any from the shorter iterator).
And advanceTo is more complicated, if advanceTo turns out to be 2x as expensive as advance in practice, this isn't a win. So this is maybe worth checking.

================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:112
           NeedsAdvance = true;
+          // Reset and make sure advanceTo happens much less frequently on
+          // large posting lists. This accounts for 45-60% performance boost.
----------------
First, say why:

Now we try again with a new sync point.
We always syncing with the first child as this is the cheapest (smallest). We only call advanceTo on the large posting lists once we have a fairly likely sync point.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106528/new/

https://reviews.llvm.org/D106528