[PATCH] D106528: [clangd] Improve performance of dex by 45-60%

Kirill Bobyrev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 22 14:06:26 PDT 2021


kbobyrev added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:91
   /// sync.
   void sync() {
     ReachedEnd |= Children.front()->reachedEnd();
----------------
sammccall wrote:
> I stared at this for a while and this change makes me think about the whole algo differently.
> 
> Effectively, we're just iterating through the items C of front trying to find a valid sync point.
> With the slight optimization that when some other child skips over C to X, we catch up the front to X.
> 
> This leads to two possible simplifications (strictly, unrelated to this patch and totally up to you):
>  
> 1. The initial setup and outer loop have unclear bounds - it's basically a loop over the items of Children.front(), but that's not clear.
> 
> Consider rather:
> 
> ```
> auto &Front = *Children.front(); // smallest
> while (!(ReachedEnd |= Front.reachedEnd())) {
>   auto SyncID = Front.peek();
>   bool ValidSync = true;
>   for (auto &Child : Children) {
>     ...
>     if (Child->peek() > SyncID) {
>       Front.advanceTo(Child->peek()); // XXX
>       ValidSync = false;
>       break;
>     }
>     ...
>   }
>   if (ValidSync)
>     return;
> }
> ```
> 
> This seems basically equivalent and I find the flow control easier to reason about.
> 
> 2. Is the "slight optimization" important? I.e., is `Front.advanceTo(Child->peek())` actually a significant win over `Front.advance()`? I think the flow control is even clearer in this version:
> 
> ```
> for (auto &Front = *Children.front(); !(ReachedEnd |= Front.reachedEnd()); Front.advance()) {
>   auto SyncID = Front.peek();
>   bool ValidSync = true;
>   for (auto &Child : Children) {
>     ...
>     if (Child->peek() > SyncID) {
>       ValidSync = false;
>       break;
>     }
>     ...
>   }
>   if (ValidSync)
>     return;
> }
> ```
> Because now the outer loop is *entirely* just a loop over the items of Front.
> 
> Intuitively, the average number of items that advanceTo() advances by here is between 1 and 2 (we know it's at least 1, and the remaining items are mostly from the longer iterator, so we probably don't skip over any from the shorter one).
> And advanceTo is more complicated: if it turns out to be 2x as expensive as advance in practice, this isn't a win. So this is maybe worth checking.
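
Regarding the "worth checking" part of point 2: here is a tiny standalone sketch, using plain sorted vectors with made-up data rather than the dex iterators, that counts how far the front cursor moves per `advanceTo` when intersecting a sparse list with a denser one:

```
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  // Made-up posting lists: Short plays the role of Children.front(),
  // Long the role of the other child. Plain vectors, not the dex iterators.
  std::vector<int> Short, Long;
  for (int I = 0; I < 6000; ++I) {
    if (I % 3 == 0)
      Short.push_back(I); // sparse list (the front child)
    if (I % 2 == 0 || I % 5 == 0)
      Long.push_back(I); // denser, partially overlapping list
  }

  auto F = Short.begin(); // front cursor
  auto C = Long.begin();  // other child's cursor
  long AdvanceToCalls = 0, ItemsSkipped = 0;

  while (F != Short.end()) {
    int SyncID = *F;
    // Child->advanceTo(SyncID).
    C = std::lower_bound(C, Long.end(), SyncID);
    if (C == Long.end())
      break;
    if (*C == SyncID) { // valid sync point; move to the next front item
      ++F;
      continue;
    }
    // *C > SyncID: this is where Front.advanceTo(Child->peek()) kicks in.
    // Count how many front items it consumes compared to a single advance().
    auto NewF = std::lower_bound(F, Short.end(), *C);
    ItemsSkipped += NewF - F;
    ++AdvanceToCalls;
    F = NewF;
  }
  std::printf("avg front items consumed per advanceTo: %.2f\n",
              AdvanceToCalls ? double(ItemsSkipped) / AdvanceToCalls : 0.0);
}
```

Running this with different densities should make it easy to check whether the typical jump really stays in the 1-2 range on realistic data.
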
Re 1: this is really interesting. Even the following

```
/// Restores class invariants: each child will point to the same element after
/// sync.
void sync() {
  auto SyncID = Children.front()->peek();
  while (!(ReachedEnd |= Children.front()->reachedEnd())) {
    bool ValidSync = true;
    for (auto &Child : Children) {
      Child->advanceTo(SyncID);
      ReachedEnd |= Child->reachedEnd();
      // If any child reaches the end, the iterator cannot match any more
      // items; in that case, just terminate.
      if (ReachedEnd)
        return;
      // If any child goes beyond given ID (i.e. ID is not the common item),
      // all children should be advanced to the next common item.
      if (Child->peek() > SyncID) {
        SyncID = Child->peek();
        ValidSync = false;
      }
    }
    if (ValidSync)
      break;
  }
}
```

is 5% slower than what we have right now, even though it's almost equivalent to what's already written.

And what you are proposing

```
void sync() {
  auto &Front = *Children.front(); // smallest
  while (!(ReachedEnd |= Children.front()->reachedEnd())) {
    bool ValidSync = true;
    auto SyncID = Front.peek();
    for (auto &Child : Children) {
      Child->advanceTo(SyncID);
      ReachedEnd |= Child->reachedEnd();
      // If any child reaches the end, the iterator cannot match any more
      // items; in that case, just terminate.
      if (ReachedEnd)
        return;
      if (Child->peek() > SyncID) {
        Front.advanceTo(Child->peek()); // XXX
        ValidSync = false;
        break;
      }
    }
    if (ValidSync)
      break;
  }
}
```

has exactly the same performance as what we have right now (1/100th of a percent slower).

I gave it some thought and realized that the problem is that the code in the suggestion does not cache values the way the existing one does, which incurs overhead. That made me realize I don't actually cache `Child->peek()` on line 109 right now, even though I should. Doing that gives another 9% performance improvement compared to the proposed version!

Using this knowledge, I tried to incorporate caching into your proposal and came up with something like this:

```
  void sync() {
    if ((ReachedEnd |= Children.front()->reachedEnd()))
      return;
    auto SyncID = Children.front()->peek();
    while (!ReachedEnd) {
      bool ValidSync = true;
      for (auto &Child : Children) {
        Child->advanceTo(SyncID);
        if ((ReachedEnd |= Child->reachedEnd()))
          return;
        auto Candidate = Child->peek();
        if (Candidate > SyncID) {
          SyncID = Candidate;
          ValidSync = false;
          break;
        }
      }
      if (ValidSync)
        break;
    }
  }
```

Weirdly enough, it still performs 7% slower than the current patch with `->peek()` caching inside the loop. I'm not sure why this is the case, since the code looks almost identical :shrug:
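
In case it helps to take dex out of the picture: below is a minimal standalone harness (a hypothetical `Cursor` stand-in over plain sorted vectors with made-up data; not the actual iterators) that implements the `Front.advanceTo(Candidate)` shape. Duplicating `intersect` with the carried-`SyncID` shape and timing both on the same data might show whether the 7% lives in the loop structure itself or somewhere else in dex.

```
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a posting-list iterator: a cursor over a sorted
// vector of IDs. Just enough interface to mirror the sync() loop shapes.
struct Cursor {
  const std::vector<int> *List;
  size_t Pos = 0;
  bool reachedEnd() const { return Pos >= List->size(); }
  int peek() const { return (*List)[Pos]; }
  void advance() { ++Pos; }
  void advanceTo(int ID) {
    Pos = std::lower_bound(List->begin() + Pos, List->end(), ID) -
          List->begin();
  }
};

// Count all common IDs using the "Front.advanceTo(Candidate)" shape.
static long intersect(std::vector<Cursor> Children) {
  long Found = 0;
  while (!Children.front().reachedEnd()) {
    int SyncID = Children.front().peek();
    bool ValidSync = true;
    for (auto &Child : Children) {
      Child.advanceTo(SyncID);
      if (Child.reachedEnd())
        return Found;
      int Candidate = Child.peek();
      if (Candidate > SyncID) {
        // Catch the front child up so the outer loop re-reads a fresh SyncID.
        Children.front().advanceTo(Candidate);
        ValidSync = false;
        break;
      }
    }
    if (ValidSync) {
      ++Found;
      Children.front().advance(); // move past this sync point
    }
  }
  return Found;
}

int main() {
  // Made-up posting lists of different densities.
  std::vector<int> A, B, C;
  for (int I = 0; I < 3000000; ++I) {
    if (I % 3 == 0)
      A.push_back(I);
    if (I % 2 == 0 || I % 7 == 0)
      B.push_back(I);
    if (I % 5 != 3)
      C.push_back(I);
  }
  std::vector<Cursor> Children = {{&A}, {&B}, {&C}};
  auto Start = std::chrono::steady_clock::now();
  long Found = intersect(Children);
  auto Elapsed = std::chrono::steady_clock::now() - Start;
  std::printf(
      "%ld common IDs in %lld us\n", Found,
      (long long)std::chrono::duration_cast<std::chrono::microseconds>(Elapsed)
          .count());
}
```
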


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106528/new/

https://reviews.llvm.org/D106528


