[cfe-dev] RFC: Symbol index for Clangd design proposal

Eric Liu via cfe-dev cfe-dev at lists.llvm.org
Tue Jul 17 14:01:06 PDT 2018

On Tue, Jul 17, 2018 at 8:11 PM Marc-André Laperle via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi Kirill,
> Thanks a lot for posting this proposal! I have a few questions that are
> maybe a bit more high-level. About the "static index" mentioned, when would
> it be updated? If I remember correctly, I think for Google the static index
> might be a remote server and I assume it would be updated periodically when
> new commits are applied on the repo? Just making sure I understand
> your use case. When you mentioned the proposal would be implemented by the
> end of September, would that still use the YAML as the static index storage?
Note that Kirill's design is trying to address the problem of serving
collected symbols efficiently (i.e. implementing SymbolIndex) rather than
"indexing"/collecting symbols for the static index.

One of the main goals here is to make it work well for both a small dynamic
index and a large static index, with no restriction on how symbols are
collected. For clangd, the static index is just a "SymbolIndex" that does
not change within the lifetime of a clangd instance. The symbols can
come from the YAML global-symbol-builder or from index-while-build, and we
don't expect the symbol index implementation to change dramatically across
different scenarios. So to answer your question: yes, the short-term plan
is to continue using the offline-built YAML symbol table
(global-symbol-builder) as the symbol source for the static index :) The
YAML stuff is experimental, and we would also like to move to
index-while-build in the future.
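
To make the "static index is just a SymbolIndex" point concrete, here is a
rough sketch in Python (for brevity; this is not clangd's actual C++ API).
The names Symbol, FrozenIndex, LayeredIndex and fuzzy_find are hypothetical,
and the subsequence matcher is only a stand-in for a real fuzzy matcher. The
idea is that the consumer queries one interface and does not care whether
the symbols behind it came from YAML, index-while-build, or open files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Symbol:
    usr: str   # unique symbol identifier (hypothetical field name)
    name: str  # unqualified name used for matching


def is_subsequence(query: str, name: str) -> bool:
    """Stand-in fuzzy match: query characters appear in order in name."""
    remaining = iter(name.lower())
    return all(ch in remaining for ch in query.lower())


class FrozenIndex:
    """An index whose symbol set is fixed at construction time.

    The symbol source is irrelevant here: any iterable of symbols works.
    """

    def __init__(self, symbols: Iterable[Symbol]) -> None:
        self._symbols = tuple(symbols)

    def fuzzy_find(self, query: str) -> List[Symbol]:
        return [s for s in self._symbols if is_subsequence(query, s.name)]


class LayeredIndex:
    """Queries a dynamic index first, then falls back to a static one.

    Results are deduplicated by USR, with the dynamic entry winning.
    """

    def __init__(self, dynamic: FrozenIndex, static: FrozenIndex) -> None:
        self._dynamic = dynamic
        self._static = static

    def fuzzy_find(self, query: str) -> List[Symbol]:
        by_usr: Dict[str, Symbol] = {}
        for sym in self._dynamic.fuzzy_find(query):
            by_usr[sym.usr] = sym
        for sym in self._static.fuzzy_find(query):
            by_usr.setdefault(sym.usr, sym)
        return list(by_usr.values())
```

In this sketch a dynamic index would simply be rebuilt and swapped in when
open files change, while the static one is built once at startup.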

> If not, what did you have in mind for a more typical usage of Clangd on a
> single machine? Would the static index be the unit and record files
> (index-while-building)?
>
> We were thinking along those lines: when a file is changed and saved,
> Clangd starts a background indexing task and updates the corresponding
> unit/record files. Unsaved files would be the dynamic index.
This sounds like a good optimization that can be useful in
index-while-build integration.

> Now, the index of "USR to record-files" (and other global-level info)
> could be generated when Clangd is started by reading all unit/record files
> and then kept in memory. I haven't done measurements on how fast that would
> be, but judging from the presentation last fall [1], it was taking a few
> seconds on LLVM/Clang. I could imagine this taking a few minutes every time
> Clangd is started on a bigger code base. So next step would be to persist
> that index to disk for a greater speed-up, using LMDB or similar.
Another interesting problem with index-while-build is how to build a full
symbol index (with both fuzzy find and USR->record lookup support) quickly
for symbols from all TUs, when a new clangd instance is started. From our
experience with global-symbol-builder, merging symbols across all TUs can
be very expensive (>20 mins for LLVM/Clang!). YAML may contribute to some
slowness here, and I would expect bit-format files used in
index-while-build to speed up serialization/deserialization. But merging
symbols from TUs can still be slow, so we might end up needing persistent
storage for merged symbols and/or the symbol index. Obviously, this has to
be measured when we have index-while-build.
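
For illustration only, the merge step I have in mind looks roughly like the
Python sketch below. The Symbol fields and the merge policy (keep the first
definition seen, sum reference counts) are assumptions made for the example,
not what global-symbol-builder actually does. It shows why the cost grows
with the total number of symbol occurrences across all TUs: every occurrence
is visited once, and headers included by many TUs contribute the same USRs
over and over.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Symbol:
    usr: str                          # unique symbol identifier
    name: str
    definition: Optional[str] = None  # file with the definition, if seen
    references: int = 0               # occurrences seen in one TU


def merge_symbols(per_tu: Iterable[Iterable[Symbol]]) -> Dict[str, Symbol]:
    """Merge per-TU symbol lists into one table keyed by USR."""
    merged: Dict[str, Symbol] = {}
    for tu_symbols in per_tu:
        for sym in tu_symbols:
            prev = merged.get(sym.usr)
            if prev is None:
                merged[sym.usr] = sym
                continue
            # Hypothetical policy: keep the first known definition and
            # accumulate reference counts across TUs.
            merged[sym.usr] = replace(
                prev,
                definition=prev.definition or sym.definition,
                references=prev.references + sym.references,
            )
    return merged
```

Persisting the resulting table would let a new clangd instance skip this
loop entirely, which is the trade-off mentioned above.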

> By defining well the interface for the static index, I think it should be
> possible to support both scenarios (local/index-while-building vs remote).
> For the local scenario with background indexing, I had made a rough
> prototype many months ago in order to just do basic testing of the "index
> while building" patches. We would like to join the effort in providing that
> kind of functionality to Clangd but it is not clear how to proceed. I am
> thinking, in the short term, we could help getting the
> "index-while-building" patches reviewed and accepted. But it would be good
> to make sure we are heading in the same direction and coordinate on what
> needs to be done.
Using index-while-building as the source of the static index is also our
long-term goal, so we would definitely like to get aligned and also help in
getting the index-while-build patches landed. I think there will be a large
design space for integrating index-while-build into clangd, and a dedicated
design doc would probably be a good starting point. But I think the
index-while-building effort is outside the scope of Kirill's design, which
should focus on designing a performant symbol index for all scenarios.



> Regards,
> Marc-André
> [1] https://youtu.be/jGJhnIT-D2M?t=940
> ------------------------------
> *From:* cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of Kirill
> Bobyrev via cfe-dev <cfe-dev at lists.llvm.org>
> *Sent:* Monday, July 16, 2018 6:03:52 AM
> *To:* clangd-dev at lists.llvm.org
> *Cc:* Peter Collingbourne via cfe-dev
> *Subject:* [cfe-dev] RFC: Symbol index for Clangd design proposal
> Dear LLVM Community,
> over the past few weeks, we (Google C++ Language Tools Team) have been
> working on the efficient symbol index proposal for Clangd. The goal is to
> improve overall Clangd performance by reducing the latency of different
> kinds of symbol search queries, such as the ones used for code completion.
> The plan is to follow the proposed design and replace existing
> implementation by the end of September.
> We are happy to get feedback and comments on the proposal: suggestions are
> welcome!
> The link to design document:
> https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit?usp=sharing
> Kind regards,
> Kirill Bobyrev