[cfe-dev] Adding indexing support to Clangd

Marc-André Laperle via cfe-dev cfe-dev at lists.llvm.org
Thu May 18 13:30:50 PDT 2017


Hi Alex! Some replies inline.

On Thu, 2017-05-18 at 12:30 +0100, Alex L wrote:
Hi,

Thanks for making a summary of existing solutions!

On 17 May 2017 at 23:38, Marc-André Laperle via cfe-dev <cfe-dev at lists.llvm.org> wrote:
Hi,

I’ve been thinking about how to add features to Clangd that require an index, i.e. features that need a database containing information about all source files (go to definition, find references, etc.). I’d like to share my thoughts on how things are and what approaches could be taken before getting too deep into implementing something.

My understanding of the current Clang indexing facilities is as follows:
  - It is part of libclang, so it is meant to have a stable API, which can be limiting because it does not expose the full Clang C/C++ API.
  - It has no persistence, i.e. the index cannot be reloaded from disk at a later time after it is built.
  - There is no header caching mechanism to allow faster reparsing when a source file changes but its included headers haven’t (a common occurrence during code editing).
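To make the header-caching idea concrete, here is a minimal C++ sketch under simplifying assumptions: the names `HeaderCache`, `HeaderSnapshot` and `ParsedHeaders` are hypothetical, a file's included headers are modelled as a map from path to contents, and the expensive header parse is replaced by copying that map. It uses no Clang API (in Clang, the comparable role is played by the precompiled preamble). It only shows that the cached result is reused while the headers are unchanged and rebuilt as soon as any of them changes.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical model: header path -> header contents, as seen when a
// source file is parsed.
using HeaderSnapshot = std::map<std::string, std::string>;

// Hypothetical stand-in for the expensive-to-build result of parsing the
// headers included by a source file.
struct ParsedHeaders {
  HeaderSnapshot Snapshot;
};

class HeaderCache {
public:
  // Returns the cached header parse for MainFile if none of its included
  // headers changed; otherwise "re-parses" the headers and replaces the entry.
  std::shared_ptr<const ParsedHeaders> get(const std::string &MainFile,
                                           const HeaderSnapshot &Headers) {
    auto It = Cache.find(MainFile);
    if (It != Cache.end() && It->second->Snapshot == Headers)
      return It->second; // Headers unchanged: only the main file needs work.
    ++HeaderParses;      // Cache miss or stale entry: rebuild.
    auto Parsed = std::make_shared<ParsedHeaders>(ParsedHeaders{Headers});
    Cache[MainFile] = Parsed;
    return Parsed;
  }

  // Number of times the (simulated) header parse actually ran.
  int headerParses() const { return HeaderParses; }

private:
  std::map<std::string, std::shared_ptr<const ParsedHeaders>> Cache;
  int HeaderParses = 0;
};
```

Comparing the full snapshot avoids false reuse from hash collisions at the cost of memory; a real implementation would more likely compare modification times or content hashes of the headers.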

Have you looked into the precompiled preamble? I believe it can be (and is) used when indexing.

I haven't really looked into it yet, but it looks very useful, especially this section: https://clang.llvm.org/docs/PCHInternals.html#chained-precompiled-headers



-- Other indexing solutions --

I have done a very high level exploration of some other projects using Clang for indexing, you can find some notes here:
https://docs.google.com/document/d/1Z0pDZpUlhyRkw1yB9frVVeb_xgSb5PuXD0-aeUtKkpo/edit?usp=sharing
(Feel free to add your own notes if you’d like!)

From what I gathered:
  - Some projects use libclang; others use the Clang C++ APIs (AST) directly because of libclang’s limitations.
  - Some projects have a custom on-disk index format; others use an RDBMS (PostgreSQL, SQLite) or other existing solutions (Elasticsearch, etc.).
  - I didn’t notice any Clang-based projects doing header caching, although perhaps I missed it. Ilya Biryukov wrote that JetBrains CLion does header caching, but it’s not clear how the cached headers are stored or whether it uses Clang.

IIRC CLion uses a custom C++ parser instead of Clang.

On the Eclipse CDT side, Clang is not used, but headers are cached by storing the semantic model (not a plain AST) in the index. Source files can then be parsed reusing that cached information.

Possible approach for Clangd:
  - I suggest using the Clang libraries directly rather than libclang, so as not to be subject to its limitations. I think a stable API is not as important here, since Clangd resides in the same tree and is built and tested in coordination with Clang itself. The downside is that it will not reuse some of the work already done in libclang, such as finding references.

I agree, Clangd should not use libclang. Note that in general libclang's indexer API is intended to be a wrapper around the core implementation in lib/Index. I also don't think libclang doesn't expose any means to find references.

I would encourage Clangd to reuse the existing code in lib/Index. Even though it has bugs, we are currently fixing (and will keep fixing) a lot of issues in the library to ensure that our consumer records all of the possible declarations and references for both C++ and Obj-C.

Thanks, I was under the wrong impression that this was all part of libclang, but I see that this is not the case. I'm all for reusing code and I can help fix issues if there are any. I'll give this a try!

-- The persisted file format --

All elements in the model mentioned above could have a querying interface, which could be implemented both for an “in memory” database (simpler to debug and fast for small projects) and for an on-disk database. In my experience with Eclipse CDT, storing the index on disk as a BTree worked quite well. The BTree is made of chunks, which can be cached in memory and fetched from disk as required. All information in the model is fetched from the database (from the cache if present, otherwise from disk). A similar approach could be used for Clangd if it’s deemed suitable.
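The chunk-caching part of that design can be sketched in C++ as follows. This is a hypothetical illustration, not CDT’s code: `ChunkStore`, `InMemoryChunkStore` and `ChunkCache` are made-up names, chunks are plain byte vectors, and the in-memory store stands in for a file on disk. It shows only the cache layer (least-recently-used eviction in front of an abstract store), not the BTree built on top of it.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using Chunk = std::vector<std::uint8_t>;
using ChunkId = std::uint32_t;

// Abstract backing storage. An on-disk implementation would read the chunk
// at a file offset; the in-memory one below just keeps a vector.
class ChunkStore {
public:
  virtual ~ChunkStore() = default;
  virtual Chunk read(ChunkId Id) = 0;
};

class InMemoryChunkStore : public ChunkStore {
public:
  explicit InMemoryChunkStore(std::vector<Chunk> Chunks)
      : Chunks(std::move(Chunks)) {}
  Chunk read(ChunkId Id) override {
    ++Reads;
    return Chunks.at(Id);
  }
  int reads() const { return Reads; } // Number of "disk" fetches so far.

private:
  std::vector<Chunk> Chunks;
  int Reads = 0;
};

// Keeps up to Capacity chunks in memory, evicting the least recently used.
class ChunkCache {
public:
  ChunkCache(ChunkStore &Store, std::size_t Capacity)
      : Store(Store), Capacity(Capacity == 0 ? 1 : Capacity) {}

  // The returned reference stays valid until that chunk is evicted.
  const Chunk &get(ChunkId Id) {
    auto It = Index.find(Id);
    if (It != Index.end()) {
      // Hit: move the entry to the front (most recently used).
      Lru.splice(Lru.begin(), Lru, It->second);
      return It->second->second;
    }
    // Miss: evict the least recently used chunk if full, then fetch.
    if (Lru.size() == Capacity) {
      Index.erase(Lru.back().first);
      Lru.pop_back();
    }
    Lru.emplace_front(Id, Store.read(Id));
    Index[Id] = Lru.begin();
    return Lru.front().second;
  }

private:
  using Entry = std::pair<ChunkId, Chunk>;
  ChunkStore &Store;
  std::size_t Capacity;
  std::list<Entry> Lru; // Front = most recently used.
  std::unordered_map<ChunkId, std::list<Entry>::iterator> Index;
};
```

Because `ChunkCache` depends only on the abstract `ChunkStore`, the same query code could run against an in-memory store for debugging and small projects or a disk-backed one for large projects, which matches the two implementations suggested above.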

Have you looked into LLVM's bitcode as a possible format for the persistent index? Clang currently uses it for serialized diagnostics and modules.


I will have a look. It seems very well defined. It's not clear to me yet whether it can be used across the board, but I'll play around with it a bit.

Thank you so much for mentioning these things! It's easy to miss some of the useful parts when getting into a new code base.

Regards,
Marc-André




In summary, I’m proposing for Clangd an on-disk index, stored as a BTree and populated using Clang’s C++ API (not libclang). Any concerns or input would be greatly appreciated. As a side note, I’m aware that this is just one line of thinking and that others could be considered.

Best regards,
Marc-André Laperle