[cfe-dev] Adding indexing support to Clangd
Alex L via cfe-dev
cfe-dev at lists.llvm.org
Thu May 18 04:30:30 PDT 2017
Thanks for making a summary of existing solutions!
On 17 May 2017 at 23:38, Marc-André Laperle via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I’ve been thinking about how to add features to Clangd requiring an index,
> i.e. features that need a database containing information about all source
> files (Go to definition, find references, etc). I’d like to share with you
> my thoughts on how things are and what approaches could be taken before
> getting too deep into implementing something.
> My understanding of the current Clang indexing facilities is as follows:
> - It is part of libclang, so it is meant to have a stable API, which
> can be limiting because it does not expose the full Clang C/C++ API
> - It does not have persistence. I.e. the index cannot be reloaded from
> disk at a later time after it is built.
> - There is no header caching mechanism in order to allow faster
> reparsing when a source file changes but its included headers haven’t (a
> common occurrence during code editing).
Have you looked into the precompiled preamble? I believe it can be (and is)
used when indexing.
> -- Other indexing solutions --
> I have done a very high level exploration of some other projects using
> Clang for indexing, you can find some notes here:
> (Feel free to add your own notes if you’d like!)
> From what I gathered:
> - Some projects are using libclang, others use the Clang C++ APIs (AST)
> directly because of libclang limitations
> - Some projects have custom index formats on disk, others use an RDBMS
> (PostgreSQL, SQLite) or other already available solutions (Elastic Search,
> - I didn’t notice any projects based on Clang doing header caching,
> although perhaps I missed it. Ilya Biryukov wrote that JetBrains CLion does
> header caching but it’s not clear how they are stored or if it is using
IIRC CLion uses a custom C++ parser instead of Clang.
> On the Eclipse CDT side, Clang is not used but there is header caching by
> storing the semantic model in the index (not plain AST). Then the source
> files can be parsed reusing that cached information.
> Possible approach for Clangd:
> - I suggest using Clang libraries directly and not using libclang in
> order to not have any limitations. I think that using a stable API is not
> as important since Clangd resides in the same tree and is built and tested
> in coordination with Clang itself. The downside is that it will not reuse
> some of the work already done in libclang such as finding references, etc.
I agree, Clangd should not use libclang. Note that in general libclang's
indexer API is intended to be a wrapper around the core implementation in
lib/Index. I also don't think libclang exposes any means to find
I would encourage Clangd to reuse existing code in lib/Index. Even though
it has bugs, we are currently fixing (and will keep fixing) a lot of issues in the
library to ensure that our consumer records all of the possible
declarations and references for both C++ and Obj-C.
> - I think introducing a big dependency such as PostgreSql is not
> acceptable for Clangd (correct me if I’m wrong!). So a custom tailored file
> format for the index makes more sense to me.
> - For header caching, I wonder if it is possible to reuse the
> precompiled header support in Clang. There would be some logic that would
> decide whether or not a precompiled header could be used depending on the
> preprocessing context (same macro definitions, etc).
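The reuse decision described above could be sketched roughly as follows. This is only an illustration under strong assumptions: `MacroState`, `CachedHeader` and `canReuse` are hypothetical names, not existing Clang APIs, and the preprocessing context is reduced to a plain map of macro definitions.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified view of the preprocessing context at the point
// where a header is included: macro name -> definition text.
using MacroState = std::map<std::string, std::string>;

// Hypothetical cache entry: which header was precompiled, and under which
// macro definitions.
struct CachedHeader {
  std::string Path;
  MacroState BuiltWith;
};

// Conservative check: reuse the cached header only if it is the same file
// and every macro is defined exactly as it was when the cache was built.
bool canReuse(const CachedHeader &Entry, const std::string &Path,
              const MacroState &Current) {
  return Entry.Path == Path && Entry.BuiltWith == Current;
}
```

Comparing the whole macro state is stricter than necessary, since a header only depends on the macros it actually tests. A finer-grained check is possible but is not shown here.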
> -- The Index model --
> Here’s what the data model could look like. For sure it’s partial and I
> expect it will evolve quite a bit. But it should be enough to communicate
> the general idea.
> Index: Represents the model of the code base as a whole
> - IndexFile 
> IndexFile: Represents an indexed file
> - URI path
> - IndexFile includedBy [ ]
> - IndexName [ ]
> - Last modified timestamp, checksum, etc
> IndexName: Represents a declaration, definition, macro, etc
> - Source Location
> - IndexNameReference [ ]
> IndexNameReference: Reference to a name
> - Source Location
> - Access (read, write, read/write)
> IndexTypeName extends IndexName: represents classes, structs, etc
> - IndexTypeName bases [ ]
> IndexFunctionName extends IndexName: represents functions, methods, etc
> - IndexFunctionName callers [ ]
> Note that much of the information probably doesn’t need to be modeled because
> it only needs to be available for an opened file, for
> which we can have access to the full AST.
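To make the quoted model a bit more concrete, here is a minimal C++ sketch of it. The type names follow the proposal, but everything else is an assumption made for illustration: the `Spelling` field, the simplified `IndexLocation`, the use of raw pointers for cross-links, and the integer types for timestamp and checksum.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Assumed simplification of a source location.
struct IndexLocation {
  unsigned Line = 0;
  unsigned Column = 0;
};

enum class Access { Read, Write, ReadWrite };

// Reference to a name.
struct IndexNameReference {
  IndexLocation Loc;
  Access Kind = Access::Read;
};

// A declaration, definition, macro, etc.
struct IndexName {
  std::string Spelling; // assumption: not listed in the proposal
  IndexLocation Loc;
  std::vector<IndexNameReference> References;
  virtual ~IndexName() = default;
};

// Classes, structs, etc.
struct IndexTypeName : IndexName {
  std::vector<const IndexTypeName *> Bases; // non-owning links
};

// Functions, methods, etc.
struct IndexFunctionName : IndexName {
  std::vector<const IndexFunctionName *> Callers; // non-owning links
};

// An indexed file; it owns the names found in it.
struct IndexFile {
  std::string URI;
  std::vector<const IndexFile *> IncludedBy; // non-owning links
  std::vector<std::unique_ptr<IndexName>> Names;
  std::uint64_t LastModified = 0;
  std::uint64_t Checksum = 0;
};

// The model of the code base as a whole.
struct Index {
  std::vector<std::unique_ptr<IndexFile>> Files;
};
```

Raw pointers keep the sketch short. A persisted index would more likely store record offsets or IDs in their place, since pointers do not survive a reload from disk.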
> -- The persisted file format --
> All elements in the model mentioned above could have a querying interface
> which could be implemented for an “in memory” database (simpler to debug
> and fast for small projects) and also for an on-disk database. From my
> experience in Eclipse CDT, the index on disk was stored in the form of a
> BTree which worked quite well. The BTree is made out of chunks. Chunks can
> be cached in memory and fetched from disk as required. Every piece of information in
> the model is fetched from the database (from the cache if present, otherwise from disk). A
> similar approach could be used for Clangd if it’s deemed suitable.
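The idea of one querying interface with interchangeable backends could look roughly like this. It is a sketch only: `IndexStore`, `InMemoryIndexStore` and `Location` are hypothetical names, only the "in memory" backend is shown, and the interface is reduced to recording and finding references.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed simplification of a location in a file.
struct Location {
  std::string URI;
  unsigned Line;
  unsigned Column;
};

// Hypothetical query interface; callers do not know where the data lives.
class IndexStore {
public:
  virtual ~IndexStore() = default;
  virtual void addReference(const std::string &Name, const Location &Loc) = 0;
  virtual std::vector<Location>
  findReferences(const std::string &Name) const = 0;
};

// "In memory" backend: simple to debug, fine for small projects.
class InMemoryIndexStore : public IndexStore {
  std::map<std::string, std::vector<Location>> Refs;

public:
  void addReference(const std::string &Name, const Location &Loc) override {
    Refs[Name].push_back(Loc);
  }

  std::vector<Location>
  findReferences(const std::string &Name) const override {
    auto It = Refs.find(Name);
    return It == Refs.end() ? std::vector<Location>() : It->second;
  }
};
```

An on-disk backend would implement the same `IndexStore` interface and answer `findReferences` by walking BTree chunks, loading them from disk when they are not already cached.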
Have you looked into LLVM's bitcode as a possible format for the persistent
index? Clang currently uses it for serialized diagnostics and modules.
> In summary, I’m proposing for Clangd an index on disk stored in the form
> of a BTree that is populated using Clang’s C++ API (not libclang). Any
> concerns or input would be greatly appreciated. Just as a side note, I’m
> aware that this is just one line of thinking and others could be considered.
> Best regards,
> Marc-André Laperle