[cfe-dev] Adding indexing support to Clangd

Alex L via cfe-dev cfe-dev at lists.llvm.org
Thu May 18 04:30:30 PDT 2017


Hi,

Thanks for making a summary of existing solutions!

On 17 May 2017 at 23:38, Marc-André Laperle via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi,
>
> I’ve been thinking about how to add features to Clangd requiring an index,
> i.e. features that need a database containing information of all source
> files (Go to definition, find references, etc). I’d like to share with you
> my thoughts on how things are and what approaches could be taken before
> getting too deep into implementing something.
>
> My understanding of the current Clang indexing facilities is as follows:
>   - It is part of libclang, so it is meant to have a stable API, which
> can be limiting because it does not expose the full Clang C/C++ API
>   - It does not have persistence, i.e. the index cannot be reloaded from
> disk at a later time after it is built.
>   - There is no header caching mechanism in order to allow faster
> reparsing when a source file changes but its included headers haven’t (a
> common occurrence during code editing).
>

Have you looked into the precompiled preamble? I believe it can be (and is)
used when indexing.


> -- Other indexing solutions --
>
> I have done a very high level exploration of some other projects using
> Clang for indexing, you can find some notes here:
> https://docs.google.com/document/d/1Z0pDZpUlhyRkw1yB9frVVeb_xgSb5PuXD0-aeUtKkpo/edit?usp=sharing
> (Feel free to add your own notes if you’d like!)
>
> From what I gathered:
>   - Some projects are using libclang, others use the Clang C++ APIs (AST)
> directly because of libclang limitations
>   - Some projects have custom index formats on disk, others use an RDBMS
> (PostgreSQL, SQLite) or other already available solutions (Elasticsearch,
> etc).
>   - I didn’t notice any projects based on Clang doing header caching,
> although perhaps I missed it. Ilya Biryukov wrote that JetBrains CLion does
> header caching but it’s not clear how they are stored or if it is using
> Clang.


IIRC CLion uses a custom C++ parser instead of Clang.


> On the Eclipse CDT side, Clang is not used but there is header caching by
> storing the semantic model in the index (not plain AST). Then the source
> files can be parsed reusing that cached information.
>
> Possible approach for Clangd:
>   - I suggest using Clang libraries directly and not using libclang in
> order to not have any limitations. I think that using a stable API is not
> as important since Clangd resides in the same tree and is built and tested
> in coordination with Clang itself. The downside is that it will not reuse
> some of the work already done in libclang such as finding references, etc.
>

I agree, Clangd should not use libclang. Note that in general libclang's
indexer API is intended to be a wrapper around the core implementation in
lib/Index. I also don't think libclang exposes any means to find
references.

I would encourage Clangd to reuse existing code in lib/Index. Even though
it has bugs, we are currently fixing (and will keep fixing) a lot of issues in the
library to ensure that our consumer records all of the possible
declarations and references for both C++ and Obj-C.


>   - I think introducing a big dependency such as PostgreSQL is not
> acceptable for Clangd (correct me if I’m wrong!). So a custom-tailored file
> format for the index makes more sense to me.
>   - For header caching, I wonder if it is possible to reuse the
> precompiled header support in Clang. There would be some logic that would
> decide whether or not a precompiled header could be used depending on the
> preprocessing context (same macro definitions, etc).
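
To make the "same macro definitions" check concrete, here is a minimal
sketch of the decision logic. It is not based on any existing Clang API:
PreprocessingContext, CachedHeader and canReuseCachedHeader are hypothetical
names, and only macro definitions are compared, whereas a real check would
also have to consider include paths, language options and so on.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical snapshot of the preprocessing context seen by a header.
// Only macro definitions are modeled in this sketch.
struct PreprocessingContext {
  std::map<std::string, std::string> Macros; // macro name -> definition
};

// Hypothetical cache entry: the header path plus the context captured
// when the cached data was built.
struct CachedHeader {
  std::string Path;
  PreprocessingContext Context;
};

// Returns true if the cached header was built under exactly the same
// macro definitions as the current context, so it can be reused.
bool canReuseCachedHeader(const CachedHeader &Cached,
                          const PreprocessingContext &Current) {
  assert(!Cached.Path.empty() && "cache entry must name a header");
  return Cached.Context.Macros == Current.Macros;
}
```

Comparing the full macro table is deliberately conservative: any added,
removed or changed definition invalidates the cached header, even if the
header never looks at that macro.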


> -- The Index model --
>
> Here’s what the data model could look like. For sure it’s partial and I
> expect it will evolve quite a bit. But it should be enough to communicate
> the general idea.
>
> Index: Represents the model of the code base as a whole
>   - IndexFile [ ]
>
> IndexFile: Represents an indexed file
>   - URI path
>   - IndexFile includedBy [ ]
>   - IndexName [ ]
>   - Last modified timestamp, checksum, etc
>
> IndexName: Represents a declaration, definition, macro, etc
>   - Source Location
>   - IndexNameReference [ ]
>
> IndexNameReference: Reference to a name
>   - Source Location
>   - Access (read, write, read/write)
>
> IndexTypeName extends IndexName: represents classes, structs, etc
>   - IndexTypeName bases [ ]
>
> IndexFunctionName extends IndexName: represents functions, methods, etc
>   - IndexFunctionName callers [ ]
>
> Note that a lot of information probably doesn’t need to be modeled because
> it only needs to be available for an opened file, for which we have access
> to the full AST.
>
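
For readers who prefer code, the model above could be written down roughly
as follows. This is only a sketch of the outline in the quoted text: the
type names come from the proposal, but the member names, the IndexLocation
struct, the ownership choices and the addFile helper are assumptions made
for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumed representation of a source location (not clang::SourceLocation).
struct IndexLocation {
  std::string FileURI;
  unsigned Line = 0;
  unsigned Column = 0;
};

enum class AccessKind { Read, Write, ReadWrite };

// Reference to a name.
struct IndexNameReference {
  IndexLocation Loc;
  AccessKind Access = AccessKind::Read;
};

// A declaration, definition, macro, etc.
struct IndexName {
  virtual ~IndexName() = default;
  IndexLocation Loc;
  std::vector<IndexNameReference> References;
};

// Classes, structs, etc.
struct IndexTypeName : IndexName {
  std::vector<IndexTypeName *> Bases; // non-owning
};

// Functions, methods, etc.
struct IndexFunctionName : IndexName {
  std::vector<IndexFunctionName *> Callers; // non-owning
};

// An indexed file.
struct IndexFile {
  std::string URI;
  std::vector<IndexFile *> IncludedBy;            // non-owning
  std::vector<std::unique_ptr<IndexName>> Names;  // owned by the file
  std::uint64_t LastModified = 0;                 // timestamp, checksum, etc.
};

// The model of the code base as a whole.
struct Index {
  std::vector<std::unique_ptr<IndexFile>> Files;
};

// Hypothetical helper: registers a file and returns a stable reference to it.
IndexFile &addFile(Index &Idx, std::string URI) {
  assert(!URI.empty() && "an indexed file needs a URI");
  auto File = std::make_unique<IndexFile>();
  File->URI = std::move(URI);
  Idx.Files.push_back(std::move(File));
  return *Idx.Files.back();
}
```

Files and names are held through std::unique_ptr so that the raw
cross-references (IncludedBy, Bases, Callers) stay valid when the owning
vectors grow. An on-disk implementation would likely replace those
pointers with record offsets or IDs.
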
> -- The persisted file format --
>
> All elements in the model mentioned above could have a querying interface
> which could be implemented for an “in memory” database (simpler to debug
> and fast for small projects) and also for an on-disk database. From my
> experience in Eclipse CDT, the index on disk was stored in the form of a
> BTree which worked quite well. The BTree is made out of chunks. Chunks can
> be cached in memory and fetched from disk as required. Every piece of information in
> the model is fetched from the database (from the cache if present, otherwise from disk). A
> similar approach could be used for Clangd if it’s deemed suitable.
>
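
The "chunks cached in memory, fetched from disk as required" idea can be
sketched as below. This is not how Eclipse CDT implements it; the
least-recently-used eviction policy is an assumption chosen for the
example, and ChunkStorage and ChunkCache are hypothetical names. The
storage interface stands in for the index file so the sketch stays
self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Hypothetical backing store. A real implementation would read a
// fixed-size chunk from the index file at ChunkIndex * ChunkSize.
class ChunkStorage {
public:
  virtual ~ChunkStorage() = default;
  virtual Chunk readChunk(std::size_t ChunkIndex) = 0;
};

// Keeps up to Capacity chunks in memory and falls back to the storage on
// a miss, evicting the least recently used chunk when full.
class ChunkCache {
public:
  ChunkCache(ChunkStorage &Storage, std::size_t Capacity)
      : Storage(Storage), Capacity(Capacity) {
    assert(Capacity > 0 && "cache must hold at least one chunk");
  }

  // The returned reference is valid until the chunk is evicted.
  const Chunk &get(std::size_t ChunkIndex) {
    auto It = Lookup.find(ChunkIndex);
    if (It != Lookup.end()) {
      // Hit: mark the chunk as most recently used.
      Entries.splice(Entries.begin(), Entries, It->second);
      return It->second->second;
    }
    // Miss: make room if needed, then load the chunk from storage.
    if (Entries.size() == Capacity) {
      Lookup.erase(Entries.back().first);
      Entries.pop_back();
    }
    Entries.emplace_front(ChunkIndex, Storage.readChunk(ChunkIndex));
    Lookup[ChunkIndex] = Entries.begin();
    return Entries.front().second;
  }

private:
  using Entry = std::pair<std::size_t, Chunk>;
  ChunkStorage &Storage;
  std::size_t Capacity;
  std::list<Entry> Entries; // front = most recently used
  std::unordered_map<std::size_t, std::list<Entry>::iterator> Lookup;
};
```

A BTree node lookup would then go through ChunkCache::get for the chunk
holding the node, so hot parts of the tree stay in memory while the rest
is read on demand.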

Have you looked into LLVM's bitcode as a possible format for the persistent
index? Clang currently uses it for serialized diagnostics and modules.


>
> In summary, I’m proposing for Clangd an index on disk stored in the form
> of a BTree that is populated using Clang’s C++ API (not libclang). Any
> concerns or input would be greatly appreciated. Just as a side note, I’m
> aware that this is just one line of thinking and others could be considered.
>
> Best regards,
> Marc-André Laperle