[cfe-dev] RFC: Adding index-while-building support to Clang

Ilya Biryukov via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 31 01:26:28 PDT 2017


Hi Argyrios,

> In our implementation we use LMDB
> (https://symas.com/lightning-memory-mapped-database). It is a key-value
> data-store that we use for cross-referencing queries, similarly to the
> example that Nathan provides in the document.
> Is this something that we could accept into the clang project (e.g. in
> clang-tools-extra)? Note it is essentially a single header and
> implementation file.
>
AFAIK, LLVM's policy on dependencies is pretty tight. Is it hard to isolate
the DB layer, or is it tightly coupled to the implementation?
If it's possible, we could have a DB-agnostic API in cfe or
clang-tools-extra and an alternative implementation of the storage layer.
+klimek, +bkramer, maybe you could comment on adding new third-party
dependencies to LLVM? Is it possible?
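
To make the idea of a DB-agnostic API concrete, here is a minimal sketch of
what the storage boundary might look like. All names (IndexStorage,
InMemoryStorage, the string key/value shape) are hypothetical and not part of
Clang; an LMDB-backed class could implement the same interface outside the
tree.

```cpp
#include <map>
#include <string>

// Hypothetical storage interface; nothing here exists in Clang today.
class IndexStorage {
public:
  virtual ~IndexStorage() = default;
  // Stores Value under Key, overwriting any previous value.
  virtual void put(const std::string &Key, const std::string &Value) = 0;
  // Returns true and fills Value if Key is present, false otherwise.
  virtual bool get(const std::string &Key, std::string &Value) const = 0;
};

// Dependency-free backend that could live in-tree. An LMDB-backed
// implementation (assumed, not shown) would derive from the same interface.
class InMemoryStorage : public IndexStorage {
  std::map<std::string, std::string> Data;

public:
  void put(const std::string &Key, const std::string &Value) override {
    Data[Key] = Value;
  }
  bool get(const std::string &Key, std::string &Value) const override {
    auto It = Data.find(Key);
    if (It == Data.end())
      return false;
    Value = It->second;
    return true;
  }
};
```

With a boundary like this, the cross-referencing code would only see
IndexStorage, so the choice of database stays a deployment decision.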

> 2. In clangd, we're not controlling the build step, instead building ASTs
> in-memory. We would rather store the indexing information in-memory or
> consume it on the go while building ASTs.
> Do you have suggestions on which parts of the API we should look at?
> We could implement our own IndexASTConsumer, but are there more
> opportunities for reusing other parts of your implementation? Code for
> collecting indexing dependencies, definitions of high-level record
> structures (i.e. symbol definitions, etc.)?
>
> There are a few ways to go about this:
> - Have ASTs in-memory, but indexing works on the file system. It’s not
> ideal but it is simple and works fairly well in practice, particularly
> since in our platform, files open in Xcode can be saved to disk even
> without the user explicitly saving them.
>
> - Update clang’s raw index data store using the in-memory buffers and ASTs.
> The simplicity is that symbol info comes from one place only, but there’s
> complexity in that you have raw data on disk that reflect in-memory-only
> sources.
>
> - The layer on top of clang's raw index data store is enhanced to treat the
> raw data on-disk as one source of symbol info, and in-memory ASTs as
> another. For example, if using LMDB, you could have it distinguish that
> info about a symbol comes from the raw data on-disk vs an in-memory AST.
>
Thanks. We probably want some combination of all options. We would
definitely benefit from reading the on-disk indexes, if they are there. But
those may be outdated, so we could have our own indexing layer on top of
that for the modified files. Then we could dispatch all requests to both
layers and combine the results. I wonder if it's possible to make it work
and how much effort it would take.
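
A rough sketch of the "dispatch to both layers and combine" idea, under
stated assumptions: the types below (Occurrence, SymbolIndex, MapIndex,
MergedIndex) are hypothetical, and the merge rule (drop on-disk results for
files that the in-memory layer has re-indexed) is just one possible policy.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one occurrence of a symbol.
struct Occurrence {
  std::string File;
  unsigned Line;
};

// Hypothetical query interface shared by both layers.
class SymbolIndex {
public:
  virtual ~SymbolIndex() = default;
  virtual std::vector<Occurrence> lookup(const std::string &USR) const = 0;
};

// Toy layer backed by a multimap; stands in for either the on-disk
// index or the in-memory one.
class MapIndex : public SymbolIndex {
  std::multimap<std::string, Occurrence> Occs;

public:
  void add(const std::string &USR, const std::string &File, unsigned Line) {
    Occs.emplace(USR, Occurrence{File, Line});
  }
  std::vector<Occurrence> lookup(const std::string &USR) const override {
    std::vector<Occurrence> Result;
    auto Range = Occs.equal_range(USR);
    for (auto It = Range.first; It != Range.second; ++It)
      Result.push_back(It->second);
    return Result;
  }
};

// Queries both layers. On-disk results for modified files are assumed
// stale and are dropped in favour of the in-memory results.
class MergedIndex : public SymbolIndex {
  const SymbolIndex &OnDisk;
  const SymbolIndex &InMemory;
  std::set<std::string> ModifiedFiles;

public:
  MergedIndex(const SymbolIndex &Disk, const SymbolIndex &Mem,
              std::set<std::string> Modified)
      : OnDisk(Disk), InMemory(Mem), ModifiedFiles(std::move(Modified)) {}

  std::vector<Occurrence> lookup(const std::string &USR) const override {
    std::vector<Occurrence> Result = InMemory.lookup(USR);
    for (const Occurrence &O : OnDisk.lookup(USR))
      if (!ModifiedFiles.count(O.File))
        Result.push_back(O);
    return Result;
  }
};
```

The open question from above remains how the set of modified files is kept
in sync with the editor; the sketch simply takes it as an input.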