[cfe-dev] Adding indexing support to Clangd

Thu Jun 1 02:52:05 PDT 2017

Hi Ilya,

Sorry for the late reply.
Unfortunately, the hacks I mentioned were done a long time ago, and I
couldn't find the changes at first glance :-(

But you can think about reusable chained PCHs in the "module" way:
each system header is a module.
There are special index_headers.c and index_headers.cpp files which
include all the standard headers.
These files are indexed first and create one "module" per #include.
A module is created once, or several times if the preprocessor contexts
are very different (e.g. C vs. C++98 vs. C++14), and is then reused.
Of course this could compromise accuracy, but for a proof of concept it
was enough to see that the expected indexing speed is achievable in
principle.
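
To make the reuse scheme concrete, here is a minimal sketch of a cache
keyed by header name plus a coarse language context. It is not the
NetBeans code (I couldn't find that): LangContext, IndexedModule and
HeaderModuleCache are hypothetical names, and the actual indexing step
is replaced by a caller-supplied callback.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical coarse preprocessor contexts. A real key would likely
// need more detail (macros, include paths, target).
enum class LangContext { C, Cxx98, Cxx14 };

// Hypothetical stand-in for what a per-header "module" would hold.
struct IndexedModule {
  std::string Header;
  LangContext Context;
};

class HeaderModuleCache {
public:
  using Builder =
      std::function<IndexedModule(const std::string &, LangContext)>;

  explicit HeaderModuleCache(Builder B) : Build(std::move(B)) {}

  // Returns the module for (Header, Ctx), building it only on first use.
  const IndexedModule &get(const std::string &Header, LangContext Ctx) {
    auto Key = std::make_pair(Header, Ctx);
    auto It = Modules.find(Key);
    if (It == Modules.end()) {
      ++BuildCount;
      It = Modules.emplace(Key, Build(Header, Ctx)).first;
    }
    return It->second;
  }

  // Number of times the (expensive) builder actually ran.
  unsigned builds() const { return BuildCount; }

private:
  Builder Build;
  std::map<std::pair<std::string, LangContext>, IndexedModule> Modules;
  unsigned BuildCount = 0;
};
```

Because the key ignores include order and earlier macro definitions,
two files that include <vector> and <map> in different orders hit the
same two entries. That is exactly where the accuracy can suffer.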

Btw, another hint: implementing FileSystemStatCache gave the next
visible speedup. Of course, you need to carefully invalidate/update it
when a file is modified in the IDE or externally.
So, in the end we got just a 2x slowdown, but with the accuracy of a
"real" compiler. And then, as you know, we started Clank :-)
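
The stat caching idea, with explicit invalidation, can be sketched
roughly as below. This is a standalone illustration and not Clang's
FileSystemStatCache interface: StatCache, FileStatus and the injected
stat function are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical subset of stat() results an indexer might care about.
struct FileStatus {
  bool Exists;
  std::uint64_t Size;
};

class StatCache {
public:
  using StatFn = std::function<FileStatus(const std::string &)>;

  // RealStat is the expensive call that actually touches the file system.
  explicit StatCache(StatFn F) : RealStat(std::move(F)) {}

  // Returns the cached status; calls RealStat only on a cache miss.
  FileStatus status(const std::string &Path) {
    auto It = Entries.find(Path);
    if (It != Entries.end())
      return It->second;
    FileStatus S = RealStat(Path);
    Entries.emplace(Path, S);
    return S;
  }

  // Must be called when the IDE or a file watcher reports that Path
  // changed; otherwise status() keeps returning stale data.
  void invalidate(const std::string &Path) { Entries.erase(Path); }

private:
  StatFn RealStat;
  std::unordered_map<std::string, FileStatus> Entries;
};
```

The cache never expires entries on its own, so correctness depends
entirely on invalidate() being called for every edit, whether it comes
from the editor buffer or from an external change on disk.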

Hope it helps,
Vladimir.

On 29.05.2017 11:58, Ilya Biryukov wrote:
> Hi Vladimir,
>
> Thanks for sharing your experience.
>
>     We did such measurements when we evaluated clang as a technology
>     to be used in NetBeans C/C++. I don't remember the exact absolute
>     numbers now, but the conclusion was:
>
>     to be on par with the existing NetBeans speed we have to use
>     different caching; otherwise it was about 10 times slower.
>
> It's a good reason to focus on that issue from the very start, then.
> It would be nice to have some exact measurements, though (e.g. on
> LLVM), just to know exactly how slow it was.
>
>     +1. Btw, maybe it is worth setting some expectations about what is
>     available during and after the initial index phase.
>     I.e. during the initial phase you'd probably like to have
>     navigation for the file opened in the editor and be able to work
>     in function bodies.
>
> We definitely want diagnostics/completions for the currently open file 
> to be available. Good point, we definitely want to explicitly name the 
> available features in the docs/discussions.
>
>     As to initial indexing:
>     Using PTH (not PCH) gave significant speedup.
>
>     Skipping bodies gave a significant speedup, but you miss the
>     references and later have to reindex bodies on demand.
>     Using chained PCH gave the next visible speedup.
>
>     Of course we had to make some hacks for PCHs to be "reusable" more
>     often (compared to the strict compiler rule) and keep multiple
>     versions. On average 2: one for C and one for C++ parse context.
>     Also there is a difference between system headers and project
>     headers, so system headers can be cached more aggressively.
>
> Is this work open-source? The interesting part is how to "reuse" the
> PCH for a header that's included in a different order.
> I.e. is there a way to reuse some cached information (PCH, or anything
> else) for <map> and <vector> when parsing these two files:
> ```
> // foo.cpp
> #include <vector>
> #include <map>
> ...
>
> // bar.cpp
> #include <map>
> #include <vector>
> ....
> ```
>
> -- 
> Regards,
> Ilya Biryukov
