<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;">
<div>I thought I’d chip in and describe Eclipse CDT’s strategy for header caching. It’s actually a big cheat, but the results have proven to be pretty good.</div>
<div><br>
</div>
<div>CDT’s hack actually starts in the preprocessor. If we see that a header file has already been indexed, we skip including it. At the back end, we seamlessly use the index or the current symbol table when doing symbol lookup, so symbols that get missed because we skipped header files are picked up out of the index instead. We do the same in the preprocessor: macros that are missing during macro substitution are looked up in the index.</div>
<div><br>
</div>
<div>The performance gains were about an order of magnitude, and it magically works most of the time. The main issue is header files that are included multiple times under different macro values, but the effects of that haven’t been major.</div>
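<div>A minimal C++ sketch of the skip-and-fall-back idea, for illustration only. It assumes a hypothetical in-memory <code>Index</code> shared by all translation units; <code>Index</code>, <code>TranslationUnitParser</code> and their methods are made-up names, and this is not CDT’s code (CDT itself is written in Java).</div>
<pre>
```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical persistent index shared by all translation units.
struct Index {
    std::set<std::string> indexedHeaders;        // headers parsed at least once
    std::map<std::string, std::string> symbols;  // symbol name -> declaring header
    std::map<std::string, std::string> macros;   // macro name -> replacement text
};

class TranslationUnitParser {
public:
    explicit TranslationUnitParser(Index& index) : index_(index) {}

    // Preprocessor hook: returns false for headers already in the index, so their
    // text is never re-parsed. Simplification: the header is marked as indexed on
    // first entry rather than after it has been fully parsed.
    bool shouldEnterHeader(const std::string& header) {
        return index_.indexedHeaders.insert(header).second;
    }

    // Declarations and macro definitions go into the local tables and the index.
    void declareSymbol(const std::string& name, const std::string& header) {
        localSymbols_[name] = header;
        index_.symbols[name] = header;
    }
    void defineMacro(const std::string& name, const std::string& body) {
        localMacros_[name] = body;
        index_.macros[name] = body;
    }

    // Symbol lookup: current symbol table first, then the index, which supplies
    // symbols missed because their header was skipped.
    std::optional<std::string> lookupSymbol(const std::string& name) const {
        return lookup(localSymbols_, index_.symbols, name);
    }
    // The same fallback is used for macros during macro substitution.
    std::optional<std::string> lookupMacro(const std::string& name) const {
        return lookup(localMacros_, index_.macros, name);
    }

private:
    static std::optional<std::string> lookup(const std::map<std::string, std::string>& local,
                                             const std::map<std::string, std::string>& indexed,
                                             const std::string& name) {
        if (auto it = local.find(name); it != local.end()) return it->second;
        if (auto it = indexed.find(name); it != indexed.end()) return it->second;
        return std::nullopt;
    }

    Index& index_;
    std::map<std::string, std::string> localSymbols_;
    std::map<std::string, std::string> localMacros_;
};
```
</pre>
<div>Because <code>shouldEnterHeader</code> keys only on the header name, a header that a later translation unit includes under different macro values is still skipped, and lookups return whatever the first inclusion recorded.</div>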
<div><br>
</div>
<div>With clang being a real compiler, I had my doubts that you could even do something like this without adding hooks in places the front-end gang might not like. I’d love to be proven wrong. It really is very hard to keep up with the evolving C++ standard, and we could sure use the help clangd could offer.</div>
<div><br>
</div>
<div>Hope that helps,</div>
<div>Doug.</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span style="font-weight:bold">From: </span>cfe-dev <<a href="mailto:cfe-dev-bounces@lists.llvm.org">cfe-dev-bounces@lists.llvm.org</a>> on behalf of Ilya Biryukov via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>><br>
<span style="font-weight:bold">Reply-To: </span>Ilya Biryukov <<a href="mailto:ibiryukov@google.com">ibiryukov@google.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, June 1, 2017 at 10:52 AM<br>
<span style="font-weight:bold">To: </span>Vladimir Voskresensky <<a href="mailto:vladimir.voskresensky@oracle.com">vladimir.voskresensky@oracle.com</a>><br>
<span style="font-weight:bold">Cc: </span>via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>><br>
<span style="font-weight:bold">Subject: </span>Re: [cfe-dev] Adding indexing support to Clangd<br>
</div>
<div><br>
</div>
<blockquote id="MAC_OUTLOOK_ATTRIBUTION_BLOCKQUOTE" style="BORDER-LEFT: #b5c4df 5 solid; PADDING:0 0 0 5; MARGIN:0 0 0 5;">
<div>
<div>
<div dir="ltr">Thanks for the insights. I think I get the gist of the idea with the "module" PCH.
<div>One question: what if the system headers are included after the user headers? Then we abandon the PCH cache and parse from scratch, right?</div>
<div><br>
</div>
<div>
<div><span style="font-size:12.8px">A FileSystemStatCache that is reused between compilation units? That sounds like low-hanging fruit for indexing, thanks.</span><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jun 1, 2017 at 11:52 AM, Vladimir Voskresensky <span dir="ltr">
<<a href="mailto:vladimir.voskresensky@oracle.com" target="_blank">vladimir.voskresensky@oracle.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">Hi Ilia,<br>
<br>
Sorry for the late reply.<br>
Unfortunately, the hacks I mentioned were done a long time ago, and I couldn't find the changes at first glance :-(<br>
<br>
But you can think of reusable chained PCHs in the "module" way.<br>
Each system header is a module.<br>
There are special index_headers.c and index_headers.cpp files which include all the standard headers.<br>
These files are indexed first and create a "module" per #include.<br>
A module is created once, or several times if the preprocessor contexts are very different, e.g. C vs. C++98 vs. C++14.<br>
It is then reused.<br>
Of course, this could compromise accuracy, but for a proof of concept it was enough to show that the expected indexing speed is achievable in principle.
<br>
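<div>A rough C++ sketch of that module-style reuse, assuming the reuse key is only the header name plus a coarse language context. <code>ModuleCache</code>, <code>LangContext</code> and <code>ModuleArtifact</code> are hypothetical names, and this is not the NetBeans implementation.</div>
<pre>
```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Coarse preprocessor context used as part of the reuse key (an assumption:
// a real key would also cover predefined macros, include paths, target, ...).
enum class LangContext { C, Cxx98, Cxx14 };

// Stand-in for a serialized per-header artifact such as a chained PCH.
struct ModuleArtifact {
    std::string header;
    LangContext context;
};

// Hypothetical cache: one artifact per (system header, context), built once, then reused.
class ModuleCache {
public:
    using Builder = std::function<ModuleArtifact(const std::string&, LangContext)>;
    explicit ModuleCache(Builder build) : build_(std::move(build)) {}

    const ModuleArtifact& get(const std::string& header, LangContext ctx) {
        auto key = std::make_pair(header, ctx);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // First request for this header in this context: build and remember it.
            it = cache_.emplace(key, build_(header, ctx)).first;
            ++builds_;
        }
        return it->second;
    }

    int builds() const { return builds_; }

private:
    Builder build_;
    std::map<std::pair<std::string, LangContext>, ModuleArtifact> cache_;
    int builds_ = 0;
};
```
</pre>
<div>Because the key ignores include order and any macros defined before the #include, the foo.cpp/bar.cpp example in Ilya's question would share the same two artifacts; that is where the accuracy compromise comes from.</div>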
<br>
Btw, another hint: implementing a FileSystemStatCache gave the next visible speedup. Of course, you need to carefully invalidate/update it when a file is modified in the IDE or externally.<br>
So, in the end we got only a 2x slowdown, but with the accuracy of a "real" compiler. And then, as you know, we started Clank :-)<br>
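<div>A minimal sketch of the stat-cache hint: one cache shared across translation units, with explicit invalidation. The class and its methods are hypothetical; it does not use clang's actual <code>clang::FileSystemStatCache</code> interface, and the stat function is injected rather than calling stat() directly.</div>
<pre>
```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal metadata a stat() call would provide.
struct FileStatus {
    bool exists;
    std::uint64_t size;
    std::int64_t mtime;
};

// Hypothetical stat cache shared by every translation unit in an indexing session.
// The real stat function is injected so the sketch stays portable and testable.
class SharedStatCache {
public:
    using StatFn = std::function<FileStatus(const std::string&)>;
    explicit SharedStatCache(StatFn stat) : stat_(std::move(stat)) {}

    // Returns the cached status, hitting the file system only on a miss.
    FileStatus get(const std::string& path) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = entries_.find(path);
        if (it == entries_.end()) {
            it = entries_.emplace(path, stat_(path)).first;
        }
        return it->second;
    }

    // Must be called when the IDE saves a file or a watcher reports an external
    // change; otherwise the stale entry keeps being returned.
    void invalidate(const std::string& path) {
        std::lock_guard<std::mutex> lock(mu_);
        entries_.erase(path);
    }

private:
    StatFn stat_;
    std::mutex mu_;
    std::unordered_map<std::string, FileStatus> entries_;
};
```
</pre>
<div>The cache never revalidates on its own, so correctness depends entirely on every modification path (editor saves and external changes) reaching <code>invalidate()</code>.</div>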
<br>
Hope it helps,<br>
Vladimir.
<div>
<div class="h5"><br>
<br>
<div class="m_5048487057408778332moz-cite-prefix">On 29.05.2017 11:58, Ilya Biryukov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Vladimir,
<div><br>
</div>
<div>Thanks for sharing your experience.</div>
<div><br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">We did such measurements when we evaluated clang as a technology to be used in NetBeans C/C++. I don't remember the exact absolute numbers now, but the conclusion was: </div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">to be on par with the existing NetBeans speed, we had to use different caching; otherwise it was about 10 times slower.</div>
</blockquote>
<div>That's a good reason to focus on that issue from the very start, then. It would be nice to have some exact measurements, though (e.g. on LLVM),</div>
<div>just to know exactly how slow it was.</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">+1. Btw, maybe it is worth setting some expectations about what is available during and after the initial indexing phase.<br>
E.g. during the initial phase you'd probably like to have navigation for the file opened in the editor and be able to work in function bodies.<br>
</div>
</blockquote>
<div>We definitely want diagnostics/completions for the currently open file to be available. Good point; we should explicitly name the available features in the docs/discussions.</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">As to initial indexing:<br>
Using PTH (not PCH) gave a significant speedup.</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">Skipping function bodies gave a significant speedup, but you miss the references inside them and later have to reindex the bodies on demand.<br>
Using chained PCHs gave the next visible speedup.<br>
</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">Of course, we had to make some hacks so that PCHs are "reusable" more often (compared to the strict compiler rules) and keep multiple versions, on average two: one for the C and one for the C++ parse context.<br>
Also, system headers differ from project headers, so system headers can be cached more aggressively.
<br>
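<div>One way to express "cache system headers more aggressively" is a reuse check that trusts system-header entries for the whole session but revalidates project headers by modification time. This is an assumed policy for illustration, with hypothetical names; the message does not say how NetBeans drew the distinction.</div>
<pre>
```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

enum class HeaderKind { System, Project };

// Hypothetical reuse policy: system headers are trusted for the whole session,
// project headers are revalidated against their modification time on each use.
class HeaderParseCache {
public:
    using MtimeFn = std::function<std::int64_t(const std::string&)>;
    explicit HeaderParseCache(MtimeFn mtime) : mtime_(std::move(mtime)) {}

    // Records that a parse (e.g. a PCH) was built from the file's current contents.
    void store(const std::string& path) { builtAt_[path] = mtime_(path); }

    // True if the cached parse may be reused, false if it must be rebuilt.
    bool reusable(const std::string& path, HeaderKind kind) const {
        auto it = builtAt_.find(path);
        if (it == builtAt_.end()) return false;      // never built
        if (kind == HeaderKind::System) return true;  // aggressive: no re-check
        return it->second == mtime_(path);            // conservative: re-check mtime
    }

private:
    MtimeFn mtime_;
    std::map<std::string, std::int64_t> builtAt_;
};
```
</pre>
<div>The trade-off is that an updated system header (say, after a toolchain upgrade) goes unnoticed until the session's cache is discarded.</div>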
</div>
</blockquote>
<div>Is this work open-source? The interesting part is how to "reuse" the PCH for a header that's included in a different order. </div>
<div>I.e. is there a way to reuse some cached information (PCH or anything else) for <map> and <vector> when parsing these two files:<br>
</div>
<div>```</div>
<div>// foo.cpp</div>
<div>#include <vector></div>
<div>#include <map></div>
<div>...</div>
<div><br>
</div>
<div>// bar.cpp</div>
<div>#include <map></div>
<div>#include <vector></div>
<div>...</div>
<div>```</div>
</div>
<div><br>
</div>
-- <br>
<div class="m_5048487057408778332gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Regards,</div>
<div>Ilya Biryukov</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Regards,</div>
<div>Ilya Biryukov</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</span>
</body>
</html>