<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 06/ 1/17 06:26 PM, Ilya Biryukov via
cfe-dev wrote:<br>
</div>
<blockquote
cite="mid:CANmbtFchigu+JH4qxvJT918w4hpYoSo=eB3T+QUMEi9o3Ogtmg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Other IDEs do that very similarly to CDT, AFAIK.
Compromising correctness, but getting better performance.</div>
<div>Reusing modules would be nice, and I wonder if it could
also be made transparent to the users of the tool (i.e. we
could have an option 'pretend these headers are modules every
time you encounter them')<br>
</div>
<div>I would expect that to break on most projects, though. Not
sure if people would be willing to use something that spits
tons of errors on them.</div>
<div>Interesting direction for prototyping...</div>
</div>
</blockquote>
As Doug mentioned, surprisingly the tricks with headers in the
majority of projects give pretty good results :-)<br>
<br>
In NetBeans we have a header caching approach similar to CDT's.<br>
<br>
The only difference is that when we hit an #include the second time we
only check whether we can skip indexing.<br>
We still always do "fair lightweight preprocessing" to keep an accurate
context of all possible inner #ifdef/#else/#define directives
(because they might affect the current file).<br>
For that we use a per-file APT (Abstract Preprocessor Tree), which is
constant for the file and is created once, similar to clang's PTH
(Pre-Tokenized Headers).<br>
<br>
By visiting a file's APT we can produce different output depending on
the input preprocessor state.<br>
It can be visited in "light" mode or "produce tokens" mode, but it
always gives the correct result from the strict compiler's point of
view.<br>
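<br>
To make the APT idea more concrete, here is a minimal sketch in Python.
It is not the NetBeans code: the node types Text, Define and IfDef, the
visit function and the HEADER example are all hypothetical names chosen
for illustration, and only #define and #ifdef/#else are modelled. It
shows one immutable tree being visited with different input macro
states, either producing tokens or, in "light" mode, only tracking the
macro state.<br>
<pre>
```python
from dataclasses import dataclass


# Hypothetical node types for a per-file preprocessor tree; the names are
# illustrative and not taken from NetBeans.
@dataclass(frozen=True)
class Text:
    tokens: tuple            # ordinary source tokens


@dataclass(frozen=True)
class Define:
    name: str                # "#define NAME"


@dataclass(frozen=True)
class IfDef:
    name: str                # "#ifdef NAME"
    then_branch: tuple       # nodes used when NAME is defined
    else_branch: tuple = ()  # nodes used otherwise ("#else")


def visit(nodes, macros, produce_tokens=True):
    """Walk an immutable tree with a given input macro state.

    Returns (tokens, macros_after). In light mode (produce_tokens=False)
    only the macro state is tracked and no tokens are collected.
    """
    macros = set(macros)     # copy, so the caller's state is untouched
    out = []

    def walk(items):
        for node in items:
            if isinstance(node, Define):
                macros.add(node.name)
            elif isinstance(node, IfDef):
                taken = node.name in macros
                walk(node.then_branch if taken else node.else_branch)
            elif produce_tokens:
                out.extend(node.tokens)

    walk(nodes)
    return out, frozenset(macros)


# Tree for a header that picks "long" when USE_LONG is defined and "int"
# otherwise, and then defines NUM_H.
HEADER = (
    IfDef("USE_LONG",
          (Text(("typedef", "long", "num_t", ";")),),
          (Text(("typedef", "int", "num_t", ";")),)),
    Define("NUM_H"),
)
```
</pre>
With this sketch, visit(HEADER, {"USE_LONG"}) yields the "long" typedef,
visit(HEADER, set()) yields the "int" one, and
visit(HEADER, set(), produce_tokens=False) returns no tokens and only the
resulting macro state, all from the same tree built once.<br>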
We also do indexing in parallel, and the APT (being immutable) is
easily shared by index visitors from all threads.<br>
Btw, the stat cache is also shared by all indexing threads with
appropriate synchronization.<br>
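<br>
As a rough illustration of a stat cache shared between indexing threads,
here is a small Python sketch. The class SharedStatCache, its stat_fn
parameter and the helper index_in_parallel are hypothetical names, not
NetBeans or clang APIs, and holding one lock around the lookup is just
the simplest form of synchronization, assumed here for brevity.<br>
<pre>
```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStatCache:
    """Hypothetical stat cache shared by all indexing threads."""

    def __init__(self, stat_fn=os.stat):
        self._stat_fn = stat_fn        # injectable, so lookups can be counted
        self._lock = threading.Lock()
        self._entries = {}

    def stat(self, path):
        """Return the cached stat result, asking the file system only once."""
        with self._lock:
            if path not in self._entries:
                try:
                    self._entries[path] = self._stat_fn(path)
                except OSError:
                    self._entries[path] = None   # remember missing files too
            return self._entries[path]

    def invalidate(self, path):
        """Drop one entry, e.g. after the file changed in the IDE or on disk."""
        with self._lock:
            self._entries.pop(path, None)


def index_in_parallel(paths, cache, workers=4):
    """Stand-in for indexing: each worker only asks the cache about its file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cache.stat, paths))
```
</pre>
The lock is held while the file system is queried, which keeps the
sketch short but serializes cache misses; a real implementation would
probably use finer-grained locking.<br>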
<br>
So in NetBeans we observe that with these tricks (which really look
like multiple modules per header file) the majority of projects are
indexed with very good accuracy, and I can also confirm that it gives
a ~10x speedup.<br>
<br>
Hope it helps,<br>
Vladimir.<br>
<br>
<blockquote
cite="mid:CANmbtFchigu+JH4qxvJT918w4hpYoSo=eB3T+QUMEi9o3Ogtmg@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jun 1, 2017 at 5:14 PM, David
Blaikie <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Not sure this has already been discussed, but
would it be practical/reasonable to use Clang's modules
support for this? Might keep the implementation much
simpler - and perhaps provide an extra incentive for users
to modularize their build/code which would help their
actual build times (& heck, parsed modules could even
potentially be reused between indexer and final build -
making apparent build times /really/ fast)</div>
<br>
<div class="gmail_quote">
<div>
<div class="h5">
<div dir="ltr">On Thu, Jun 1, 2017 at 8:12 AM Doug
Schaefer via cfe-dev <<a moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a>>
wrote:<br>
</div>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="h5">
<div
style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>I thought I’d chip in and describe Eclipse
CDT’s strategy with header caching. It’s
actually a big cheat but the results have proven
to be pretty good.</div>
<div><br>
</div>
<div>CDT’s hack actually starts in the
preprocessor. If we see a header file has
already been indexed, we skip including it. At
the back end, we seamlessly use the index or the
current symbol table when doing symbol lookup.
Symbols that get missed because we skipped
header files get picked up out of the index
instead. We also do that in the preprocessor to
look up missing macros out of the index when
doing macro substitution.</div>
<div><br>
</div>
<div>The performance gains were about an order of
magnitude and it magically works most of the
time with the main issue being header files that
get included multiple times affected by
different macro values but the effects of that
haven’t been major.</div>
<div><br>
</div>
<div>With clang being a real compiler, I had my
doubts that you could even do something like
this without adding hooks in places the
front-end gang might not like. Love to be proven
wrong. It really is very hard to keep up with
the evolving C++ standard and we could sure use
the help clangd could offer.</div>
<div><br>
</div>
<div>Hope that helps,</div>
<div>Doug.</div>
<div><br>
</div>
<span
id="m_8883615856890106395m_4236914162035532794OLK_SRC_BODY_SECTION">
<div
style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium
none;BORDER-LEFT:medium
none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df
1pt solid;BORDER-RIGHT:medium
none;PADDING-TOP:3pt">
<span style="font-weight:bold">From: </span>cfe-dev
<<a moz-do-not-send="true"
href="mailto:cfe-dev-bounces@lists.llvm.org"
target="_blank">cfe-dev-bounces@lists.llvm.<wbr>org</a>>
on behalf of Ilya Biryukov via cfe-dev <<a
moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a>><br>
<span style="font-weight:bold">Reply-To: </span>Ilya
Biryukov <<a moz-do-not-send="true"
href="mailto:ibiryukov@google.com"
target="_blank">ibiryukov@google.com</a>><br>
<span style="font-weight:bold">Date: </span>Thursday,
June 1, 2017 at 10:52 AM<br>
<span style="font-weight:bold">To: </span>Vladimir
Voskresensky <<a moz-do-not-send="true"
href="mailto:vladimir.voskresensky@oracle.com"
target="_blank">vladimir.voskresensky@oracle.<wbr>com</a>><br>
<span style="font-weight:bold">Cc: </span>via
cfe-dev <<a moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a>></div>
</span></div>
<div
style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><span
id="m_8883615856890106395m_4236914162035532794OLK_SRC_BODY_SECTION">
<div
style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium
none;BORDER-LEFT:medium
none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df
1pt solid;BORDER-RIGHT:medium
none;PADDING-TOP:3pt"><br>
<span style="font-weight:bold">Subject: </span>Re:
[cfe-dev] Adding indexing support to Clangd<br>
</div>
</span></div>
<div
style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><span
id="m_8883615856890106395m_4236914162035532794OLK_SRC_BODY_SECTION">
<div><br>
</div>
<blockquote
id="m_8883615856890106395m_4236914162035532794MAC_OUTLOOK_ATTRIBUTION_BLOCKQUOTE"
style="BORDER-LEFT:#b5c4df 5 solid;PADDING:0 0
0 5;MARGIN:0 0 0 5">
<div>
<div>
<div dir="ltr">Thanks for the insights, I
think I get the gist of the idea with
the "module" PCH.
<div>One question is: what if the system
headers are included after the user
includes? Then we abandon the PCH
cache and run the parsing from
scratch, right?</div>
<div><br>
</div>
<div>
<div><span style="font-size:12.8px">FileSystemStatCache
that is reused between compilation
units? Sounds like a low-hanging
fruit for indexing, thanks.</span><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jun 1,
2017 at 11:52 AM, Vladimir
Voskresensky <span dir="ltr">
<<a moz-do-not-send="true"
href="mailto:vladimir.voskresensky@oracle.com"
target="_blank">vladimir.voskresensky@oracle.<wbr>com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000">Hi Ilya,<br>
<br>
Sorry for the late reply.<br>
Unfortunately the mentioned hacks were
done a long time ago and I couldn't
find the changes at first
glance :-(<br>
<br>
But you can think about reusable
chained PCHs in the "module" way.<br>
Each system header is a module. <br>
There are special index_headers.c
and index_headers.cpp files which
include all standard headers.<br>
These files are indexed first and
create a "module" per #include.<br>
A module is created once, or several
times if the preprocessor contexts are
very different, like C vs. C++98
vs. C++14.<br>
Then it is reused.<br>
Of course it could compromise the
accuracy, but for a proof of concept
it was enough to see that the expected
indexing speed can be achieved
theoretically.
<br>
<br>
Btw, another hint: implementing
FileSystemStatCache gave the next
visible speedup. Of course one needs to
carefully invalidate/update it
when a file is modified in the IDE or
externally.<br>
So, finally we got just a 2x
slowdown, but with the accuracy of a
"real" compiler. And then as you
know we have started Clank :-)<br>
<br>
Hope it helps,<br>
Vladimir.
<div>
<div
class="m_8883615856890106395m_4236914162035532794h5"><br>
<br>
<div
class="m_8883615856890106395m_4236914162035532794m_5048487057408778332moz-cite-prefix">On
29.05.2017 11:58, Ilya
Biryukov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Vladimir,
<div><br>
</div>
<div>Thanks for sharing
your experience.</div>
<div><br>
<div class="gmail_extra">
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">We
did such
measurements
when we evaluated
clang as a
technology to be
used in NetBeans
C/C++. I don't
remember the
exact absolute
numbers now, but
the conclusion
was: </div>
</blockquote>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">to
be on par with
the existing
NetBeans speed
we have to use
different
caching,
otherwise it was
like 10 times
slower.</div>
</blockquote>
<div>It's a good
reason to focus on
that issue from
the very start
then. Would be
nice to have some
exact
measurements,
though. (i.e. on
LLVM).</div>
<div>Just to know
exactly how slow
it was.</div>
<div><br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">+1.
Btw, maybe it
is worth setting
some
expectations about
what is
available during
and after the
initial index
phase.<br>
I.e. during the
initial phase
you'd probably
like to have
navigation for the
file opened in the
editor and be able
to work in
function
bodies.<br>
</div>
</blockquote>
<div>We definitely
want
diagnostics/completions
for the currently
open file to be
available. Good
point, we
definitely want to
explicitly name
the available
features in the
docs/discussions.</div>
<div><br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">As
to initial
indexing:<br>
Using PTH (not
PCH) gave
significant
speedup.</div>
</blockquote>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">Skipping
bodies gave a
significant
speedup, but you
miss the
references and
later have to
reindex bodies
on demand.<br>
Using chained
PCH gave the
next visible
speedup.<br>
</div>
</blockquote>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div
bgcolor="#FFFFFF">Of
course we had to
make some hacks
for PCHs to be
"reusable" more
often
(compared to the
strict compiler
rule) and keep
multiple
versions. On
average 2: one
for the C and one
for the C++ parse
context.<br>
Also there is a
difference
between system
headers and
project
headers, so
system headers can be
cached more
aggressively.
<br>
</div>
</blockquote>
<div>Is this work
open-source? The
interesting part
is how to "reuse"
the PCH for a
header that's
included in a
different order. </div>
<div>I.e. is there a
way to reuse some
cached
information(PCH,
or anything else)
for <map>
and <vector>
when parsing these
two files:<br>
</div>
<div>```</div>
<div>// foo.cpp</div>
<div>#include
<vector></div>
<div>#include
<map></div>
<div>...</div>
<div><br>
</div>
<div>// bar.cpp</div>
<div>#include
<map></div>
<div>#include
<vector></div>
<div>....</div>
<div>```</div>
</div>
<div><br>
</div>
-- <br>
<div
class="m_8883615856890106395m_4236914162035532794m_5048487057408778332gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Regards,</div>
<div>Ilya
Biryukov</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div
class="m_8883615856890106395m_4236914162035532794gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Regards,</div>
<div>Ilya Biryukov</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</span></div>
</div>
</div>
______________________________<wbr>_________________<span
class=""><br>
cfe-dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
</span></blockquote>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>Regards,</div>
<div>Ilya Biryukov</div>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>