<div dir="ltr">Thanks for the insights; I think I get the gist of the "module" PCH idea. <div>One question: what if the system headers are included after the user headers? Then we abandon the PCH cache and parse from scratch, right?</div><div><br></div><div><div><span style="font-size:12.8px">A FileSystemStatCache that is reused between compilation units? Sounds like low-hanging fruit for indexing, thanks.</span><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 1, 2017 at 11:52 AM, Vladimir Voskresensky <span dir="ltr">&lt;<a href="mailto:vladimir.voskresensky@oracle.com" target="_blank">vladimir.voskresensky@oracle.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Hi Ilia,<br>

    <br>

    Sorry for the late reply.<br>

    Unfortunately, the hacks I mentioned were done a long time ago, and I
    couldn't find the changes at first glance :-(<br>

    <br>

    But you can think of reusable chained PCHs in the "module" way.<br>

    Each system header is a module. <br>

    There are special index_headers.c and index_headers.cpp files, which
    include all the standard headers.<br>

    These files are indexed first and create a "module" per #include.<br>

    A module is created once, or several times if the preprocessor contexts
    are very different, e.g. C vs. C++98 vs. C++14.<br>
    Then it is reused.<br>
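A minimal C++17 sketch of the per-header "module" cache described above, assuming a cached artifact can be keyed only by the header name and a coarse language context. `HeaderModuleCache`, `LangContext` and `ModuleArtifact` are hypothetical names, not NetBeans or clang APIs, and the build step is only a placeholder for the real (expensive) parse. Because the key ignores whatever macros were defined before the #include, each header is built once per context and then reused in any include order; that is exactly where the accuracy compromise comes from.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical coarse preprocessor contexts (the email mentions C, C++98, C++14).
enum class LangContext { C, CXX98, CXX14 };

// Hypothetical handle to a prebuilt per-header "module", e.g. a PCH file path.
struct ModuleArtifact {
    std::string pchPath;
};

// Sketch: one cache entry per (header, context) pair, built on first use and
// reused afterwards regardless of where or in which order the header is included.
class HeaderModuleCache {
public:
    const ModuleArtifact& getOrBuild(const std::string& header, LangContext ctx) {
        auto key = std::make_pair(header, ctx);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // reuse: no reparse
        ++builds_;
        // Placeholder for parsing the header into a PCH for this context.
        ModuleArtifact artifact{header + "." +
                                std::to_string(static_cast<int>(ctx)) + ".pch"};
        return cache_.emplace(key, std::move(artifact)).first->second;
    }

    int buildCount() const { return builds_; }

private:
    std::map<std::pair<std::string, LangContext>, ModuleArtifact> cache_;
    int builds_ = 0;
};
```

With this keying, foo.cpp (vector, map) and bar.cpp (map, vector) in the same C++14 context trigger only two builds in total.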

    Of course, this could compromise accuracy, but for a proof of concept
    it was enough to see that the expected indexing speed is achievable in
    principle. <br>

    <br>

    Btw, another hint: implementing a FileSystemStatCache gave the next
    visible speedup. Of course, you need to carefully invalidate/update it
    when a file is modified in the IDE or externally.<br>
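A minimal C++17 sketch of a stat cache shared across compilation units, with explicit invalidation hooks for IDE saves and external changes. `SharedStatCache`, `StatInfo` and `StatFn` are hypothetical names; clang's own `FileSystemStatCache` has a different interface, so the real stat call is injected as a callback here rather than guessed at.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified stat result.
struct StatInfo {
    std::uint64_t size = 0;
    std::int64_t mtime = 0;
};

// The underlying (expensive) stat; std::nullopt means "file does not exist".
using StatFn = std::function<std::optional<StatInfo>(const std::string&)>;

// Sketch of a stat cache that outlives a single compilation unit.
// Misses are cached too, because header search probes many non-existent paths.
class SharedStatCache {
public:
    explicit SharedStatCache(StatFn realStat) : realStat_(std::move(realStat)) {}

    std::optional<StatInfo> stat(const std::string& path) {
        auto it = entries_.find(path);
        if (it != entries_.end()) return it->second;  // hit (positive or negative)
        auto result = realStat_(path);
        entries_.emplace(path, result);
        return result;
    }

    // Call when the IDE saves a file or a file watcher reports an external change.
    void invalidate(const std::string& path) { entries_.erase(path); }

    // Call when changes cannot be tracked precisely (e.g. a VCS branch switch).
    void invalidateAll() { entries_.clear(); }

private:
    StatFn realStat_;
    std::unordered_map<std::string, std::optional<StatInfo>> entries_;
};
```

Until `invalidate` is called, the cache keeps returning the old result, which is why careful invalidation matters for correctness and not just for speed.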

    So, in the end we got just a 2x slowdown, but with the accuracy of a
    "real" compiler. And then, as you know, we started Clank :-)<br>

    <br>

    Hope it helps,<br>

    Vladimir.<div><div class="h5"><br>

    <br>

    <div class="m_5048487057408778332moz-cite-prefix">On 29.05.2017 11:58, Ilya Biryukov

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Hi Vladimir,

        <div><br>

        </div>

        <div>Thanks for sharing your experience.</div>

        <div><br>

          <div class="gmail_extra">

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF">We did such measurements when we
                  evaluated clang as a technology to be used in NetBeans
                  C/C++. I don't remember the exact absolute numbers
                  now, but the conclusion was: </div>

              </blockquote>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF"> to be on par with the existing
                  NetBeans speed, we had to use different caching;
                  otherwise it was about 10 times slower.</div>

              </blockquote>

              <div>It's a good reason to focus on that issue from the
                very start, then. It would be nice to have some exact
                measurements, though (e.g. on LLVM).</div>
              <div>Just to know exactly how slow it was.</div>

              <div><br>

              </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF"> +1. Btw, maybe it is worth
                  setting some expectations about what is available
                  during and after the initial indexing phase.<br>
                  E.g. during the initial phase you'd probably like to
                  have navigation for the file opened in the editor and
                  be able to work in function bodies.<br>

                </div>

              </blockquote>

              <div>We definitely want diagnostics/completions for the
                currently open file to be available. Good point; we
                should explicitly name the available features in the
                docs/discussions.</div>

              <div><br>

              </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF">As to initial indexing:<br>
                  Using PTH (not PCH) gave a significant speedup.</div>

              </blockquote>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF"> Skipping function bodies gave a
                  significant speedup, but you miss the references inside
                  them and later have to reindex bodies on demand.<br>
                  Using chained PCH gave the next visible speedup.<br>

                </div>

              </blockquote>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div bgcolor="#FFFFFF">Of course, we had to make some
                  hacks so that PCHs were "reusable" more often (compared
                  to the strict compiler rules) and keep multiple versions,
                  on average 2: one for the C and one for the C++ parse context.<br>
                  Also, there is a difference between system headers and
                  project headers: system headers can be cached more
                  aggressively. <br>

                </div>

              </blockquote>

              <div>Is this work open-source? The interesting part is how

                to "reuse" the PCH for a header that's included in a

                different order. </div>

              <div>I.e. is there a way to reuse some cached
                information (PCH or anything else) for &lt;map&gt; and
                &lt;vector&gt; when parsing these two files:<br>
              </div>

              <div>```</div>

              <div>// foo.cpp</div>

              <div>#include &lt;vector&gt;</div>

              <div>#include &lt;map&gt;</div>

              <div>...</div>

              <div><br>

              </div>

              <div>// bar.cpp</div>

              <div>#include &lt;map&gt;</div>

              <div>#include &lt;vector&gt;</div>

              <div>...</div>

              <div>```</div>
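A minimal C++17 sketch of why include order matters for plain prefix reuse of chained PCHs, assuming each cached chain corresponds to one exact ordered sequence of leading #includes, and that building a chain also leaves every shorter prefix of it cached. `IncludeSeq` and `longestCachedPrefix` are hypothetical names. With a chain built for foo.cpp, bar.cpp can reuse nothing, because its very first include already differs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// An ordered sequence of leading #includes in a translation unit.
using IncludeSeq = std::vector<std::string>;

// Returns how many leading includes of `tu` can be served from cached chains.
// Assumes every prefix of a cached chain is itself cached, so the walk can stop
// at the first miss.
std::size_t longestCachedPrefix(const std::set<IncludeSeq>& cachedChains,
                                const IncludeSeq& tu) {
    IncludeSeq prefix;
    std::size_t best = 0;
    for (const auto& header : tu) {
        prefix.push_back(header);
        if (cachedChains.count(prefix) == 0) break;  // order diverges here
        best = prefix.size();
    }
    return best;
}
```

For foo.cpp the whole chain is reused; for bar.cpp the result is 0, which is the same situation as system headers included after user headers.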

            </div>

            <div><br>

            </div>

            -- <br>

            <div class="m_5048487057408778332gmail_signature">

              <div dir="ltr">

                <div>

                  <div dir="ltr">

                    <div>Regards,</div>

                    <div>Ilya Biryukov</div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Regards,</div><div>Ilya Biryukov</div></div></div></div></div>

</div>