<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>
      <blockquote type="cite">
        <pre wrap="">This sounds very like bdb.  Is there a reason that we’re reimplementing a large chunk of bdb, rather than just using it (or using something like sqlite for the index storage)?</pre>
      </blockquote>
    </p>
    <p>I'm not very familiar with bdb, but perhaps it could work. I've
      seen LMDB mentioned a few times and the license seems compatible
      so I'll research this a bit. My initial reasons for not going with
      SQL databases/other libraries:<br>
      - From my CDT experience with the C/C++ index model, tables don't
      seem like a good match<br>
      - I was under the impression that adding a third party library in
      llvm repositories was not straightforward or desired. But perhaps
      a small one is OK (i.e. not MySQL, PostgreSQL, etc).<br>
      - With several options available, I thought doing something
      similar to what I knew worked well in CDT would be a safe way to
      go. With a similar data storage strategy, the index model could be
      also inspired from CDT and give similar features.<br>
      - At the LLVM developer meeting in March, there seemed to be an
      understanding among Clangd contributors that a "tailored" format
      was the way to go<br>
    </p>
    <p>I'm happy to consider other existing solutions if they make more
      sense. It's always good not to have to maintain additional code.
      I'll look a bit more into LMDB in the next days to have a better
      picture. One thing that would be good is if we can have "pointer
      values" that can point to other entries in the database (in case
      someone knows already).<br>
    </p>
    <p><br>
      <blockquote type="cite">Clang shouldn't query by name when looking
        up references, instead USRs should be used. The USRs deal with
        scopes correctly (Hence the U in USR). </blockquote>
    </p>
    <p>Yes, querying by USR seem to work well so far!</p>
    <p><br>
    </p>
    <p>Regards,<br>
      Marc-André<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 2017-08-11 12:16 PM, Alex L wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAKS3GBuB9Tb+vhGUagWH4zfVrjAfuGOBKt_xdQqSUK=KdQZQvw@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On 11 August 2017 at 17:09, Doug
            Schaefer via cfe-dev <span dir="ltr"><<a
                href="mailto:cfe-dev@lists.llvm.org" target="_blank"
                moz-do-not-send="true">cfe-dev@lists.llvm.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div link="blue" vlink="purple" lang="EN-US">
                <div class="m_7013316755050223983WordSection1">
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif">Sorry
                      gang, Outlook isn’t treating this thread very
                      well. I’ll have to top post my response.</span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
                      CDT indexer and database were written in 2005 so
                      the data is long gone and we’re talking about
                      different database technologies anyway. All I
                      remember was it was slower by multiple orders of
                      magnitude on writes. And that stopped my
                      investigation right there. At the time I was
                      trying to get a full index of the Firefox source
                      down from an hour to a few minutes and incremental
                      indexing times down to less than a second (thanks
                      to the header caching). The DB was taking me in
                      the other direction.</span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif">I’m
                      just not sure architecting for simple lookups is
                      the right approach. Just consider probably the
                      most common use case for clangd, content assist.
                      Given an offset into the file, determine the
                      context/scope of the content and a prefix. You
                      then need to find all matches for that prefix.
                      That’s by no means a simple lookup given all the
                      declarations, type hierarchies, scope hierarchies
                      in play in that context. Finding references is
                      another good one that also requires scope analysis
                      to make sure you’re actually returning references
                      to that exact definition and not just something
                      that happens to have the same name. The CDT index
                      has links back and forth between these things.</span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"></span></p>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Clang shouldn't query by name when looking up
              references, instead USRs should be used. The USRs deal
              with scopes correctly (Hence the U in USR). </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div link="blue" vlink="purple" lang="EN-US">
                <div class="m_7013316755050223983WordSection1">
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif">Again,
                      I’m just relaying my experience haven built one of
                      these things before. And it was a long time ago.
                      All avenues are worth exploring.</span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif">Doug.</span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <p class="MsoNormal"><span
                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                  <div style="border:none;border-left:solid blue
                    1.5pt;padding:0in 0in 0in 4.0pt">
                    <div>
                      <div style="border:none;border-top:solid #e1e1e1
                        1.0pt;padding:3.0pt 0in 0in 0in">
                        <p class="MsoNormal"><b><span
                              style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
                            Manuel Klimek [mailto:<a
                              href="mailto:klimek@google.com"
                              target="_blank" moz-do-not-send="true">klimek@google.com</a>]
                            <br>
                            <b>Sent:</b> Friday, August 11, 2017 10:59
                            AM</span></p>
                        <div>
                          <div class="h5"><br>
                            <b>To:</b> Doug Schaefer <<a
                              href="mailto:dschaefer@blackberry.com"
                              target="_blank" moz-do-not-send="true">dschaefer@blackberry.com</a>>;
                            David Chisnall <<a
                              href="mailto:David.Chisnall@cl.cam.ac.uk"
                              target="_blank" moz-do-not-send="true">David.Chisnall@cl.cam.ac.uk</a>>;
                            Marc-André Laperle <<a
                              href="mailto:marc-andre.laperle@ericsson.com"
                              target="_blank" moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
                            <b>Cc:</b> via cfe-dev <<a
                              href="mailto:cfe-dev@lists.llvm.org"
                              target="_blank" moz-do-not-send="true">cfe-dev@lists.llvm.org</a>><br>
                            <b>Subject:</b> Re: [cfe-dev] Adding
                            indexing support to Clangd</div>
                        </div>
                      </div>
                    </div>
                    <div>
                      <div class="h5">
                        <p class="MsoNormal"> </p>
                        <div>
                          <div>
                            <div>
                              <p class="MsoNormal">On Fri, Aug 11, 2017
                                at 4:50 PM Doug Schaefer <<a
                                  href="mailto:dschaefer@blackberry.com"
                                  target="_blank" moz-do-not-send="true">dschaefer@blackberry.com</a>>
                                wrote:</p>
                            </div>
                            <blockquote
                              style="border:none;border-left:solid
                              #cccccc 1.0pt;padding:0in 0in 0in
                              6.0pt;margin-left:4.8pt;margin-right:0in">
                              <div>
                                <div>
                                  <p class="MsoNormal"><span
                                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                  <p class="MsoNormal"><span
                                      style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                  <div
                                    style="border:none;border-left:solid
                                    blue 1.5pt;padding:0in 0in 0in
                                    4.0pt">
                                    <div>
                                      <div
                                        style="border:none;border-top:solid
                                        #e1e1e1 1.0pt;padding:3.0pt 0in
                                        0in 0in">
                                        <p class="MsoNormal"><b><span
                                              style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
                                            Manuel Klimek [mailto:<a
                                              href="mailto:klimek@google.com"
                                              target="_blank"
                                              moz-do-not-send="true">klimek@google.com</a>]
                                            <br>
                                            <b>Sent:</b> Friday, August
                                            11, 2017 4:54 AM<br>
                                            <b>To:</b> Doug Schaefer
                                            <<a
                                              href="mailto:dschaefer@blackberry.com"
                                              target="_blank"
                                              moz-do-not-send="true">dschaefer@blackberry.com</a>>;
                                            David Chisnall <<a
                                              href="mailto:David.Chisnall@cl.cam.ac.uk"
                                              target="_blank"
                                              moz-do-not-send="true">David.Chisnall@cl.cam.ac.uk</a>>;
                                            Marc-André Laperle <<a
                                              href="mailto:marc-andre.laperle@ericsson.com"
                                              target="_blank"
                                              moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
                                            <b>Cc:</b> via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
                                              moz-do-not-send="true">cfe-dev@lists.llvm.org</a>></span></p>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                              <div>
                                <div>
                                  <div
                                    style="border:none;border-left:solid
                                    blue 1.5pt;padding:0in 0in 0in
                                    4.0pt">
                                    <div>
                                      <div
                                        style="border:none;border-top:solid
                                        #e1e1e1 1.0pt;padding:3.0pt 0in
                                        0in 0in">
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
                                            <b>Subject:</b> Re:
                                            [cfe-dev] Adding indexing
                                            support to Clangd</span></p>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                              <div>
                                <div>
                                  <div
                                    style="border:none;border-left:solid
                                    blue 1.5pt;padding:0in 0in 0in
                                    4.0pt">
                                    <div>
                                      <div>
                                        <div>
                                          <p class="MsoNormal">On Thu,
                                            Aug 10, 2017 at 5:57 PM Doug
                                            Schaefer via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
                                              moz-do-not-send="true">cfe-dev@lists.llvm.org</a>>
                                            wrote:</p>
                                        </div>
                                        <blockquote
                                          style="border:none;border-left:solid
                                          #cccccc 1.0pt;padding:0in 0in
                                          0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
                                          <p class="MsoNormal">>
                                            -----Original Message-----<br>
                                            > From: Dr D. Chisnall
                                            [mailto:<a
                                              href="mailto:dc552@hermes.cam.ac.uk"
                                              target="_blank"
                                              moz-do-not-send="true">dc552@hermes.cam.ac.uk</a><wbr>]
                                            On Behalf Of David<br>
                                            > Chisnall<br>
                                            > Sent: Thursday, August
                                            10, 2017 6:10 AM<br>
                                            > To: Marc-André Laperle
                                            <<a
                                              href="mailto:marc-andre.laperle@ericsson.com"
                                              target="_blank"
                                              moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
                                            > Cc: via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
                                              moz-do-not-send="true">cfe-dev@lists.llvm.org</a>>;
                                            <a
                                              href="mailto:zeratul976@hotmail.com"
                                              target="_blank"
                                              moz-do-not-send="true">zeratul976@hotmail.com</a>;
                                            Doug<br>
                                            > Schaefer <<a
                                              href="mailto:dschaefer@blackberry.com"
                                              target="_blank"
                                              moz-do-not-send="true">dschaefer@blackberry.com</a>><br>
                                            > Subject: Re: [cfe-dev]
                                            Adding indexing support to
                                            Clangd<br>
                                            ><br>
                                            > On 8 Aug 2017, at
                                            18:52, Marc-André Laperle
                                            via cfe-dev <cfe-<br>
                                            > <a
                                              href="mailto:dev@lists.llvm.org"
                                              target="_blank"
                                              moz-do-not-send="true">dev@lists.llvm.org</a>>
                                            wrote:<br>
                                            > ><br>
                                            > >
                                            --ClangdIndexStorage--<br>
                                            > > malloc-like
                                            interface that
                                            allocates/frees data blocks
                                            of variable sizes on<br>
                                            > disk. The blocks can
                                            contain ints, shorts,
                                            strings, pointers (i.e. file
                                            offsets),<br>
                                            > etc. The data is cached
                                            in 4K pieces so that local
                                            and repeated accesses are
                                            all<br>
                                            > done quickly, in
                                            memory.<br>
                                            > > Clangd mallocs and
                                            writes its index model
                                            objects using this.<br>
                                            > ><br>
                                            > > --BTree--<br>
                                            > > A pretty classic
                                            BTree implementation. Used
                                            to speed up lookup (symbol<br>
                                            > names, file names). It
                                            allocates its nodes using
                                            ClangdIndexStorage therefore<br>
                                            > it is stored on disk.
                                            Keys are actually records in
                                            ClangdIndexStorage so you<br>
                                            > can really think of the
                                            BTree as a collection of
                                            sorted pointers (sorted<br>
                                            > according to a provided
                                            comparator).<br>
                                            ><br>
                                            > This sounds very like
                                            bdb.  Is there a reason that
                                            we’re reimplementing a<br>
                                            > large chunk of bdb,
                                            rather than just using it
                                            (or using something like
                                            sqlite for<br>
                                            > the index storage)?<br>
                                            <br>
                                            It looks like Marc-Andre is
                                            following our work on the
                                            Eclipse CDT indexer. We
                                            found databases to be just
                                            too slow.</p>
                                        </blockquote>
                                        <div>
                                          <p class="MsoNormal"> </p>
                                        </div>
                                        <div>
                                          <p class="MsoNormal">That is
                                            somewhat surprising. Which
                                            db's did you benchmark
                                            against?</p>
                                          <p class="MsoNormal"><span
                                              style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                        </div>
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                              <div>
                                <div>
                                  <div
                                    style="border:none;border-left:solid
                                    blue 1.5pt;padding:0in 0in 0in
                                    4.0pt">
                                    <div>
                                      <div>
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif">This
                                            was many years ago and we
                                            were trying Apache Derby
                                            (since Eclipse is written in
                                            Java). And at the time the
                                            biggest issue was write
                                            performance totally
                                            destroying any gains we made
                                            with header caching. But
                                            that may have just been
                                            Derby.</span></p>
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif">Though
                                            I think the bigger issue is
                                            that the language model is a
                                            graph which doesn’t fit
                                            naturally in tables. You’ll
                                            end up spending more time
                                            fighting that than building
                                            out your own data
                                            structures. But that’s just
                                            my experience.</span></p>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                            <div>
                              <p class="MsoNormal"> </p>
                            </div>
                            <div>
                              <p class="MsoNormal">Yea, the language
                                model not fitting into tables well does
                                match my intuition, but for simple
                                queries, I'd have expected a db to be
                                fast enough with room to spare. Would be
                                curious about more details / numbers.</p>
                            </div>
                            <div>
                              <p class="MsoNormal"> </p>
                            </div>
                            <div>
                              <p class="MsoNormal"> </p>
                            </div>
                            <blockquote
                              style="border:none;border-left:solid
                              #cccccc 1.0pt;padding:0in 0in 0in
                              6.0pt;margin-left:4.8pt;margin-right:0in">
                              <div>
                                <div>
                                  <div
                                    style="border:none;border-left:solid
                                    blue 1.5pt;padding:0in 0in 0in
                                    4.0pt">
                                    <div>
                                      <div>
                                        <p class="MsoNormal"><span
                                            style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
                                        <div>
                                          <p class="MsoNormal"> </p>
                                        </div>
                                        <blockquote
                                          style="border:none;border-left:solid
                                          #cccccc 1.0pt;padding:0in 0in
                                          0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
                                          <p class="MsoNormal">Also,
                                            forcing the C++ language
                                            model to fit into a bunch of
                                            tables or key/value pairs
                                            just didn't make that much
                                            sense in the end. The data
                                            is very complex.<br>
                                            <br>
                                            Instead it's better to think
                                            of this as a big graph data
                                            structure that just happens
                                            to be backed by memory
                                            mapped file store. Clients
                                            generally deal directly with
                                            pointers into that data
                                            structure and follow
                                            pointers in the graph. The
                                            BTrees help speed up that
                                            navigation at various nodes
                                            in the graph. It's super
                                            fast.<br>
                                            <br>
                                            Doug.<br>
______________________________<wbr>_________________<br>
                                            cfe-dev mailing list<br>
                                            <a
                                              href="mailto:cfe-dev@lists.llvm.org"
                                              target="_blank"
                                              moz-do-not-send="true">cfe-dev@lists.llvm.org</a><br>
                                            <a
                                              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
                                              target="_blank"
                                              moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a></p>
                                        </blockquote>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
              <br>
              ______________________________<wbr>_________________<br>
              cfe-dev mailing list<br>
              <a href="mailto:cfe-dev@lists.llvm.org"
                moz-do-not-send="true">cfe-dev@lists.llvm.org</a><br>
              <a
                href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>