<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>
<blockquote type="cite">
<pre wrap="">This sounds very like bdb. Is there a reason that we’re reimplementing a large chunk of bdb, rather than just using it (or using something like sqlite for the index storage)?</pre>
</blockquote>
</p>
<p>I'm not very familiar with bdb, but perhaps it could work. I've
seen LMDB mentioned a few times and the license seems compatible
so I'll research this a bit. My initial reasons for not going with
SQL databases/other libraries:<br>
- From my CDT experience with the C/C++ index model, tables don't
seem like a good match<br>
- I was under the impression that adding a third party library in
llvm repositories was not straightforward or desired. But perhaps
a small one is OK (i.e. not MySQL, PostgreSQL, etc).<br>
- With several options available, I thought doing something
similar to what I knew worked well in CDT would be a safe way to
go. With a similar data storage strategy, the index model could be
also inspired from CDT and give similar features.<br>
- At the LLVM developer meeting in March, there seemed to be an
understanding among Clangd contributors that a "tailored" format
was the way to go<br>
</p>
<p>I'm happy to consider other existing solutions if they make more
sense. It's always good not to have to maintain additional code.
I'll look a bit more into LMDB in the next days to have a better
picture. One thing that would be good is if we can have "pointer
values" that can point to other entries in the database (in case
someone knows already).<br>
</p>
<p><br>
<blockquote type="cite">Clang shouldn't query by name when looking
up references, instead USRs should be used. The USRs deal with
scopes correctly (Hence the U in USR). </blockquote>
</p>
<p>Yes, querying by USR seem to work well so far!</p>
<p><br>
</p>
<p>Regards,<br>
Marc-André<br>
</p>
<br>
<div class="moz-cite-prefix">On 2017-08-11 12:16 PM, Alex L wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAKS3GBuB9Tb+vhGUagWH4zfVrjAfuGOBKt_xdQqSUK=KdQZQvw@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 11 August 2017 at 17:09, Doug
Schaefer via cfe-dev <span dir="ltr"><<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div class="m_7013316755050223983WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">Sorry
gang, Outlook isn’t treating this thread very
well. I’ll have to top post my response.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
CDT indexer and database were written in 2005 so
the data is long gone and we’re talking about
different database technologies anyway. All I
remember was it was slower by multiple orders of
magnitude on writes. And that stopped my
investigation right there. At the time I was
trying to get a full index of the Firefox source
down from an hour to a few minutes and incremental
indexing times down to less than a second (thanks
to the header caching). The DB was taking me in
the other direction.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">I’m
just not sure architecting for simple lookups is
the right approach. Just consider probably the
most common use case for clangd, content assist.
Given an offset into the file, determine the
context/scope of the content and a prefix. You
then need to find all matches for that prefix.
That’s by no means a simple lookup given all the
declarations, type hierarchies, scope hierarchies
in play in that context. Finding references is
another good one that also requires scope analysis
to make sure you’re actually returning references
to that exact definition and not just something
that happens to have the same name. The CDT index
has links back and forth between these things.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"></span></p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Clang shouldn't query by name when looking up
references, instead USRs should be used. The USRs deal
with scopes correctly (Hence the U in USR). </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div class="m_7013316755050223983WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">Again,
I’m just relaying my experience haven built one of
these things before. And it was a long time ago.
All avenues are worth exploring.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">Doug.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<div style="border:none;border-left:solid blue
1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #e1e1e1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Manuel Klimek [mailto:<a
href="mailto:klimek@google.com"
target="_blank" moz-do-not-send="true">klimek@google.com</a>]
<br>
<b>Sent:</b> Friday, August 11, 2017 10:59
AM</span></p>
<div>
<div class="h5"><br>
<b>To:</b> Doug Schaefer <<a
href="mailto:dschaefer@blackberry.com"
target="_blank" moz-do-not-send="true">dschaefer@blackberry.com</a>>;
David Chisnall <<a
href="mailto:David.Chisnall@cl.cam.ac.uk"
target="_blank" moz-do-not-send="true">David.Chisnall@cl.cam.ac.uk</a>>;
Marc-André Laperle <<a
href="mailto:marc-andre.laperle@ericsson.com"
target="_blank" moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
<b>Cc:</b> via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">cfe-dev@lists.llvm.org</a>><br>
<b>Subject:</b> Re: [cfe-dev] Adding
indexing support to Clangd</div>
</div>
</div>
</div>
<div>
<div class="h5">
<p class="MsoNormal"> </p>
<div>
<div>
<div>
<p class="MsoNormal">On Fri, Aug 11, 2017
at 4:50 PM Doug Schaefer <<a
href="mailto:dschaefer@blackberry.com"
target="_blank" moz-do-not-send="true">dschaefer@blackberry.com</a>>
wrote:</p>
</div>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<div
style="border:none;border-left:solid
blue 1.5pt;padding:0in 0in 0in
4.0pt">
<div>
<div
style="border:none;border-top:solid
#e1e1e1 1.0pt;padding:3.0pt 0in
0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Manuel Klimek [mailto:<a
href="mailto:klimek@google.com"
target="_blank"
moz-do-not-send="true">klimek@google.com</a>]
<br>
<b>Sent:</b> Friday, August
11, 2017 4:54 AM<br>
<b>To:</b> Doug Schaefer
<<a
href="mailto:dschaefer@blackberry.com"
target="_blank"
moz-do-not-send="true">dschaefer@blackberry.com</a>>;
David Chisnall <<a
href="mailto:David.Chisnall@cl.cam.ac.uk"
target="_blank"
moz-do-not-send="true">David.Chisnall@cl.cam.ac.uk</a>>;
Marc-André Laperle <<a
href="mailto:marc-andre.laperle@ericsson.com"
target="_blank"
moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
<b>Cc:</b> via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a>></span></p>
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div
style="border:none;border-left:solid
blue 1.5pt;padding:0in 0in 0in
4.0pt">
<div>
<div
style="border:none;border-top:solid
#e1e1e1 1.0pt;padding:3.0pt 0in
0in 0in">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re:
[cfe-dev] Adding indexing
support to Clangd</span></p>
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div
style="border:none;border-left:solid
blue 1.5pt;padding:0in 0in 0in
4.0pt">
<div>
<div>
<div>
<p class="MsoNormal">On Thu,
Aug 10, 2017 at 5:57 PM Doug
Schaefer via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a>>
wrote:</p>
</div>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0in 0in
0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<p class="MsoNormal">>
-----Original Message-----<br>
> From: Dr D. Chisnall
[mailto:<a
href="mailto:dc552@hermes.cam.ac.uk"
target="_blank"
moz-do-not-send="true">dc552@hermes.cam.ac.uk</a><wbr>]
On Behalf Of David<br>
> Chisnall<br>
> Sent: Thursday, August
10, 2017 6:10 AM<br>
> To: Marc-André Laperle
<<a
href="mailto:marc-andre.laperle@ericsson.com"
target="_blank"
moz-do-not-send="true">marc-andre.laperle@ericsson.<wbr>com</a>><br>
> Cc: via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a>>;
<a
href="mailto:zeratul976@hotmail.com"
target="_blank"
moz-do-not-send="true">zeratul976@hotmail.com</a>;
Doug<br>
> Schaefer <<a
href="mailto:dschaefer@blackberry.com"
target="_blank"
moz-do-not-send="true">dschaefer@blackberry.com</a>><br>
> Subject: Re: [cfe-dev]
Adding indexing support to
Clangd<br>
><br>
> On 8 Aug 2017, at
18:52, Marc-André Laperle
via cfe-dev <cfe-<br>
> <a
href="mailto:dev@lists.llvm.org"
target="_blank"
moz-do-not-send="true">dev@lists.llvm.org</a>>
wrote:<br>
> ><br>
> >
--ClangdIndexStorage--<br>
> > malloc-like
interface that
allocates/frees data blocks
of variable sizes on<br>
> disk. The blocks can
contain ints, shorts,
strings, pointers (i.e. file
offsets),<br>
> etc. The data is cached
in 4K pieces so that local
and repeated accesses are
all<br>
> done quickly, in
memory.<br>
> > Clangd mallocs and
writes its index model
objects using this.<br>
> ><br>
> > --BTree--<br>
> > A pretty classic
BTree implementation. Used
to speed up lookup (symbol<br>
> names, file names). It
allocates its nodes using
ClangdIndexStorage therefore<br>
> it is stored on disk.
Keys are actually records in
ClangdIndexStorage so you<br>
> can really think of the
BTree as a collection of
sorted pointers (sorted<br>
> according to a provided
comparator).<br>
><br>
> This sounds very like
bdb. Is there a reason that
we’re reimplementing a<br>
> large chunk of bdb,
rather than just using it
(or using something like
sqlite for<br>
> the index storage)?<br>
<br>
It looks like Marc-Andre is
following our work on the
Eclipse CDT indexer. We
found databases to be just
too slow.</p>
</blockquote>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">That is
somewhat surprising. Which
db's did you benchmark
against?</p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
</div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
</div>
</div>
</div>
</div>
</div>
<div>
<div>
<div
style="border:none;border-left:solid
blue 1.5pt;padding:0in 0in 0in
4.0pt">
<div>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">This
was many years ago and we
were trying Apache Derby
(since Eclipse is written in
Java). And at the time the
biggest issue was write
performance totally
destroying any gains we made
with header caching. But
that may have just been
Derby.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">Though
I think the bigger issue is
that the language model is a
graph which doesn’t fit
naturally in tables. You’ll
end up spending more time
fighting that than building
out your own data
structures. But that’s just
my experience.</span></p>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Yea, the language
model not fitting into tables well does
match my intuition, but for simple
queries, I'd have expected a db to be
fast enough with room to spare. Would be
curious about more details / numbers.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<div
style="border:none;border-left:solid
blue 1.5pt;padding:0in 0in 0in
4.0pt">
<div>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span></p>
<div>
<p class="MsoNormal"> </p>
</div>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0in 0in
0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<p class="MsoNormal">Also,
forcing the C++ language
model to fit into a bunch of
tables or key/value pairs
just didn't make that much
sense in the end. The data
is very complex.<br>
<br>
Instead it's better to think
of this as a big graph data
structure that just happens
to be backed by memory
mapped file store. Clients
generally deal directly with
pointers into that data
structure and follow
pointers in the graph. The
BTrees help speed up that
navigation at various nodes
in the graph. It's super
fast.<br>
<br>
Doug.<br>
______________________________<wbr>_________________<br>
cfe-dev mailing list<br>
<a
href="mailto:cfe-dev@lists.llvm.org"
target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
target="_blank"
moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a></p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>