<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Hi,</p>
<p><br>
</p>
<p>This was asked before, but what would be the process for getting liblmdb into clang-tools-extra? I've started prototyping with it and it is quite useful and small. I had a small library (ClangdIndexDataStorage + BTree) filling the same role before and I *think*
I'll be able to fully replace it with <span>liblmdb.</span></p>
<p><br>
</p>
<p>One concern I had with the library at first is that, because it uses memory mapping, it was not clear to me how we could control its memory usage. But I had in mind a single DB that included *everything*, i.e. all symbols and occurrences. After reading the index-while-building
proposal, I like the idea of producing records and units and having a mapping that refers to those.</p>
<p><br>
</p>
<p>There is a part of the proposal that I want to make sure I understood: "<span>Background indexing still occurs with this setup, but instead of being based on a call to libclang, is achieved by invoking Clang with both the -index-store-path option and -fsyntax-only</span>".
I am assuming this background indexing by invoking 'clang -<span>index-store-path -fsyntax-only</span>' is mainly for a scenario where a unit has not been built yet?</p>
<p><br>
</p>
<p>What are the next steps in upstreaming this "<span>index-while-building</span>" support? I think it makes perfect sense for Clangd to use this support and use a similar indexing strategy. I think there's a nice opportunity for collaboration.<br>
</p>
<p><br>
</p>
<p>Marc-André Laperle<br>
</p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Argyrios Kyrtzidis &lt;kyrtzidis@apple.com&gt;<br>
<b>Sent:</b> Thursday, August 31, 2017 8:56:01 PM<br>
<b>To:</b> Ilya Biryukov<br>
<b>Cc:</b> Manuel Klimek; Benjamin Kramer; Krasimir Georgiev; Marc-André Laperle; Nathan Hawes; via cfe-dev<br>
<b>Subject:</b> Re: [cfe-dev] RFC: Adding index-while-building support to Clang</font>
<div> </div>
</div>
<div><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Aug 31, 2017, at 1:26 AM, Ilya Biryukov &lt;<a href="mailto:ibiryukov@google.com" class="">ibiryukov@google.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Hi Argyrios,
<div class=""><br class="">
<div class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word" class="">
<div class="">In our implementation we use LMDB (<a href="https://symas.com/lightning-memory-mapped-database" target="_blank" class="">https://symas.com/lightning-<wbr class="">memory-mapped-database</a>). It is a key-value data-store that we use for cross-referencing
queries, similarly to the example that Nathan provides in the document.<br class="">
</div>
<div class="">Is this something that we could accept into the clang project (e.g. in clang-tools-extra)? Note that it is essentially a single header and a single implementation file.<br class="">
</div>
</div>
</blockquote>
<div class="">AFAIK, LLVM's policy on dependencies is pretty tight. Is it hard to isolate the DB layer, or is it tightly coupled to the implementation?</div>
<div class="">If it's possible, we could have a DB-agnostic API in cfe or clang-tools-extra and an alternative implementation of the storage layer.</div>
<div class="">+klimek, +bkramer, maybe you could comment on adding new third-party dependencies to LLVM? Is it possible?</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
The license is BSD-like (see <a href="https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/LICENSE" class="">https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/LICENSE</a>), which I think makes it compatible. And it would only be a new
dependency added in clang-tools-extra.</div>
<div class=""><br class="">
</div>
<div class="">I think it would be beneficial to focus on one implementation (at least at the beginning).</div>
<div class="">- Assuming that it starts with an in-memory implementation of a key-value store, at some point it will be natural to want to add persistence, and at that point you end up implementing what lmdb already provides.</div>
<div class="">- Having one implementation in-tree and another out-of-tree is not ideal; some usage patterns may be fine for one but problematic for the other. We may evolve multiple implementations later on, if the need arises, but ideally they would be in-tree.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word" class=""><span class="gmail-">
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="">2. In clangd, we're not controlling the build step, instead building ASTs in-memory. We would rather store the indexing information in-memory or consume it on the go while building ASTs.<br class="">
</div>
<div class="">Do you have suggestions on which parts of the API we should look at?</div>
<div class="">We could implement our own IndexASTConsumer, but are there more opportunities for reusing other parts of your implementation? Code for collecting indexing dependencies, definitions of high-level record structures (i.e. symbol definitions, etc.)?</div>
</div>
</blockquote>
</span>
<div class="">There are a few ways to go about this:</div>
<div class="">- Have ASTs in-memory, but indexing works on the file system. It’s not ideal but it is simple and works fairly well in practice, particularly since on our platform, files open in Xcode can be saved to disk even without the user explicitly
saving them.<br class="">
</div>
</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word" class="">
<div class="">
<div class="">- Update clang’s raw index data store using the in-memory buffers and ASTs. The simplicity is that symbol info comes from one place only, but there’s complexity in that you have raw data on disk that reflect in-memory-only sources.</div>
</div>
</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word" class="">
<div class="">
<div class="">- The layer on top of clang's raw index data store is enhanced to treat the raw data on-disk as one source of symbol info, and in-memory ASTs as another. For example, if using LMDB, you could have it distinguish that info about a symbol comes
from the raw data on-disk vs an in-memory AST.</div>
</div>
</div>
</blockquote>
<div class="">Thanks. We probably want some combination of all options. We would definitely benefit from reading the on-disk indexes, if they are there. But those may be outdated, so we could have our own indexing layer on top of that for the modified files.
Then we could dispatch all requests to both layers and combine the results. I wonder whether it's possible to make this work and how much effort it would take.</div>
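<div class="">A rough sketch of the two-layer idea, just to make it concrete. Everything here is made up for illustration (the LayeredIndex class, the dict-based layers, the (file, line) occurrence tuples); it is not how clangd or the index store actually represents data. The assumption is that results from the on-disk layer are dropped for any file that has a newer in-memory entry, and the rest are merged:</div>
<pre class="">
```python
class LayeredIndex:
    """Answers symbol queries from an on-disk layer plus an in-memory overlay."""

    def __init__(self, on_disk):
        # on_disk (hypothetical shape): symbol name -> list of (file, line) occurrences.
        self.on_disk = on_disk
        # overlay: modified file -> {symbol name: list of (file, line) occurrences}.
        self.overlay = {}

    def update_file(self, path, occurrences):
        # Replace whatever the overlay previously knew about this modified file.
        self.overlay[path] = occurrences

    def lookup(self, symbol):
        # On-disk results for files present in the overlay are considered outdated.
        dirty = set(self.overlay)
        results = [occ for occ in self.on_disk.get(symbol, []) if occ[0] not in dirty]
        # Add the fresh results coming from the in-memory layer.
        for per_file in self.overlay.values():
            results.extend(per_file.get(symbol, []))
        return sorted(results)
```
</pre>
<div class="">With this shape, a query goes to both layers and the overlay wins for modified files; whether that is cheap enough in practice is exactly the open question.</div>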
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
<div class="">FYI, for updating out-of-date files (without needing to build), we have the ‘background indexing’ mechanism, which invokes “clang -fsyntax-only -index-store-path …” for the main files that are out-of-date (or include header files that are out-of-date),
and brings the index-store up-to-date.</div>
<div class="">This does have the complexity of maintaining a “mini-build-system-like” mechanism, and the associated scheduling logic that comes with it.</div>
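<div class="">To illustrate the kind of bookkeeping this involves, here is a minimal sketch of picking the main files that need re-indexing. The function name, the dict inputs and the use of plain timestamps are assumptions for the example only, not our actual implementation; a main file counts as out-of-date if it was never indexed or if it, or any header it includes, changed after its last index:</div>
<pre class="">
```python
def out_of_date_main_files(includes, mtimes, indexed_at):
    """Return the main files whose index data is stale.

    includes:   main file -> list of header files it includes (hypothetical input)
    mtimes:     file -> last modification time
    indexed_at: main file -> time it was last indexed (absent if never indexed)
    """
    stale = []
    for main, headers in includes.items():
        last = indexed_at.get(main)
        # Newest change among the main file and all of its headers.
        newest = max(mtimes[f] for f in [main, *headers])
        if last is None or newest > last:
            stale.append(main)
    return sorted(stale)
```
</pre>
<div class="">Each file returned would then be handed to the scheduler that runs the clang invocation mentioned above.</div>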
</div>
</body>
</html>