[clangd-dev] Building and sharing a clangd global index

Fri Apr 5 02:37:15 PDT 2019

A few quick clarifications; Sam will probably have more to add.

> Is the background-indexer smart enough to do a rescan of the code base,
> and only update the files that changed? My assumption is yes, because the
> paths are the same and the digests(?) will be the same for the unchanged
> files, but confirmation here would be great.

Yes, that's correct. It stores the digests of the files and tries to update
only a minimal subset of the codebase that actually changed.
Of course, there might be rough edges and bugs; please let us know if you
encounter them.
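
As a rough illustration of the digest check described above (a minimal sketch, not clangd's actual code): the names `Digest`, `digestOf` and `filesToReindex` are made up for this example, and `std::hash` is only a stand-in for whatever content hash a real indexer would use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a content digest. std::hash is used only to keep
// the sketch self-contained; it is not a cryptographic hash.
using Digest = std::size_t;

Digest digestOf(const std::string &Contents) {
  return std::hash<std::string>{}(Contents);
}

// Returns the paths that need (re)indexing: files whose current contents no
// longer match the digest recorded at the previous run, and files that have
// no recorded digest at all. Unchanged files are skipped.
std::vector<std::string>
filesToReindex(const std::map<std::string, Digest> &Stored,
               const std::map<std::string, std::string> &CurrentContents) {
  std::vector<std::string> Stale;
  for (const auto &Entry : CurrentContents) {
    auto It = Stored.find(Entry.first);
    if (It == Stored.end() || It->second != digestOf(Entry.second))
      Stale.push_back(Entry.first);
  }
  return Stale;
}
```

Only the paths returned by `filesToReindex` would be handed back to the indexer; everything else keeps its existing per-file index data.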

> Ah okay, so if I wanted to use the background-index and dex, are the only
> arguments I have to pass (in clang-8) `-background-index` and
> `-use-dex-index`?

Correct. I thought we had made `-use-dex-index` the default for clangd-8, but
I double-checked just now and that is actually *not* the case.

> Do people run clangd servers on their local machines and only defer to
> the RPC server for LSP queries that have to consult the static index?

Yes, clangd runs locally on each machine and queries the index service
via RPC to avoid keeping the whole index locally. Note that the RPC server
does not serve LSP requests; instead, it serves the clangd-specific
operations (as previously mentioned, the full list of functions we require
from the index is in `clangd::SymbolIndex`).

> If so, then how does the static index get updated if multiple people have
> different versions of the code?

We assume everyone is on the mainline branch (we have a monorepo
internally). The index is rebuilt roughly once a day, hence the results from
the RPC server are sometimes outdated.
To compensate for the staleness in the common case (local modifications, a
slightly outdated index, etc.), we have an overlay for the files open in the
editor (and the files included by them) in the form of a "dynamic index"
internally.
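
To make the overlay idea concrete, here is a minimal sketch under my own assumptions (it is not how clangd implements it): `Symbol`, `SimpleIndex` and `lookup` are hypothetical names. The dynamic index answers first, and static results are dropped for any file the dynamic index already covers, so stale entries for open files never surface.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified symbol record.
struct Symbol {
  std::string Name;
  std::string File;
};

// Hypothetical index: a flat symbol list plus the set of files for which
// this index holds up-to-date data.
struct SimpleIndex {
  std::vector<Symbol> Symbols;
  std::set<std::string> Files;
};

// Looks up Name in both indexes. Results from the dynamic overlay always
// win; static results are kept only for files the overlay does not cover.
std::vector<Symbol> lookup(const SimpleIndex &Dynamic,
                           const SimpleIndex &Static,
                           const std::string &Name) {
  std::vector<Symbol> Results;
  for (const Symbol &S : Dynamic.Symbols)
    if (S.Name == Name)
      Results.push_back(S);
  for (const Symbol &S : Static.Symbols)
    if (S.Name == Name && Dynamic.Files.count(S.File) == 0)
      Results.push_back(S);
  return Results;
}
```

With this shape, a symbol renamed in an open file disappears from the results under its old name even though the day-old static index still lists it.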

On Fri, Apr 5, 2019 at 5:18 AM William Wagner via clangd-dev <
clangd-dev at lists.llvm.org> wrote:

> Hey Sam,
>
> I do like the idea of a project-relative URI scheme. You mentioned the
> tricky part was the path -> URI conversion. If I understand correctly, part
> of why it's tricky is that, say you had:
>     - URI: project://foo as your "root"
>     - Path: /home/foo/foo/foo.cc
> it'd be hard to know whether the URI for this path would be
> project://foo/foo/foo.cc or project://foo/foo.cc. I suppose you could
> recurse upwards until you hit some kind of boundary (e.g. a git folder?)
>
> > Obviously this has the weakness that indexes only transfer between
> > projects where the root has the same name, not sure how big a problem
> > this would be in practice.
> At least for me and most of the projects I see at work, I don't think this
> would be a show stopper.
>
> > ... and also run the index as an RPC server and use a custom
> > implementation of SymbolIndex that queries it.
> Trying to wrap my head around this, as I'm very intrigued. Do people run
> clangd servers on their local machines and only defer to the RPC server for
> LSP queries that have to consult the static index? Also, is an index shared
> by multiple people? If so, then how does the static index get updated if
> multiple people have different versions of the code?
>
> Thanks,
> William
>
> On Wed, Apr 3, 2019 at 8:37 AM Sam McCall via clangd-dev <
> clangd-dev at lists.llvm.org> wrote:
>
>> What you want to do is possible (we do something very similar), though
>> it isn't quite working out-of-the-box yet.
>> There are two main parts:
>>  - *Building and distributing an index* is pretty easy: run
>> clangd-indexer and copy the file to each machine.[1]
>>  - *Translating filenames in the index to match those on the machine* is
>> what the URIs Eric mentioned are for, and isn't polished.
>>  The idea is clangd-indexer will see a file in /path/a/project/Foo.cc,
>> and clangd (on another machine) will see it in a different
>> /path/b/project/Foo.cc.
>>    So it's the indexer's job to translate the path into a
>> machine-agnostic URI like myproject:///Foo.cc, and then clangd's job is to
>> work out which concrete file that refers to in the current context. The
>> clangd::URIScheme implementations handle this at both ends.
>>    However, open-source clangd only has the file scheme today, so people
>> need to patch it to handle these cases[2].
>>
>> -- design speculation follows --
>> I think we should ship a generic "project-relative" URI scheme with
>> clangd so this can work.
>>
>> One idea I have is a scheme like project://somebasedir/path/file.cc
>> Here the assumption is that the project is rooted under a directory with
>> a fixed name "somebasedir" recorded in the URI authority.
>>  - URI -> path is easy: find the concrete somebasedir based on the
>> currently edited file, and concatenate.
>>  - path -> URI is tricky: we need to determine which (if any) parent
>> directory is the relevant base.
>>     - A flag makes sense for clangd-indexer, but clangd also needs to do
>> this conversion sometimes and a flag is a burden there.
>>     - Maybe we can get away with just keeping track of the authorities
>> we've seen the external index return? But this doesn't really help for
>> background index, and mixed internal/external index cases could get messy.
>>     - looking for compilation databases is tempting too, but complicated
>> (requires IO in the URI scheme, and we have ways to use clangd with an
>> external CDB, and the CDB interfaces aren't quite right for this today)
>> So I don't see a way to do this that's super-clean (cheap, zero-config,
>> correct) but interested in ideas others have.
>>
>> Obviously this has the weakness that indexes only transfer between
>> projects where the root has the same name, not sure how big a problem this
>> would be in practice.
>>
>> [1] There are certainly fancier variations: for google's index we
>> distribute the index building by running Index/IndexAction in a mapreduce,
>> and also run the index as an RPC server and use a custom implementation of
>> SymbolIndex that queries it. The latter means our developers have to use a
>> patched clangd. Building the index file and copying it is a good place to
>> start, you'll see where the scaling limits are.
>>
>> [2] Ours is pretty simple, as the project is always rooted at a directory
>> with a fixed name.
>>
>>
>> On Wed, Apr 3, 2019 at 10:38 AM Eric Liu via clangd-dev <
>> clangd-dev at lists.llvm.org> wrote:
>>
>>> Just to add on what Ilya said.
>>>
>>> > Note that both indexes store absolute paths, so sharing the produced
>>> > index across multiple machines would only be possible if the directory
>>> > structure is kept the same.
>>> > If having the same directory structure is plausible, please try it out
>>> > and let us know if it works, we haven't tried sharing the same index
>>> > across multiple machines.
>>> Paths are stored as URIs in the index. By default, the "file" scheme is
>>> used, so a URI would simply be an absolute path (e.g.
>>> file:///user/home/llvm/x/y.h).
>>> But you could also define your own URI schemes. For example, you can choose
>>> to store relative paths in the URI (e.g. llvm:///x/y.h) in a custom scheme,
>>> and they can be resolved with potentially different project roots on users'
>>> machines to get correct full paths. For more information, please take a
>>> look at clangd/URI.h library. You could also find some sample URIScheme
>>> implementations in unit tests.
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>> On Wed, Apr 3, 2019 at 10:27 AM Ilya Biryukov via clangd-dev <
>>> clangd-dev at lists.llvm.org> wrote:
>>>
>>>> Hi William,
>>>>
>>>> The difference between background-indexer and clangd-indexer is the
>>>> layout of the output:
>>>> - background-indexer would put the resulting index into the folder
>>>> <project-root>/.clangd/index.
>>>>   The index is split per-file, i.e. it's incremental and clangd would
>>>> be able to update the files that changed after the index was built.
>>>>   You would need to run clangd with '-background-index' to load the
>>>> index, it will also automatically update the index for files that changed
>>>> on load.
>>>> - clangd-indexer would produce a *merged* index. It can't be
>>>> incrementally updated, and you have more control over the location of
>>>> the output:
>>>>   ./bin/clangd-indexer -executor=all-TUs path/to/compile_commands.json > path/to/output.riff
>>>>   You would need to run clangd with '-index-file=path/to/output.riff'
>>>> to load the index.
>>>>
>>>> Note that both indexes store absolute paths, so sharing the produced
>>>> index across multiple machines would only be possible if the directory
>>>> structure is kept the same.
>>>> If having the same directory structure is plausible, please try it out
>>>> and let us know if it works, we haven't tried sharing the same index across
>>>> multiple machines.
>>>>
>>>> Which option to prefer? Depending on your situation, either of the two
>>>> might be better:
>>>> - If you always want an up-to-date index and storing the shared
>>>> snapshot is just a performance optimization, use background-indexer.
>>>> - If not wasting resources to rebuild the index for changed files
>>>> is more important than the fact that some results are stale (e.g. it's too
>>>> expensive, you want to save laptop battery, etc.), clangd-indexer might be
>>>> a better choice.
>>>>
>>>> Here's a short summary on what each index means:
>>>> - Static index is an index that is persisted across multiple runs of
>>>> clangd. There are two flavours of it:
>>>>   1. Background index. Incremental (split per-file) index living in
>>>> '<project-root>/.clangd/index'.  Built automatically by clangd when
>>>> -background-index is specified. Long-term, we want this to be enabled by
>>>> default (and possibly be the only option).
>>>>   2. Old-style "merged" index produced by clangd-indexer. The results
>>>> will not get updated by clangd automatically, you can ask clangd to load it
>>>> with '-index-file=path/to/index.riff'.
>>>> - Dynamic index is an overlay for a small number of updated files
>>>> (currently the open files for which we built the AST). Kept in memory, not
>>>> persisted across multiple runs. We use it to adjust for the fact that the
>>>> static index might be stale. We want the correct results for the open
>>>> files in all cases.
>>>> - Dex is an efficient implementation of running search queries (e.g. it
>>>> models fuzzy-matching algorithm, etc.). It's an "index" in an information
>>>> retrieval sense, it is not actually specific to C++ or clangd.
>>>>
>>>> On Mon, Apr 1, 2019 at 6:36 PM William Wagner (BLOOMBERG/ 731 LEX) via
>>>> clangd-dev <clangd-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I work on a fairly large C++ project and wanted to figure out a way to
>>>>> regularly build (e.g. nightly via Jenkins) a global project index that can
>>>>> be shared with all the members of my team. I want to share it because it
>>>>> takes a fairly long time to build the index after starting up, and it seems
>>>>> pretty redundant to have each team member doing so, seeing as most of the
>>>>> code is not changing on a day-to-day basis. I’ve tried peeking around the
>>>>> mailing lists and commit history of clangd, but I’m not sure whether this
>>>>> is possible yet - and if it was, what flags to use, what indexer etc.
>>>>>
>>>>> I see there’s background-indexer WIP (https://reviews.llvm.org/D59605)
>>>>> and an existing clangd-indexer
>>>>> https://github.com/llvm-mirror/clang-tools-extra/blob/master/clangd/indexer/IndexerMain.cpp
>>>>> What is the difference between these?
>>>>>
>>>>> Additionally, if anyone could provide some clarification on the
>>>>> different types of indexes clangd currently has (dex, background, static,
>>>>> etc.) that would be great :)
>>>>>
>>>>> Thanks!
>>>>>
>>>>> _______________________________________________
>>>>> clangd-dev mailing list
>>>>> clangd-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ilya Biryukov

-- 
Regards,
Ilya Biryukov