[clangd-dev] Building and sharing a clangd global index

William Wagner via clangd-dev clangd-dev at lists.llvm.org
Thu Apr 4 20:17:57 PDT 2019


Hey Sam,

I do like the idea of a project-relative URI scheme. You mentioned the
tricky part was the path -> URI conversion. If I understand correctly, part
of why it's tricky is that, say you had:
    - URI: project://foo as your "root"
    - Path: /home/foo/foo/foo.cc
it'd be hard to know whether the URI for this path should be
project://foo/foo/foo.cc or project://foo/foo.cc. I suppose you could
recurse upwards until you hit some kind of boundary (e.g. a git folder)?
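
To make the ambiguity and the "recurse upwards" idea concrete, here is a
rough Python sketch. It is not clangd code: the project:// scheme is only
the proposal from this thread, and the function names (candidate_uris,
path_to_project_uri, has_git_dir) are made up for illustration. The boundary
check is passed in as a callback, so a ".git" lookup is just one possible
choice.

```python
from pathlib import Path, PurePosixPath
from typing import Callable, List, Optional


def candidate_uris(path: str, root_name: str) -> List[str]:
    """Return every project:// URI that `path` could map to.

    Each ancestor directory named `root_name` is a possible project root,
    which is why the name alone is ambiguous. Nearest ancestor comes first.
    """
    p = PurePosixPath(path)
    return [
        f"project://{root_name}/{p.relative_to(d)}"
        for d in p.parents
        if d.name == root_name
    ]


def path_to_project_uri(
    path: str, is_boundary: Callable[[PurePosixPath], bool]
) -> Optional[str]:
    """Walk upwards from `path`; the first boundary directory is the root."""
    p = PurePosixPath(path)
    for parent in p.parents:  # nearest ancestor first
        if is_boundary(parent):
            return f"project://{parent.name}/{p.relative_to(parent)}"
    return None  # no boundary found, so no project-relative URI


def has_git_dir(directory: PurePosixPath) -> bool:
    """One possible boundary test (an assumption): directory holds .git."""
    return (Path(str(directory)) / ".git").exists()
```

candidate_uris("/home/foo/foo/foo.cc", "foo") yields both project://foo/foo.cc
and project://foo/foo/foo.cc, which is exactly the ambiguity above. Calling
path_to_project_uri(path, has_git_dir) picks one of them, at the cost of
doing IO during the conversion.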

> Obviously this has the weakness that indexes only transfer between
> projects where the root has the same name, not sure how big a problem
> this would be in practice.
At least for me and most of the projects I see at work, I don't think this
would be a show stopper.

> ... and also run the index as an RPC server and use a custom
> implementation of SymbolIndex that queries it.
Trying to wrap my head around this, as I'm very intrigued. Do people run
clangd servers on their local machines and only defer to the RPC server for
LSP queries that have to consult the static index? Also, is an index shared
by multiple people? If so, how does the static index get updated if
multiple people have different versions of the code?

Thanks,
William

On Wed, Apr 3, 2019 at 8:37 AM Sam McCall via clangd-dev <
clangd-dev at lists.llvm.org> wrote:

> What you want to do is possible (we do something very similar), though it
> isn't quite working out-of-the-box yet.
> There are two main parts:
>  - *Building and distributing an index* is pretty easy: run
> clangd-indexer and copy the file to each machine.[1]
>  - *Translating filenames in the index to match those on the machine* is
> what the URIs Eric mentioned are for, and isn't polished.
>  The idea is clangd-indexer will see a file in /path/a/project/Foo.cc, and
> clangd (on another machine) will see it in a different
> /path/b/project/Foo.cc.
>    So it's the indexer's job to translate the path into a machine-agnostic
> URI like myproject:///Foo.cc, and then clangd's job is to work out which
> concrete file that refers to in the current context. The clangd::URIScheme
> implementations handle this at both ends.
>    However, open-source clangd only has the file scheme today, so people
> need to patch it to handle these cases[2].
>
> -- design speculation follows --
> I think we should ship a generic "project-relative" URI scheme with clangd
> so this can work.
>
> One idea I have is a scheme like project://somebasedir/path/file.cc
> Here the assumption is that the project is rooted under a directory with a
> fixed name "somebasedir" recorded in the URI authority.
>  - URI -> path is easy: find the concrete somebasedir based on the
> currently edited file, and concatenate.
>  - path -> URI is tricky: we need to determine which (if any) parent
> directory is the relevant base.
>     - A flag makes sense for clangd-indexer, but clangd also needs to do
> this conversion sometimes and a flag is a burden there.
>     - Maybe we can get away with just keeping track of the authorities
> we've seen the external index return? But this doesn't really help for
> background index, and mixed internal/external index cases could get messy.
>     - looking for compilation databases is tempting too, but complicated
> (requires IO in the URI scheme, and we have ways to use clangd with an
> external CDB, and the CDB interfaces aren't quite right for this today)
> So I don't see a way to do this that's super-clean (cheap, zero-config,
> correct) but interested in ideas others have.
>
> Obviously this has the weakness that indexes only transfer between
> projects where the root has the same name, not sure how big a problem this
> would be in practice.
>
> [1] There are certainly fancier variations: for google's index we
> distribute the index building by running Index/IndexAction in a mapreduce,
> and also run the index as an RPC server and use a custom implementation of
> SymbolIndex that queries it. The latter means our developers have to use a
> patched clangd. Building the index file and copying it is a good place to
> start, you'll see where the scaling limits are.
>
> [2] Ours is pretty simple, as the project is always rooted at a directory
> with a fixed name.
>
>
> On Wed, Apr 3, 2019 at 10:38 AM Eric Liu via clangd-dev <
> clangd-dev at lists.llvm.org> wrote:
>
>> Just to add on what Ilya said.
>>
>> > Note that both indexes store absolute paths, so sharing the produced
>> > index across multiple machines would only be possible if the directory
>> > structure is kept the same.
>> > If having the same directory structure is plausible, please try it out
>> > and let us know if it works, we haven't tried sharing the same index
>> > across multiple machines.
>> Paths are stored as URIs in the index. By default, the "file" scheme is
>> used, so the URI would simply be the absolute path (e.g.
>> file:///user/home/llvm/x/y.h). But you could also define your own URI
>> schemes. For example, you can choose to store relative paths in the URI
>> (e.g. llvm:///x/y.h) in a custom scheme, and they can be resolved against
>> potentially different project roots on users' machines to get correct
>> full paths. For more information, please take a look at the clangd/URI.h
>> library. You can also find some sample URIScheme implementations in the
>> unit tests.
>>
>> Cheers,
>> Eric
>>
>>
>> On Wed, Apr 3, 2019 at 10:27 AM Ilya Biryukov via clangd-dev <
>> clangd-dev at lists.llvm.org> wrote:
>>
>>> Hi William,
>>>
>>> The difference between background-indexer and clangd-indexer is the
>>> layout of the output:
>>> - background-indexer would put the resulting index into the folder
>>> <project-root>/.clangd/index.
>>>   The index is split per-file, i.e. it's incremental and clangd would be
>>> able to update the files that changed after the index was built.
>>>   You would need to run clangd with '-background-index' to load the
>>> index, it will also automatically update the index for files that changed
>>> on load.
>>> - clangd-indexer would produce a *merged* index. It can't be
>>> incrementally updated, and you have more control over the location of the
>>> output:
>>>   ./bin/clangd-indexer -executor=all-TUs path/to/compile_commands.json > path/to/output.riff
>>>   You would need to run clangd with '-index-file=path/to/output.riff' to
>>> load the index.
>>>
>>> Note that both indexes store absolute paths, so sharing the produced
>>> index across multiple machines would only be possible if the directory
>>> structure is kept the same.
>>> If having the same directory structure is plausible, please try it out
>>> and let us know if it works, we haven't tried sharing the same index across
>>> multiple machines.
>>>
>>> Which option to prefer? Depending on your situation, either of the two
>>> might be better:
>>> - If you always want an up-to-date index and storing the shared snapshot
>>> is just a performance optimization, use background-indexer.
>>> - If not wasting resources on rebuilding the index for changed files is
>>> more important than the fact that some results are stale (e.g. it's too
>>> expensive, you want to save laptop battery, etc.), clangd-indexer might be
>>> a better choice.
>>>
>>> Here's a short summary on what each index means:
>>> - Static index is an index that is persisted across multiple runs of
>>> clangd. There are two flavours of it:
>>>   1. Background index. Incremental (split per-file) index living in
>>> '<project-root>/.clangd/index'.  Built automatically by clangd when
>>> -background-index is specified. Long-term, we want this to be enabled by
>>> default (and possibly be the only option).
>>>   2. Old-style "merged" index produced by clangd-indexer. The results
>>> will not get updated by clangd automatically, you can ask clangd to load it
>>> with '-index-file=path/to/index.riff'.
>>> - Dynamic index is an overlay for a small number of updated files
>>> (currently the open files for which we built the AST). Kept in memory, not
>>> persisted across multiple runs. We use it to adjust for the fact that the
>>> static index might be stale. We want the correct results for the open
>>> files in all cases.
>>> - Dex is an efficient implementation of running search queries (e.g. it
>>> models fuzzy-matching algorithm, etc.). It's an "index" in an information
>>> retrieval sense, it is not actually specific to C++ or clangd.
>>>
>>> On Mon, Apr 1, 2019 at 6:36 PM William Wagner (BLOOMBERG/ 731 LEX) via
>>> clangd-dev <clangd-dev at lists.llvm.org> wrote:
>>>
>>>> Hello!
>>>>
>>>> I work on a fairly large C++ project and wanted to figure out a way to
>>>> regularly build (e.g. nightly via Jenkins) a global project index that can
>>>> be shared with all the members of my team. I want to share it because it
>>>> takes a fairly long time to build the index after starting up, and it seems
>>>> pretty redundant to have each team member doing so, seeing as most of the
>>>> code is not changing on a day-to-day basis. I’ve tried peeking around the
>>>> mailing lists and commit history of clangd, but I’m not sure whether this
>>>> is possible yet - and if it was, what flags to use, what indexer etc.
>>>>
>>>> I see there’s background-indexer WIP (https://reviews.llvm.org/D59605)
>>>> and an existing clangd-indexer
>>>> https://github.com/llvm-mirror/clang-tools-extra/blob/master/clangd/indexer/IndexerMain.cpp
>>>> What is the difference between these?
>>>>
>>>> Additionally, if anyone could provide some clarification on the
>>>> different types of indexes clangd currently has (dex, background, static,
>>>> etc.) that would be great :)
>>>>
>>>> Thanks!
>>>>
>>>> _______________________________________________
>>>> clangd-dev mailing list
>>>> clangd-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ilya Biryukov