[clangd-dev] Clangd Path Mappings for Docker Container

Fri Jun 28 05:40:50 PDT 2019

On Fri, Jun 28, 2019 at 4:26 AM William Wagner <wcwwagner at gmail.com> wrote:

> Hey Sam,
>
> Thanks for the prompt response, I’m glad to hear it’d be useful in
> upstream! I agree about the JSON/wrapping part, and from looking around the
> code, the implementation seems doable on my end. If I understand this
> right, we should have something like:
>
>    1.
>
>    A new string pathMappings command-line argument
>
> This part is actually the least obvious to me:
 - is it better to put the mappings "inline" or in a file which can be
referenced?
 - should this ultimately be global (per-user) configuration, or
per-project somehow? (Flags are basically the only mechanism we have *for
now*, but we may change this in future)
 - if the client and/or host use windows paths, is this going to cause any
problem/ambiguity?

>
>    1.
>
>    A new WrapTransport class, that “owns” the XPC/JSON transport and has
>    the path mappings stored internally. All calls to e.g. notify will simply
>    do the path mappings and then delegate the call to the normal wrapped
>    transport method.
>    2.
>
>    A new WrapMessageHandler class (for inbound messages) that, similar to
>    WrapTransport, does the path mapping translation and then delegates calls
>    to the normal message handler.
>
> This sounds good. I think the actual WrapTransport/WrapMessageHandler
classes can be hidden from the public interface though.
I'd suggest something this code structure to keep this feature
isolated/compact and facilitate testing:

PathMapping.h
   class PathMapping {
     static Expected<ParseMapping> parse(string);
     string mapIncoming(string);
     string mapOutgoing(string);
   }
   unique_ptr<Transport> createPathMappingTransport(unique_ptr<Transport>,
PathMapping);

Then we can unit test the parse/string mapping functions in detail without
much overhead.
The transport *could* be unit-tested too (to cover all the ways JSON
objects are sent through a transport), but I'm not actually sure it's
necessary.

We should test the overall feature with an end-to-end lit test (like the
ones in clangd/test/*). Those tests are awkward so it should just be a
smoke test - verify at least one input and one output filename, but don't
try to cover every case.

>
> Please let me know if I didn’t get that right! Bear with me, I still have
> a couple questions:
>
>
>    1.
>
>    Should input validation be done on the path mappings argument? If so,
>    if it isn’t well-formed, should clangd exit, log it and ignore, etc.?
>
> I think we should require the mappings to be syntactically valid (and exit
if parsing fails), but the mapped paths need not exist.
A syntactically invalid flag indicates the user wanted to do something and
messed it up, whereas mappings pointing to nonexistent directories are
probably just old and safely ignored.

>
>    1.
>
>    What kind of tests should be written here? Some unit tests around
>    parsing the path mappings, checking to make sure the wrapping transport is
>    mapping/delegating properly seem prudent, although I’m not sure how to
>    write any higher level ones than that.
>
> The highest level tests are the lit tests in test/*, we should definitely
have one.
As I mentioned above, I'm not sure there'll be much interesting to test in
the WrappingTransport that isn't covered by the lit tests, so you could
skip those if you like.

One hassle is that you're going to be testing exact paths and this is
annoying in the lit environment. I'd suggest "UNSUPPORTED: win32" (leaning
on unit test coverage) and then you can mostly use file://UNIX_PATH where
UNIX_PATH gets substituted in via sed.
See background-index.test for an example. Happy to help with more details
in code review.

>
>    1.
>
>    Are there special instructions to build/run the tests? I was hoping
>    https://llvm.org/docs/TestingGuide.html would steer me in the right
>    direction :)
>
> Sorry about that. The target you want is "check-clangd" (e.g. ninja
check-clangd). This builds clangd and the unit tests, and runs both the lit
tests and unit tests.

You can rerun a single lit test with "ninja clangd && bin/llvm-lit
$SOURCE/clangd/test/foo.test". You can rerun a single unit test with "ninja
ClangdTests && tools/clang/tools/extra/clangd/unittests/ClangdTests
--gtest_filter='WhateverTest.Foo'"

Good luck, please ask if you need a hand!

Cheers, Sam

>
> Thanks,
>
> William
>
>
> On Thu, Jun 27, 2019 at 2:29 AM Sam McCall <sammccall at google.com> wrote:
>
>> It would be great to get this feature upstreamed in some form.
>>
>> From a pragmatic point of view, the JSON level seems by far the least
>> invasive way to do this. (It's a shame the VFS approach didn't work out)
>>
>> We have the Transport abstraction (clangd/Transport.h) which should cut
>> across all JSON messages sent over stdio (because sometimes they're sent
>> over XPC instead!)
>>
>> I think it would be possible to write a wrapping Transport implementation
>> that does path translation and delegates to another. WDYT?
>>
>> On Thu, Jun 27, 2019, 1:47 AM William Wagner <wcwwagner at gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> Sorry for the super long delay here! I’ve created a fork of upstream
>>> clangd that implements this path mapping behavior, and have been using it
>>> for the past couple months. As stated previously, the primary environment
>>> I’m using this in is clangd running inside a docker container, the editor
>>> running on the host, and the source files sync'ed between the two (via a
>>> volume mount).
>>>
>>> I went with the most straightforward implementation (to me), which is
>>> editing the JSON messages directly, both for inbound and outbound messages.
>>> Basically, I’ve added the following:
>>>
>>>    1.
>>>
>>>    A new --pathMappings string argument, which is of the form
>>>    <host_path>:<remote_path>;<host_path>:<remote_path>...
>>>    2.
>>>
>>>    Logic to recursively loop through all the values in the JSON RPC
>>>    messages, doing path substitution on all values that start with “file://”
>>>    3.
>>>
>>>    Calling this logic at all inbound locations (i.e.
>>>    MessagesHandler::onNotify/onCall, etc.) and all outbound locations (i.e.
>>>    ReplyOnce::Operator(), ClangdLSPServer::publishDiagnostics, etc)
>>>
>>>
>>> This is something that works and is pretty useful to my team (and I hope
>>> others too). Does this sound reasonable to try and land in clangd? I’d be
>>> happy to code it up, but would just want some confirmation before going
>>> through the effort to do it properly.
>>>
>>> Assuming it’s useful, I have some questions for the ideal implementation:
>>>
>>>    1.
>>>
>>>    Should the path mappings be done on JSON values or the Protocol.h
>>>    data structures? If it’s the latter, I suppose we could create something
>>>    like `template <typename T> T doFilePathMappings(const T& Orig)`, and
>>>    specialize the implementation for the protocol objects that have URIs in
>>>    them.
>>>    2.
>>>
>>>    Is there / should there be a centralized place to put this path
>>>    mapping logic? Right now, it seems there's a few different places it could
>>>    go, but polluting the codebase in many different places seems far from
>>>    ideal. Naturally I reached for the ClangdLSPServer::MessagesHandler, but in
>>>    its current implementation it seems to mostly operate on JSON Values, and
>>>    even then some replies bypass the MessagesHandler (e.g.
>>>    publishDiagnostics/onFileUpdated)
>>>
>>>
>>> Happy to hear any feedback!
>>>
>>> Thanks for taking the time,
>>>
>>> William
>>>
>>>
>>> On Wed, Jan 16, 2019 at 8:55 AM Ilya Biryukov <ibiryukov at google.com>
>>> wrote:
>>>
>>>> The idea that I was playing around with was remapping the paths between
>>>> network mount and a Clangd running on the remote machine.
>>>> I actually wrote an LSP wrapper that would update all the paths used
>>>> along the way. It worked, but using it was a terrible pain. (Offtopic:
>>>> network mounts are a terrible idea, rsyncing the code gives a so much
>>>> better latency!)
>>>>
>>>> As ugly as this sounds, I actually think that having an option in
>>>> clangd to remap the paths is the preferred option here. It's feels like the
>>>> only way that has a reasonable UX.
>>>> Just to make it clear, your setup is as follows:
>>>> - A development environment (where you run the builds, etc) is
>>>> configured inside a docker image.
>>>> - The source code lives inside a folder on a host OS.
>>>> - The source code folder is mounted into the docker container.
>>>> - The editors run in the host OS, the language server runs inside the
>>>> docker container.
>>>>
>>>>
>>>> On Thu, Jan 10, 2019 at 6:01 PM Sam McCall <sammccall at google.com>
>>>> wrote:
>>>>
>>>>> Hi William,
>>>>>
>>>>> This is an idea Ilya did some thinking about before. The motivation
>>>>> was slightly different (editor on laptop, clangd on remote workstation) but
>>>>> I believe equivalent.
>>>>> We're not going to get around needing to have all sources on both
>>>>> machines (or mounted from one to the other). Clangd and the editor both
>>>>> need to see them. So I think the path translation is indeed the bit to
>>>>> attack. The scary thing about adding file mapping is doing it consistently
>>>>> everywhere without making mistakes or adding lots of noise to the code.
>>>>>
>>>>> An idea I want to throw out there: clangd performs reads via a VFS.
>>>>> This is an existing abstraction where path translation could be added
>>>>> without any additional complexity to clangd proper.
>>>>> The way this would work: we give clangd a different view of the world
>>>>> and connect it directly to the editor (on the host). This means clangd
>>>>> needs to speak the host's language, so the VFS needs to simulate the FS
>>>>> layout of the host.
>>>>> This leads to the main limitation: we need a compilation database
>>>>> that's consistent with the *host's* file layout. I suspect mechanically
>>>>> transforming compilation databases is hard (relative paths etc), so this
>>>>> may mean running the build system on the host (at least sufficiently to
>>>>> generate the CDB).
>>>>> Not sure if there's a way to modify this idea to avoid the limitation.
>>>>>
>>>>> I'm interested in how else we can get clangd to expose a different
>>>>> view of the world than its VFS sees in a consistent way, too!
>>>>>
>>>>> On Thu, Jan 10, 2019 at 9:12 AM William Wagner via clangd-dev <
>>>>> clangd-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I’ve been using clangd for developing a C++ project at work. Our
>>>>>> development setup requires that we build and run our code inside a docker
>>>>>> container. From a language server perspective this presents a few
>>>>>> challenges:
>>>>>>
>>>>>>
>>>>>>    1. Communicating with the language server running in the container
>>>>>>    2. Synchronizing the files between the host and the container
>>>>>>    3. Translating file paths between the host lsp client  and docker
>>>>>>    lsp server
>>>>>>    4. Having dependency files that are only in the container
>>>>>>    available to view on the host (for textDocument/definition on dependencies
>>>>>>    installed only in the container)
>>>>>>
>>>>>>
>>>>>> Fortunately, docker’s facilities make 1) and 2) pretty easy to deal
>>>>>> with.
>>>>>>
>>>>>> 1) can be solved by creating a bash script that contains something
>>>>>> similar to:
>>>>>> `docker exec -i <build_container> /path/to/clangd ${@:1}`
>>>>>> and pointing clangd.path to this script
>>>>>>
>>>>>> 2) can be solved by using a docker bind mount, which maps the
>>>>>> directory of the project you’re working on into the container
>>>>>>
>>>>>> 3) can be solved by mounting the host:/path/to/project to an
>>>>>> identical path in the container. Then there doesn't have to be any
>>>>>> translation between the requests/responses between the file uris that the
>>>>>> client and server send back and forth. However, this may not always be
>>>>>> possible and still leaves the problem of external dependency paths.
>>>>>>
>>>>>> 4) has no easy solution yet afaik. A potentially simple one would be
>>>>>> to copy just the dependency headers from the container over to some
>>>>>> directory on the host, and be able to translate clangd response file uris
>>>>>> to the appropriate path on the host. The way I envision that is an extra
>>>>>> arg to the clangd server that supports path substitution, such as:
>>>>>>
>>>>>> `clangd
>>>>>> -path-mappings=’/host/home/project/deps:/container/deps;/host/home/project/:container/project`
>>>>>>
>>>>>> So if I “Go To Definition” on an external symbol e.g. Foo, which is
>>>>>> defined in /container/deps/foo.hh”, and I have the foo.hh header in
>>>>>> /host/home/project/.deps/foo.hh, then the response from the servers
>>>>>> textDocument/definition would substitute the /container/deps/foo.hh →
>>>>>> /host/home/project/.deps/foo.hh and all would be well.
>>>>>>
>>>>>> I believe sourcegraph has implemented an extension to the LSP to
>>>>>> handle missing files on the client/server, but I cannot find the proposed
>>>>>> spec. Also these are some github discussions that touch on the topics above:
>>>>>>
>>>>>>    - Ccls supports a path Mappings config option that allows
>>>>>>    substitution, see https://github.com/MaskRay/ccls/issues/75
>>>>>>    - Discussion on remote langauge server proposal
>>>>>>    https://github.com/Microsoft/language-server-protocol/issues/528
>>>>>>
>>>>>>
>>>>>> I’d be happy to implement this path mapping functionality, but
>>>>>> thought it’d be prudent to see what everyone else thinks before doing so.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> clangd-dev mailing list
>>>>>> clangd-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ilya Biryukov
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/clangd-dev/attachments/20190628/46726336/attachment-0001.html>