[PATCH] D148663: [RFC][clangd] Use interpolation for CDB pushed via LSP protocol

Fri Apr 28 07:30:32 PDT 2023

DmitryPolukhin added a comment.

In D148663#4305202 <https://reviews.llvm.org/D148663#4305202>, @ilya-biryukov wrote:

> In D148663#4301589 <https://reviews.llvm.org/D148663#4301589>, @DmitryPolukhin wrote:
>
>> And, if they do so, this diff will just does nothing because there is an exact match. It starts playing only if client provided partial CDB and inference is required.
>
> (hypothesising below)
> I think this depends on a client at question. If I were writing one and had an idea what I want to do for headers, e.g. I might have a project system that knows which build targets headers belong to,
> I would rather see no compile errors for a header (if I have bugs in my integration) than for Clangd to kick in with interpolation and silently hide the bug.
>
> From what I can recollect, your case is different and you just want to "pipe" `compile_commands.json` to Clangd provided by some build system, but without `clangd` actually reading it.
> I don't actually write an LSP client, though, would let @kadircet decide which path is preferable.

We have LSP client that works like a proxy and exposes LSP protocol to higher level. Now we do very deep processing of CDB and partially replicates logic clangd about command inference.
Clangd usually does good job in command inference and we would like to use this feature instead of keep our logic up-to-date. I can put this inference logic for LSP behind some command line flag
if you think that it might really break some good use case. Please let me know.

>> Pushing command via LSP might be preferable in comparison with file based approach to avoid race condition with updating the file and clangd reading it
>
> Ideally this should be handled with atomic writes (write to temp file and move to destination), at least on Linux and Mac. I'm not sure if build systems do that, though.
> Does Clangd ever re-read the compilation database now? It used to load it only once, so rereading would require a restart of clangd anyway. Race on just one read is very unlikely (although not impossible).
> However, the `compile_commands.json` file is parsed lazily, when the command for a particular file is actually requested, if you see crashes or inconsistencies in your scenarios, it highly likely due to this.

Yes, clangd re-read CDB in some cases but it won't try to read CDB again from a directory if there was none before for performance reasons I think. I think synchronisation here is hard we cannot control when
clangd will actually read CDB so it might use wrong flags. Also we run multiple clangd behind multiplexing proxy and they might need different CDBs. Also CDB tends to become large and hard to manage so
per-file LSP protocol has lots of advantages for us.

>> + it works better with build systems that can generate compiles commands on the fly for files and generating all of them in advance may not be possible.
>
> FYI, there is separate support for pushing command per-file in Clangd via an LSP extension <https://github.com/llvm/llvm-project/blob/fae9d4df8fcff802dc7b0c344c61c70cc2a20697/clang-tools-extra/clangd/Protocol.h#L577>.
> I think this should cover build systems that generate commands on the fly better, but this also does not use interpolation.

It is exactly the LSP protocol we are using and I added inference.

> A meta-comment: it would be really useful to understand how you use Clangd a bit better. Would it be hard to write down which kind of client you have and what are requirements for Clangd? Which build systems are at play, how does the client use them, etc?
> To validate whether patches like this one are a good fit to solve the problem or there are better ways, it's really useful to have a higher level overview of how Clangd is being used.

We use Buck <https://buck.build/> and Buck2 <https://buck2.build/> as our main build system + some projects uses other build systems like CMake, and in general we don't limit build systems that subproject might use.
Therefore we cannot limit ourself to any particular clangd version and had to support multiple of them simultaneously because individual projects might might use incompatible features. So we have a LSP multiplexing proxy that 
encapsulates build system specifics and combines results from several clangds. Changes in clangd that we would like to put in upstream in our understanding should be usable not only in our setup but should also improve clangd for all users.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148663/new/

https://reviews.llvm.org/D148663