[PATCH] D63092: [Frontend] Use executable path when creating invocation from cmdline

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 11 01:15:18 PDT 2019


sammccall added a comment.

Thanks for pushing on this. I guess this must affect all clang-tools when used with a wrapper, which seems important. (I've just been using bazel and local cmake, so haven't hit this myself).

In D63092#1536890 <https://reviews.llvm.org/D63092#1536890>, @phosek wrote:

> > The wrapper tool case is a problem though, I see some options:
> > 
> > - change the process that generates the compilation database to emit a path to the underlying compiler instead, as tooling currently expects. This may be painful if e.g. goma pretends to be a regular toolchain. An argument can be made it should pretend harder (e.g. symlinking directories around it so discovery works as usual)
>
> I think this is the right solution for compiler wrappers like goma, the problem is that this requires changing any tool that generates compilation database to have explicit notion of compiler launcher and exclude that from the command (today we just prepend the tool to the command <https://chromium.googlesource.com/chromium/src/+/refs/heads/master/build/toolchain/gcc_toolchain.gni#171>). This includes tool like GN and potentially even Ninja which is a non-trivial task. Until that happens, `clangd` is unusable for projects like Fuchsia (`clangd` cannot find standard headers and we're getting too many warnings for those)


Yeah, I can see how putting this in ninja would be a big layering violation. One of the nice things about compile_commands.json is you can get it as a side-effect.

>From first principles, a CDB can't hold an arbitrary command that happens to spawn clang, tools need some way to inspect it. We do that by requiring it to be a command-line the driver can parse, which works for clang, gcc, cl, and tools that aim to be CLI-compatible with these. So the issues here are that:

- `gomacc clang++ $flags` isn't CLI-compatible with `clangd++ $flags`: **both** argv[0] and argv[1] are wrong
- argv[0] isn't just a string, it's a location to poke around in for include paths
- [potentially] the driver itself has hard-coded paths in it (see D62804 <https://reviews.llvm.org/D62804>)

One possible fix for the first issue is to invoke is `goma-clang++` rather than `gomacc clang`. Or alternatively `goma-clang --driver-mode=g++` which would avoid requiring lots of executables.

Another would be teaching CDB (or createinvocationfromcommandline) to parse out toolchain wrappers. Obviously this seems like a bit of a can of worms. It's (probably?) fine when it's a single argv0 that matches some regex and argv1 is "clang". But what if the wrapper has a generic name, or takes flags, or argv1 is "some-weird-cc"...

The second issue is harder because I think the wrapper makes it fundamentally unclear where to scan for libraries. One answer is near the wrapper itself, but it sounds like goma's answer is that 1) you're likely to have an appropriately configured compiler locally, and 2) it's probably on your PATH. Do you have a sense of how safe these assumptions are?

I'm leaning toward having some logic in JSONCompilationDatabase that pattern matches argv0, and for known wrappers does some transform on argv and sets a new "wrapper" field on the CompileCommand struct. WDYT?

>> - change the process that generates the CDB to inject the relevant directory paths explicitly as flags
> 
> This would mean either duplicating the Clang driver logic inside the build system, which is something I'd like to avoid (why use the driver at all in that case?), or have some mechanism through which the driver would have to export all the necessary information (e.g. where to find standard library headers) for the build system to use.
> 
>> - make driver aware of the distinction between "invoked-as" and "current binary" and use the right path in the right places
> 
> I'll look into this.

Thinking about this more - we'd still have the argv[1] issue, and for headers we probably want to search near `findProgramByName(argv[1])` rather than near `getMainExecutable()` (at least clangd doesn't package a standard library). And I do think maybe this is something for CDB to handle rather than driver.

>> There are some potentially-unsolvable cases here: e.g. if your wrapper is `magic-remote-build` then maybe there's no standard library locally, or no way to find it.
> 
> Do you think it's worth updating the compilation database documentation <https://clang.llvm.org/docs/JSONCompilationDatabase.html> to explicitly say that the command has to start with the compiler invocation? Currently it says that "The compile command executed." which is IMHO not sufficiently precise.

Yeah, I think it should say that it needs to be a command clang's driver understands, which includes typical invocations of `gcc`, `clang++`, `clang-cl`, etc. And if we add wrapper support (somewhere) we should mention that too.

In D63092#1536937 <https://reviews.llvm.org/D63092#1536937>, @phosek wrote:

> One more thing, do you think it's reasonable to use `llvm::sys::findProgramByName(Args[0])`instead of `Args[0]` when creating the driver instance? One of the failure modes I ran into is the case where the generated compilation database would contain just the executable name, e.g. `clang++`. If you invoke that command, everything works as expected because driver resolves the binary to a full path <https://github.com/llvm/llvm-project/blob/master/clang/tools/driver/driver.cpp#L58>, but when used with `clangd`, it'll fail because that resolution will never happen, the `Dir`/`InstalledDir` will be an empty path and attempt to resolve C++ library headers will end up with `/../include/c++/v1` which is invalid.


Yes, I think you're right. We need to make sure we only do this for bare names (`clang`, not `../clang`) and that we keep the current behavior if findProgramByName fails.
There's a chance this will change behavior in some cases where we're using a VFS and don't want PATH or the real FS to have any effect, but I think we should try it and see.

> Alternative would be to make the compilation database specification even more strict and require that the compiler command is not only the first argument, but it's also a (full or relative) path to the compiler executable.

Changing documentation is obviously going to have a limited, delayed effect. Let's try other approaches first?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63092/new/

https://reviews.llvm.org/D63092





More information about the cfe-commits mailing list