[PATCH] D23455: [Tooling] Parse compilation database command lines properly on Windows

Alexander Kornienko via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 17 01:42:26 PDT 2016


alexfh added a comment.

In https://reviews.llvm.org/D23455#516760, @zturner wrote:

> In https://reviews.llvm.org/D23455#516753, @alexfh wrote:
>
> > In https://reviews.llvm.org/D23455#515527, @rnk wrote:
> >
> > > In https://reviews.llvm.org/D23455#515486, @brad.king wrote:
> > >
> > > > > the feasibility of emitting 'arguments' instead of 'command' into the JSON compilation database.
> > > >
> > > >
> > > > CMake constructs the command lines internally using string replacement on templates.  We never actually know the exact arguments.  Therefore providing arguments instead of the whole command would require parsing to be done on the CMake side instead.  This is theoretically possible because we do know the shell for which we are generating (Windows `cmd` versus MSYS `sh`).  However, it may also require a bunch of logic we don't have yet but that LLVM does.
> > > >
> > > > Alternatively, the JSON could have an additional `command_shell="..."` field that indicates the shell for which the command line is encoded.
> > >
> > >
> > > Bummer. Given that this is hard to do in CMake, then I think we should just tokenize in Clang. Let's use llvm::sys::getProcessTriple() instead of LLVM_ON_WIN32 and check if that is an MSVC environment as a proxy for the shell type.
> >
> >
> > Do I understand correctly that `llvm::sys::getProcessTriple()` returns basically a compile-time constant? If yes, it won't allow the same clang tool binary to be used with both command line formats. Should we instead allow runtime selection of the command line format? For example, by extending JSON compilation database with the aforementioned `command_shell="..."` option.
>
>
> It does return a compile time constant, but it is a compile time constant representing the platform that clang-tidy is running on.  I don't see a need to have a single clang-tidy binary be able to parse both command line formats.  In fact, it seems actively harmful.
>
> If you are running on a Windows system with windows command line parsing rules, and someone has generated a command line on linux, then this command line was never intended to be run on Windows in the first place.  The JSON compilation database spec already says that the command line represents "the exact command to be run **in the environment of the build system**".  By adding the flexibility to change the environment, this is no longer true.  If I try to run a command generated for one build system in the environment of another build system, even if I tokenize it correctly whose to say it will work?
>
> I understand the desire to make sure we get the right change so we don't have to revisit this in the future, but to me this sounds like YAGNI and actually increasing the risk of someone using the software and getting unexpected results, not less risk.


Now I'm less sure. And I might be misunderstanding something here. Folks actually using clang tools on Windows (Aaron?) can tell with more confidence whether Linux-style command line tokenization is of any use on Windows. I had an impression that it allowed clang tools to be used with mingw (and msys shell), but I'm not sure whether it's an important use case for anyone.


https://reviews.llvm.org/D23455





More information about the cfe-commits mailing list