r365887 - [JSONCompilationDatabase] Strip distcc/ccache/gomacc wrappers from parsed commands.

Sam McCall via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 12 03:11:40 PDT 2019


Author: sammccall
Date: Fri Jul 12 03:11:40 2019
New Revision: 365887

URL: http://llvm.org/viewvc/llvm-project?rev=365887&view=rev
Log:
[JSONCompilationDatabase] Strip distcc/ccache/gomacc wrappers from parsed commands.

Summary:
It's common to use compiler wrappers by setting CC="gomacc clang++".
This results in both args appearing in compile_commands.json, and clang's driver
can't handle this.

This patch attempts to recognize this pattern (by looking for well-known
wrappers) and drops argv0 when it matches.

It conservatively ignores other cases for now:
 - wrappers with unknown names
 - wrappers that accept -flags
 - wrappers where the compiler to use is implied (usually cc or gcc)

This is done at the JSONCompilationDatabase level rather than somewhere more
fundamental because (hopefully) this isn't a general conceptual problem, but a
messy aspect of the ecosystem around compile_commands.json. That is,
compilation databases more tightly tied to the build system should not have
this problem.
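
The unwrapping rule described above can be sketched without any LLVM
dependencies. This is only an illustration under stated assumptions, not the
committed code: the names stripExe, baseName, hasExtension, unwrapOnce and
unwrapAll are hypothetical, the path and extension handling is a simplified
stand-in for llvm::sys::path, and an empty second argument is treated here as
"not a compiler", which differs from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: drop a trailing ".exe" so "ccache.exe" matches "ccache".
static std::string stripExe(std::string Name) {
  const std::string Suffix = ".exe";
  if (Name.size() >= Suffix.size() &&
      Name.compare(Name.size() - Suffix.size(), Suffix.size(), Suffix) == 0)
    Name.erase(Name.size() - Suffix.size());
  return Name;
}

// Hypothetical helper: the part after the last '/' or '\\'.
// A simplified stand-in for llvm::sys::path::filename.
static std::string baseName(const std::string &Path) {
  const std::string::size_type Pos = Path.find_last_of("/\\");
  return Pos == std::string::npos ? Path : Path.substr(Pos + 1);
}

// Hypothetical helper: true if the file name contains a '.'.
// A rough approximation of llvm::sys::path::has_extension.
static bool hasExtension(const std::string &Path) {
  return baseName(Path).find('.') != std::string::npos;
}

// Drops one leading wrapper if the next argument looks like a compiler.
// Returns true if an argument was removed.
static bool unwrapOnce(std::vector<std::string> &Args) {
  if (Args.size() < 2)
    return false;
  const std::string Wrapper = stripExe(baseName(Args.front()));
  if (Wrapper != "distcc" && Wrapper != "gomacc" && Wrapper != "ccache")
    return false;
  const std::string &Next = Args[1];
  // Assumption of this sketch: an empty argument is not a compiler.
  if (Next.empty() || Next[0] == '-')
    return false; // A compiler flag: the compiler is implied, leave as-is.
  if (hasExtension(stripExe(Next)))
    return false; // An input file such as foo.c: leave as-is.
  Args.erase(Args.begin());
  assert(!Args.empty()); // At least the compiler itself remains.
  return true;
}

// Wrappers may be stacked (e.g. ccache distcc gcc), so unwrap repeatedly.
static std::vector<std::string> unwrapAll(std::vector<std::string> Args) {
  while (unwrapOnce(Args))
    ;
  return Args;
}
```

With this sketch, unwrapAll applied to {"ccache", "distcc", "gcc", "foo.c"}
yields {"gcc", "foo.c"}, while {"distcc", "foo.c"} is returned unchanged because
the compiler is implied.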

Reviewers: phosek, klimek

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64297

Modified:
    cfe/trunk/lib/Tooling/JSONCompilationDatabase.cpp
    cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp

Modified: cfe/trunk/lib/Tooling/JSONCompilationDatabase.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/JSONCompilationDatabase.cpp?rev=365887&r1=365886&r2=365887&view=diff
==============================================================================
--- cfe/trunk/lib/Tooling/JSONCompilationDatabase.cpp (original)
+++ cfe/trunk/lib/Tooling/JSONCompilationDatabase.cpp Fri Jul 12 03:11:40 2019
@@ -256,15 +256,57 @@ JSONCompilationDatabase::getAllCompileCo
   return Commands;
 }
 
+static llvm::StringRef stripExecutableExtension(llvm::StringRef Name) {
+  Name.consume_back(".exe");
+  return Name;
+}
+
+// There are compiler-wrappers (ccache, distcc, gomacc) that take the "real"
+// compiler as an argument, e.g. distcc gcc -O3 foo.c.
+// These end up in compile_commands.json when people set CC="distcc gcc".
+// Clang's driver doesn't understand this, so we need to unwrap.
+static bool unwrapCommand(std::vector<std::string> &Args) {
+  if (Args.size() < 2)
+    return false;
+  StringRef Wrapper =
+      stripExecutableExtension(llvm::sys::path::filename(Args.front()));
+  if (Wrapper == "distcc" || Wrapper == "gomacc" || Wrapper == "ccache") {
+    // Most of these wrappers support being invoked 3 ways:
+    // `distcc g++ file.c` This is the mode we're trying to match.
+    //                     We need to drop `distcc`.
+    // `distcc file.c`     This acts like compiler is cc or similar.
+    //                     Clang's driver can handle this, no change needed.
+    // `g++ file.c`        g++ is a symlink to distcc.
+    //                     We don't even notice this case, and all is well.
+    //
+    // We need to distinguish between the first and second case.
+    // The wrappers themselves don't take flags, so Args[1] is a compiler flag,
+    // an input file, or a compiler. Inputs have extensions, compilers don't.
+    bool HasCompiler =
+        (Args[1][0] != '-') &&
+        !llvm::sys::path::has_extension(stripExecutableExtension(Args[1]));
+    if (HasCompiler) {
+      Args.erase(Args.begin());
+      return true;
+    }
+    // If !HasCompiler, wrappers act like GCC. Fine: so do we.
+  }
+  return false;
+}
+
 static std::vector<std::string>
 nodeToCommandLine(JSONCommandLineSyntax Syntax,
                   const std::vector<llvm::yaml::ScalarNode *> &Nodes) {
   SmallString<1024> Storage;
-  if (Nodes.size() == 1)
-    return unescapeCommandLine(Syntax, Nodes[0]->getValue(Storage));
   std::vector<std::string> Arguments;
-  for (const auto *Node : Nodes)
-    Arguments.push_back(Node->getValue(Storage));
+  if (Nodes.size() == 1)
+    Arguments = unescapeCommandLine(Syntax, Nodes[0]->getValue(Storage));
+  else
+    for (const auto *Node : Nodes)
+      Arguments.push_back(Node->getValue(Storage));
+  // There may be multiple wrappers: using distcc and ccache together is common.
+  while (unwrapCommand(Arguments))
+    ;
   return Arguments;
 }
 

Modified: cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp?rev=365887&r1=365886&r2=365887&view=diff
==============================================================================
--- cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp (original)
+++ cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp Fri Jul 12 03:11:40 2019
@@ -370,6 +370,30 @@ TEST(findCompileArgsInJsonDatabase, Find
   EXPECT_EQ("command4", FoundCommand.CommandLine[0]) << ErrorMessage;
 }
 
+TEST(findCompileArgsInJsonDatabase, ParsesCompilerWrappers) {
+  std::vector<std::pair<std::string, std::string>> Cases = {
+      {"distcc gcc foo.c", "gcc foo.c"},
+      {"gomacc clang++ foo.c", "clang++ foo.c"},
+      {"ccache gcc foo.c", "gcc foo.c"},
+      {"ccache.exe gcc foo.c", "gcc foo.c"},
+      {"ccache g++.exe foo.c", "g++.exe foo.c"},
+      {"ccache distcc gcc foo.c", "gcc foo.c"},
+
+      {"distcc foo.c", "distcc foo.c"},
+      {"distcc -I/foo/bar foo.c", "distcc -I/foo/bar foo.c"},
+  };
+  std::string ErrorMessage;
+
+  for (const auto &Case : Cases) {
+    std::string DB = R"([{"directory":".", "file":"/foo.c", "command":")" +
+                     Case.first + "\"}]";
+    CompileCommand FoundCommand =
+        findCompileArgsInJsonDatabase("/foo.c", DB, ErrorMessage);
+    EXPECT_EQ(Case.second, llvm::join(FoundCommand.CommandLine, " "))
+        << Case.first;
+  }
+}
+
 static std::vector<std::string> unescapeJsonCommandLine(StringRef Command) {
   std::string JsonDatabase =
     ("[{\"directory\":\"//net/root\", \"file\":\"test\", \"command\": \"" +



