[PATCH] D62804: [clangd] Enable extraction of system includes from custom toolchains

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 6 13:20:43 PDT 2019


sammccall added a comment.

Implementation looks good. I can't see a better way to solve this problem, it's just a bit unfortunate to have a sophisticated solution but not be able to turn it on by default.

I think naming is important here: it's a fairly complicated feature that (I suppose) relatively few will use, so having an unambiguous way to refer to it e.g. in docs will reduce the cost/confusion.
I suggested "QueryDriver" below, but we might be able to come up with something better offline.



================
Comment at: clang-tools-extra/clangd/ClangdServer.h:127
+
+    /// If set clangd will execute compiler drivers matching the following regex
+    /// to fetch system include path.
----------------
not a regex


================
Comment at: clang-tools-extra/clangd/ClangdServer.h:127
+
+    /// If set clangd will execute compiler drivers matching the following regex
+    /// to fetch system include path.
----------------
sammccall wrote:
> not a regex
I think this probably wants to be a vector<string> (comma-separated on command-line)?

Clangd-level settings aren't easy to customize per-project.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:65
 
+std::vector<std::string> parseDriverOutput(llvm::StringRef Output) {
+  std::vector<std::string> SystemIncludes;
----------------
Can we move the implementation out to a separate file?
It's pretty logically separate, and about as much code as everything else together.
It could also use a file-level comment describing how the scheme works, and also the reasons it's not always on (security, and not all argv0's are gcc-compatible)
(Sharing the same header seems fine)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:71
+  Output.split(Lines, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
+  bool FoundStart = false;
+  for (llvm::StringRef Line : Lines) {
----------------
I'd consider `std::search` for the delimiters explicitly - I think it's clearer what happens when you find one but not the other.

(And you probably want to log an error and do nothing unless you find both)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:82
+
+    if (!llvm::sys::fs::exists(Line))
+      elog("Parsed non-existing system include: {0}", Line);
----------------
Is this check important? (What happens if non-existent dirs are on the include path?)

If it is, maybe we should pass in the VFS (factory?) here.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:90
+
+std::vector<std::string> extractSystemIncludes(PathRef Driver, PathRef File) {
+  trace::Span Tracer("Extract system includes");
----------------
some explanation of what this is doing?

e.g. "Many compilers have default system includes... these are known by the driver and passed to cc1 ... gcc and compilers with compatible CLI can dump the cc1 invocation information with this syntax... we do so and parse the result."


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:96
+  llvm::SmallString<128> OutputPath;
+  const auto EC = llvm::sys::fs::createTemporaryFile("system-includes",
+                                                     "clangd", OutputPath);
----------------
nit: please don't bother with const on locals (style is debatable, but it's hardly used anywhere and adds little value unless used consistently)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:96
+  llvm::SmallString<128> OutputPath;
+  const auto EC = llvm::sys::fs::createTemporaryFile("system-includes",
+                                                     "clangd", OutputPath);
----------------
sammccall wrote:
> nit: please don't bother with const on locals (style is debatable, but it's hardly used anywhere and adds little value unless used consistently)
we need to remove this file at some point


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:99
+  if (EC) {
+    elog("Couldn't create temporary file: {0}", EC.message());
+    return {};
----------------
log messages should contain more context, e.g. "Driver flags extraction: failed to create temporary file"


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:108
+  // Should we also preserve flags like "-sysroot", "-nostdinc" ?
+  const llvm::StringRef Args[] = {"-E", "-x", driver::types::getTypeName(Type),
+                                  "-", "-v"};
----------------
this will crash if the lookup didn't provide a "real" type


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:112
+  const int RC =
+      llvm::sys::ExecuteAndWait(Driver, Args, /*Env=*/llvm::None, Redirects);
+  if (RC) {
----------------
should we check the driver exists before executing it? the main advantage would be that we avoid logging this as if it were an error (or we do so with a better error message)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:125
+
+  return parseDriverOutput(BufOrError->get()->getBuffer());
+}
----------------
I think we should log the success case too... remember this is only going to happen once per driver (or {driver, filetype}).


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:129
+tooling::CompileCommand
+addSystemIncludes(tooling::CompileCommand Cmd,
+                  llvm::ArrayRef<std::string> SystemIncludes) {
----------------
just take cmd by reference and modify it?


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:140
+llvm::Regex convertGlobToRegex(llvm::StringRef Glob) {
+  const llvm::StringRef MetaChars("()^$|*+?.[]\\{}");
+  std::string RegText;
----------------
please use llvm::Regex::escape() on the chunks between `*`, so we're sure to be consistent


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:178
+            convertGlobToRegex(TrustedCompilerDriverGlob)),
+        Base(std::move(Base)) {
+    assert(this->Base && "Base cannot be null!");
----------------
I think you need to subscribe to changes in the base and broadcast them to your own subscribers.
As it stands this breaks background indexing I think.
This pattern is error prone, as it's too easy to forget :-(


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:190
+
+    llvm::StringRef Driver = Cmd->CommandLine.front();
+
----------------
check the CommandLine isn't empty? (I know, silly...)


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:190
+
+    llvm::StringRef Driver = Cmd->CommandLine.front();
+
----------------
sammccall wrote:
> check the CommandLine isn't empty? (I know, silly...)
The flag values will be absolute paths (or must be, to make any security sense), so we need to make Driver absolute before checking against it (using the command's working directory). And we need to make sure the string we exec is the same as the one we check, so that needs to be absolute too.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:195
+      std::lock_guard<std::mutex> Lock(Mu);
+      if (!TrustedCompilerDriverRegex.match(Driver))
+        return Cmd;
----------------
regex check is part of the cacheable computation.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:198
+
+      auto It = DriverToIncludes.try_emplace(Driver);
+      if (It.second)
----------------
you're computing based on the filename, but caching based only on the driver.

If the filename is important, I think we should probably restructure so you get the filetype here and cache by it instead.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.cpp:210
+  // Caches includes extracted from a driver.
+  mutable llvm::StringMap<std::vector<std::string>> DriverToIncludes;
+  mutable llvm::Regex TrustedCompilerDriverRegex;
----------------
put "cache" in the name?


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.h:96
+/// the compile flags.
+/// Returns nullptr when \p TrustedCompilerDriverGlob is empty or \p Base is
+/// nullptr.
----------------
If the trusted list is empty, it seems reasonable just to return Base. Then the caller can skip the check.


================
Comment at: clang-tools-extra/clangd/GlobalCompilationDatabase.h:96
+/// the compile flags.
+/// Returns nullptr when \p TrustedCompilerDriverGlob is empty or \p Base is
+/// nullptr.
----------------
sammccall wrote:
> If the trusted list is empty, it seems reasonable just to return Base. Then the caller can skip the check.
if null base is forbidden (seems to be the intent here), just say so, don't specify what happens in that case (and probably add an assert)


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:271
 
+static llvm::cl::opt<std::string> TrustedCompilerDriverGlob(
+    "trusted-compiler-driver-glob",
----------------
I'm not sure this is the right name, because "trusted" doesn't indficate what the feature does or when you might need it. And we've mitigated most of the security risk by turning this off by default, I'm not sure we actually need a warning in the flag name.

Maybe `-query-driver=`?

(commenting here because it's the main user-visible documentation, but I think we should use consistent names throughout, e.g. QueryDriverGlob).


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:273
+    "trusted-compiler-driver-glob",
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * "
----------------
nit: drop "tells clangd to", it's implied


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:273
+    "trusted-compiler-driver-glob",
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * "
----------------
sammccall wrote:
> nit: drop "tells clangd to", it's implied
from gcc-compatible drivers


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:274
+    llvm::cl::desc("Tells clangd to extract system includes from drivers "
+                   "maching the glob. Only accepts ** for sequence match and * "
+                   "for non-sequence match."),
----------------
I think it's more useful to give an example (/usr/bin/**/clang-*) than try to describe glob syntax.


================
Comment at: clang-tools-extra/clangd/tool/ClangdMain.cpp:276
+                   "for non-sequence match."),
+    llvm::cl::init(""));
+
----------------
is there something useful we should set this to by default? like `/usr/bin/gcc,/usr/bin/g++`?

Or is the assumption that the standard/system compilers will never have weird information to extract?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62804/new/

https://reviews.llvm.org/D62804





More information about the cfe-commits mailing list