[cfe-dev] Getting involved with Clang refactoring

David Röthlisberger david at rothlis.net
Thu May 24 01:08:33 PDT 2012


On 22 May 2012, at 15:17, Douglas Gregor wrote:
> Bringing it back to 'make' a little bit... we could, conceivably, have a compilation database implicitly generated from the makefiles. If one asked it how to build 'foo.cpp', it would find the appropriate make rule and form the command-line arguments. We don't have such a 'live' compilation database right now, but it fits into the model and would be really, really cool because it would allow us to 'just work' on a makefile-based project. Unfortunately, it amounts to re-implementing 'make' :(
> 
> There are other ways we could build compilation databases. There's CMake support for dumping out a compilation database; we could also add a -fcompilation-database=<blah> flag that creates a compilation database as the result of a build, which would work with any build system. That would also be a nice little project that would help the tooling effort.



For the sake of readers who, like me, don't know all the background
information, here's what I've unearthed over the last hour or two:

1. If you set CMAKE_EXPORT_COMPILE_COMMANDS (for example by passing
   -DCMAKE_EXPORT_COMPILE_COMMANDS=ON to cmake), cmake will create the file
   compile_commands.json in the build directory.

   See http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=fe07b055
   and http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=5674844d

   I don't know if the format of this JSON file is documented anywhere, but
   from the above commits it seems to be an array of objects like this:

      { "directory": "abc", "command": "g++ -xyz ...", "file": "source.cxx" }


2. Clang has a tool called scan-build that wraps an invocation of make.
   You call it like this:

      scan-build make

   Scan-build intercepts the compiler by setting CC and CXX to wrapper
   scripts that forward to the real compiler and then (while they still know
   all the compiler flags necessary to compile this file) invoke the clang
   static analyzer.

   See http://clang-analyzer.llvm.org/scan-build.html
   and http://llvm.org/svn/llvm-project/cfe/trunk/tools/scan-build/scan-build

   It's 1400 lines of Perl, but most of that seems to be command-line options,
   usage help, and generating HTML reports. The compiler-interception part
   doesn't seem too difficult.

   Scan-build is relevant to this discussion because one could generate a
   compilation database using a similar interposing technique.


3. Something completely different: Maybe we could figure out the compilation
   command-lines for all of a project's files at once by looking at the output
   of "make --always-make --dry-run".

   One difference from the let's-interpose-CXX approach is that this will give
   us some command-lines that are not C++ compilations, and we'd have to filter
   those out.

   Once we do know that it's a C++ compilation command-line, we still have to
   parse that command-line to figure out the name of the source file (just like
   the interposed CXX script has to).


4. Doug's suggestion: Call clang with "-fcompilation-database=foo" during the
   course of a normal build. This will simultaneously compile the file and
   add/update an entry in the compilation database. (Or maybe only do the
   compilation database entry, requiring a separate invocation to do the
   actual compilation?)
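
To make approach 3 a bit more concrete, here is a minimal Python sketch that
turns the output of "make --always-make --dry-run" into entries of the shape
shown in item 1. Everything in it is an assumption made for illustration: the
function name entries_from_dry_run, the list of compiler names, the list of
source extensions, and the rule that a compilation is "a compiler command
containing -c and exactly one source file". It does not handle commands
prefixed with "cd dir &&", backslash continuations, or recursive make.

```python
import json
import shlex

# Assumed compiler names and source extensions; adjust these per project.
COMPILERS = {"cc", "gcc", "clang", "c++", "g++", "clang++"}
SOURCE_EXTS = (".c", ".cc", ".cpp", ".cxx")


def entries_from_dry_run(output, directory):
    """Turn `make --dry-run` output into compilation-database entries."""
    entries = []
    for line in output.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a simple command line
        if not words or words[0].rsplit("/", 1)[-1] not in COMPILERS:
            continue  # filter out commands that are not compiler invocations
        if "-c" not in words:
            continue  # probably a link step rather than a compilation
        # Naive source-file detection: any argument with a source extension.
        sources = [w for w in words[1:] if w.endswith(SOURCE_EXTS)]
        if len(sources) != 1:
            continue  # ambiguous, so skip instead of guessing
        entries.append({"directory": directory,
                        "command": line.strip(),
                        "file": sources[0]})
    return entries


# Hypothetical dry-run output, for illustration only.
sample = """\
mkdir -p build
g++ -Iinclude -O2 -c src/main.cxx -o build/main.o
g++ -o app build/main.o
"""
print(json.dumps(entries_from_dry_run(sample, "/home/me/project"), indent=2))
```

In real use the "output" argument would be whatever the dry run prints, and
"directory" the directory in which make was invoked. The filtering rule is
exactly the hacky and brittle part, so treat it as a starting point only.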


Pros and cons of the various approaches:

CMake +  The compilation database is generated at "cmake" time -- we don't need
         to do a full build.

CMake +  Works on Windows.

CMake -  (Obviously) doesn't work with non-CMake build systems.

CXX interposing +  Probably the easiest to implement if you have a project that
                   needs this *now* and you don't want to wait for a better
                   solution to make its way into clang.

CXX interposing +  Works with any build system as long as it is compliant with
                   the CXX / CC environment variable convention.

CXX interposing -  The interposed script has to parse the compilation command-
                   line to extract the source filename. This is duplication of
                   effort because clang already has to parse the command-line.

CXX interposing -  Each entry is added to the compilation database as the
                   corresponding target is being built, so in
                   parallel/distributed builds the script will have to lock
                   the compilation database.

make --dry-run +  Works with any make-based system (I'm not very familiar with
                  non-GNU versions of make, but presumably they have similar
                  flags), except for recursive-make systems as mentioned below.

make --dry-run +  Far easier than re-implementing make.

make --dry-run +  No need to actually build the targets.

make --dry-run -  Like the CXX interposing technique, has to parse the
                  compilation command-line.

make --dry-run -  Gives you *all* the commands make would run, not just C or
                  C++ compilations; you'll have to filter the output for what
                  you're interested in. Smells a bit hacky and brittle but
                  maybe that's just my prejudices speaking.

make --dry-run -  Doesn't work with some complex recursive-make build systems.
                  For example, if part of your makefile creates another
                  makefile and then uses that, the dry-run clearly won't work
                  unless it actually creates that second makefile. In theory
                  make has ways to make this work -- see
                  http://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html
                  -- but in practice I've never seen a large build system where
                  dry-run works.

clang -fcompilation-database +  Easier for the *user* than the two previous
                                shell-script-based solutions. No mucking about
                                with shell scripts: just set CXXFLAGS, run
                                make, and you're done.

clang -fcompilation-database +  Will work on Windows.

clang -fcompilation-database -  Like the CXX interposing technique, has to lock
                                the compilation database for parallel/
                                distributed builds.

clang -fcompilation-database -  Can't generate the compilation database without
                                building your whole project with clang.

That last point is more important (to me) than you might think. Say I have a
large codebase and not all of it builds with clang; but for the source files
that *can* be parsed by clang, I want to run some clang-based tool. Still,
having "-fcompilation-database" in clang doesn't stop me from writing my own
CXX-interposing scripts if I should need them.
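
As an illustration of the locking concern noted for both the CXX-interposing
and the -fcompilation-database approaches, here is a minimal Python sketch of
the "record one entry" step such a wrapper script might perform. The function
name record_entry and the ".lock" / ".tmp" sidecar file names are made up for
this example. It assumes a Unix-like system (fcntl is not available on
Windows) and a local file system, so it covers parallel builds on one machine
but not distributed builds.

```python
import fcntl
import json
import os


def record_entry(db_path, directory, command, source_file):
    """Add or update one entry in a JSON compilation database under a lock."""
    # Lock a separate sidecar file so that the database itself can be
    # rewritten while other compiler processes wait their turn.
    with open(db_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until we hold the lock
        entries = []
        if os.path.exists(db_path):
            with open(db_path) as f:
                entries = json.load(f)
        # Drop any stale entry for the same file in the same directory.
        entries = [e for e in entries
                   if (e["directory"], e["file"]) != (directory, source_file)]
        entries.append({"directory": directory,
                        "command": command,
                        "file": source_file})
        # Write to a temporary file and rename it into place, so readers
        # never see a half-written database.
        tmp = db_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(entries, f, indent=2)
        os.replace(tmp, db_path)
    # Closing the lock file releases the lock.
```

A wrapper installed as CXX would call something like
record_entry("compile_commands.json", os.getcwd(), " ".join(sys.argv), src)
after working out "src" from its arguments, and then run the real compiler.
Rewriting the whole file for every compilation is wasteful on large projects;
that cost is part of why the locking counts as a con above.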

Well, that's all. I hope someone finds it useful -- I can't be the only one to
have wondered how to actually get the full command-line through to clang-based
tools. :-) Once we decide on an official solution let's make sure we document
it well.

--Dave.




