[cfe-dev] (LibTooling) (scan-build-py) Link commands in the CompilationDatabase JSON

Mon Mar 13 14:14:45 PDT 2017

Dear Members,
Dear Manuel Klimek and László Nagy (rizsotto),

I'm resurrecting an older discussion
(http://clang-developers.42468.n3.nabble.com/compilation-db-question-td4054364.html)
and following up on my request to include link commands in
intercept-build's build.json
(https://github.com/rizsotto/scan-build/issues/80). Here at Ericsson
we develop the tools CodeChecker and CodeCompass. We use the
compilation commands JSON to get the information we need, but neither
CMake's generated database (related discussion:
http://clang-developers.42468.n3.nabble.com/Extending-CMAKE-EXPORT-COMPILE-COMMANDS-td4024793.html)
nor those produced by intercept-build contain linkage commands, which,
in certain cases, our tools need.
For this reason, we have been supplying our own interceptor, LD-LOGGER
(https://github.com/Ericsson/codechecker/tree/master/vendor/build-logger),
but it's messy and unmaintained as of now.

This is why I posted the request to rizsotto's project. He pointed me
in the direction of Clang, but I'd like to get some pointers before I
delve into changing the code.
I've tried some dummy build.json files and scenarios with the current
(as of this morning, UTC) LibTooling-based tools such as clang-check
and clang-tidy.

Let's consider a simple example project, which is compiled (clang++
-c a.cpp) and then linked (clang++ a.o -o main.out). This writes TWO
entries into the build.json, one with the compile command and one with
the link command. LibTooling programs work with it perfectly: the link
command (valid as per the Compilation Command Database specification)
does not cause any trouble.
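
For reference, the database for this small project could look roughly
like what the Python sketch below prints. The field names (directory,
command, file) come from the JSON Compilation Database specification;
the directory path is hypothetical, and recording a.o as the "file" of
the link entry is only my assumption about what a logger would write.

```python
import json

# Hypothetical build directory; any absolute path would do.
BUILD_DIR = "/home/user/project"

entries = [
    # Compile step: clang++ -c a.cpp
    {"directory": BUILD_DIR, "command": "clang++ -c a.cpp", "file": "a.cpp"},
    # Link step: clang++ a.o -o main.out
    # (assumption: the object file is recorded as the entry's "file")
    {"directory": BUILD_DIR, "command": "clang++ a.o -o main.out", "file": "a.o"},
]

# Serialise the two entries the way a build.json would store them.
build_json = json.dumps(entries, indent=2)
print(build_json)
```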

Now consider the following build commands in the project.

    clang++ a.cpp b1.cpp -o ab1.o
    clang++ a.cpp b2.cpp -o ab2.o
    clang++ ab1.o c.cpp -o one.out
    clang++ ab2.o d.cpp -o two.out

If this is logged, either by our tool (with the link commands) or via
intercept-build, or even if I write a valid build.json for this
project by hand, clang-tidy and clang-check fail with the error

    error: unable to handle compilation, expected exactly one compiler job

This is understandable, because a.cpp now appears twice in the compile
commands. There are actually four entries: two with a.cpp as the file,
and one each for b1.cpp and b2.cpp, but only two distinct commands,
each listed twice. That is the expected result, given how the project
in our example is built. (This, to my understanding, fits the
specification of a CCDb.)
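
To make the duplication concrete, here is a small Python sketch that
counts how many entries reference each file. It covers only the first
two commands and assumes the logger emits one entry per source file
per command, which is how I read the output described above. The
directory path and the helper name entries_per_file are made up for
illustration.

```python
from collections import Counter

BUILD_DIR = "/home/user/project"  # hypothetical path

# Assumed shape of the logged database: one entry per source file per command.
entries = [
    {"directory": BUILD_DIR, "command": "clang++ a.cpp b1.cpp -o ab1.o", "file": "a.cpp"},
    {"directory": BUILD_DIR, "command": "clang++ a.cpp b1.cpp -o ab1.o", "file": "b1.cpp"},
    {"directory": BUILD_DIR, "command": "clang++ a.cpp b2.cpp -o ab2.o", "file": "a.cpp"},
    {"directory": BUILD_DIR, "command": "clang++ a.cpp b2.cpp -o ab2.o", "file": "b2.cpp"},
]


def entries_per_file(db):
    """Count how many database entries name each file."""
    return Counter(entry["file"] for entry in db)


counts = entries_per_file(entries)
# Files with more than one entry are the ones a lookup by file name
# cannot resolve to a single command.
ambiguous = sorted(name for name, n in counts.items() if n > 1)
distinct_commands = {entry["command"] for entry in entries}
print(ambiguous, len(distinct_commands))
```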

My questions are:

1. Is the "exactly one compiler job" expectation specific to tools
like clang-tidy and clang-check, which "query" the build.json for the
proper compilation command line and run into ambiguity if there is
more than one, or is it a more general expectation?

2. Rizsotto said, and I quote

"But very little (or none) support for it in the current Clang tooling
library. (I would call the compilation database parser in Clang very
picky/strict.)
Currently I'm busy to merge this code into Clang repository. Would not
implement this feature now. [...] I can put more effort into it, when
there is a more generic driver from Clang side too. As far as I can
see Manuel (one of the guy behind Clang tooling) is supporting it, but
lack resource to implement it. (Be the change you want! ;))"

Assuming that I implement logging of link commands in intercept-build
(or Bear), which parts of Clang should I expect to break once link
commands are in the database? Should there be a filter somewhere in
Clang that excludes link commands based on some criteria? (In our
tools, we implemented rules to decide whether an entry in ld-logger's
output is a compile or a link command.)
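
To illustrate the kind of filter I have in mind, here is a rough
Python sketch. It is NOT the rule set our tools use, only a
hypothetical heuristic: a command with -c, -S or -E never links;
otherwise it counts as a link command, with a compile step included if
any input looks like a source file. The function name classify and the
extension list are assumptions, and options other than -o that take a
separate argument are deliberately not handled.

```python
import shlex
from pathlib import PurePath

# Assumed set of source extensions; a real filter would need more.
SOURCE_EXTS = {".c", ".cc", ".cpp", ".cxx", ".m", ".mm"}
# Driver flags that stop the pipeline before the link step.
NO_LINK_FLAGS = {"-c", "-S", "-E"}


def classify(command):
    """Return 'compile', 'link' or 'compile+link' for a logged command."""
    args = shlex.split(command)[1:]  # drop the compiler executable
    if NO_LINK_FLAGS.intersection(args):
        return "compile"
    has_source = False
    skip_next = False
    for arg in args:
        if skip_next:  # this token is the argument of -o
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg.startswith("-"):
            continue  # simplification: other options are ignored
        if PurePath(arg).suffix in SOURCE_EXTS:
            has_source = True
    return "compile+link" if has_source else "link"


for cmd in ("clang++ -c a.cpp",
            "clang++ a.o -o main.out",
            "clang++ ab1.o c.cpp -o one.out"):
    print(cmd, "->", classify(cmd))
```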

As seen above, to my current understanding, having link commands does
not make LibTooling's head spin, but having the same file referenced
multiple times does, at least for some tools.

3. (This is directed more at Manuel.)
Has the thinking on this moved forward since November? What is the
current consensus on this approach? We would like to improve our
tools' support for what is generally used and better maintained in
the community.

Best regards,
Whisperity.