[cfe-dev] [RFC] Embedding compilation database info in object files.

Sean Silva silvas at purdue.edu
Mon Jul 22 13:26:04 PDT 2013


On Sat, Jul 20, 2013 at 12:32 AM, Manuel Klimek <klimek at google.com> wrote:

> On Sat, Jul 20, 2013 at 6:44 AM, Sean Silva <silvas at purdue.edu> wrote:
>
>> On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧 <Pidgeot18 at gmail.com> wrote:
>>
>>> On 7/17/2013 11:36 PM, Sean Silva wrote:
>>>
>>>> tl;dr: compiler embeds compilation db info in object files; you can
>>>> then collect it afterwards with simple tools. Good idea?
>>>>
>>>> It seems like for a while now, we have been looking for a way that
>>>> clang can assist users in creating JSON compilation databases, but
>>>> solutions seem limited to specific build systems or platforms. I came up
>>>> with a neat little hack that may be a viable way for clang to help create
>>>> compilation databases "everywhere clang runs", with fairly good user
>>>> experience.
>>>>
>>>> I believe the following user experience is achievable "everywhere clang
>>>> runs":
>>>> 1. Add some option to the compiler command line.
>>>> 2. Rebuild.
>>>> 3. Feed all of your built object files/executables to a small tool we
>>>> ship and out comes a compilation database.
>>>>
>>>
>>> Quite frankly, I don't see this as being a superior user-experience to
>>> the following scenario:
>>> 1. Add an option to the compiler command line to dump the database to a
>>> file.
>>> 2. Rebuild.
>>> 3. You have your compilation database automatically!
>>>
>>> The primary difficulty of this approach from an implementation
>>> perspective is locking the database for writing when compiling with make
>>> -jN.
>>
>>
>> That approach may simply not be achievable in the case where the
>> compilations are happening on physically distinct machines that do not
>> share an underlying filesystem. I consider 'achievable "everywhere clang
>> runs"' to be an important property; it's easy to come up with many
>> far-superior user experiences by restricting consideration to particular
>> build environments.
>>
>
> Ah, that's where we disagree.
> I think this is for "simple" set ups. If you have the resources to write
> your own distributed build system, I assume you also have the resources to
> implement your own compilation database that works against the build graph
> a distributed build system has to somehow put together anyway.
>
>
I would really like the compilation database functionality to be considered
a feature that is intrinsically part of clang, rather than something where
"if you want to use this, cool, but you're going to have to do all this
custom work to get to hello world with it".

In dealing with game teams, each one may use a different (possibly
custom/private) build system/mashup of build systems, many of which are
closed source/proprietary (e.g. MSBuild.exe from Visual Studio). I'm trying
to come up with a solution that will work independently of the build
system, or at least with as few assumptions as possible (things like "they
have access to their final build products, since otherwise how would they
run them" and "they can modify the compiler flags"). Like I said in the OP,
I was able to rapidly extract a compilation database from a completely
unfamiliar (closed-source, proprietary) build system (that I still don't
understand!).

> Also, especially for distributed build systems, you still have not answered
> the question:
> If I edit file a/b/c/d.cc, what algorithm do I run through to get the
> compile command line for that file?
>
>
See my response to David Blaikie:

> Typically it is possible to establish some set of files/directories that
> are a superset of all the build products. Then the following
> "consistent/trivial/generic" algorithm is sufficient:
> for all files in the set of files/directories that is a superset of all
> build products:
>     check if it contains a compilation database entry
> This relies on an easily-identifiable mark that will only be present in
> the relevant build products, which I think is feasible to come up with; my
> idea "@ClangCompilationDatabaseEntryMD5JSON<hex md5sum of $JSON>$JSON" fits
> this role adequately I think.


Then you look through the aggregated compilation database entries for the
one that matches the file. This kind of strategy is a "last resort" for the
most uncooperative of build systems, but I think it is a use case that
needs to be accounted for.
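
A rough sketch of that scan-and-lookup in Python, under assumptions that are
not settled anywhere in this thread: the marker is followed directly by 32
lowercase hex digits of the MD5 and then the UTF-8 JSON text (the angle
brackets in the marker above are read as placeholder notation), and each
entry uses the "directory"/"file"/"command" keys of the JSON compilation
database format. The function names and the MAX_ENTRY bound are hypothetical.

```python
import hashlib
import json
import os

MARKER = b"@ClangCompilationDatabaseEntryMD5JSON"
MAX_ENTRY = 1 << 20  # assumed upper bound on the size of one embedded entry


def entries_in_blob(blob):
    """Yield each embedded entry in `blob` whose MD5 checksum verifies."""
    decoder = json.JSONDecoder()
    pos = blob.find(MARKER)
    while pos != -1:
        digest_start = pos + len(MARKER)
        json_start = digest_start + 32
        digest = blob[digest_start:json_start].decode("ascii", "replace")
        text = blob[json_start:json_start + MAX_ENTRY].decode("utf-8", "replace")
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so trailing binary data after the entry is ignored.
            entry, end = decoder.raw_decode(text)
        except ValueError:
            entry = None
        if isinstance(entry, dict):
            raw = text[:end].encode("utf-8")
            # The checksum rejects accidental occurrences of the marker.
            if hashlib.md5(raw).hexdigest() == digest:
                yield entry
        pos = blob.find(MARKER, pos + 1)


def collect(roots):
    """Scan every file under `roots` (a superset of the build products)."""
    entries = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                try:
                    with open(os.path.join(dirpath, name), "rb") as f:
                        entries.extend(entries_in_blob(f.read()))
                except OSError:
                    continue  # unreadable file: skip it
    return entries


def command_for(entries, source):
    """Return the first entry whose source path matches `source`, else None."""
    target = os.path.normpath(source)
    for entry in entries:
        full = os.path.normpath(
            os.path.join(entry.get("directory", ""), entry.get("file", "")))
        if full == target or full.endswith(os.sep + target):
            return entry
    return None
```

With something like this, `command_for(collect(["build/"]), "a/b/c/d.cc")`
would answer the question above for whatever entries survive in the build
products; reading each file whole is the simplest thing that works, not a
claim about how a shipped tool should do it.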

-- Sean Silva