[cfe-dev] [RFC] Embedding compilation database info in object files.

Fri Jul 19 22:30:41 PDT 2013

On 7/19/2013 11:44 PM, Sean Silva wrote:
> On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧 
> <Pidgeot18 at gmail.com> wrote:
>
>     On 7/17/2013 11:36 PM, Sean Silva wrote:
>
>         tl;dr: compiler embeds compilation db info in object files;
>         you can then collect it afterwards with simple tools. Good idea?
>
>         It seems like for a while now, we have been looking for a way
>         that clang can assist users in creating JSON compilation
>         databases, but solutions seem limited to specific build
>         systems or platforms. I came up with a neat little hack that
>         may be a viable way for clang to help create compilation
>         databases "everywhere clang runs", with fairly good user
>         experience.
>
>         I believe the following user experience is achievable
>         "everywhere clang runs":
>         1. Add some option to the compiler command line.
>         2. Rebuild.
>         3. Feed all of your built object files/executables to a small
>         tool we ship and out comes a compilation database.
>
>
>     Quite frankly, I don't see this as being a superior
>     user-experience to the following scenario:
>     1. Add an option to the compiler command line to dump the database
>     to a file.
>     2. Rebuild.
>     3. You have your compilation database automatically!
>
>     The primary difficulty of this approach from an implementation
>     perspective is locking the database for writing when compiling
>     with make -jN.
>
>
> That approach may simply not be achievable in the case where the 
> compilations are happening on physically distinct machines that do not 
> share an underlying filesystem. I consider 'achievable "everywhere 
> clang runs"' to be an important property; it's easy to come up with 
> many far-superior user experiences by restricting consideration to 
> particular build environments.

Quite frankly, there is an inherent contradiction in your use case there. 
If physically distinct machines are not sharing an underlying 
filesystem, then the paths listed in the compilation database are going 
to be incorrect in the first place, and the file requires postprocessing 
anyway.
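To make the postprocessing step concrete, here is a minimal sketch of remapping worker-side paths in a JSON compilation database to the developer's checkout. The function names and the example roots (`/build/worker3/proj`, `/home/me/proj`) are hypothetical. It only rewrites the `directory`, `file`, and `output` fields; a relative `file` is resolved against `directory` anyway, so it is left alone. Paths buried inside `command` or `arguments` (for example `-I` flags) would need the same treatment.

```python
import json


def rewrite_prefix(path, old_root, new_root):
    """Replace a leading directory prefix of `path`; other paths are returned unchanged."""
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    # Match whole path components only, so /build/w3 does not match /build/w30.
    if path == old or path.startswith(old + "/"):
        return new + path[len(old):]
    return path


def remap_database(entries, old_root, new_root):
    """Return a copy of compilation database entries with path fields remapped.

    Only "directory", "file" and "output" are rewritten; paths embedded in
    "command" or "arguments" would need the same treatment.
    """
    remapped = []
    for entry in entries:
        new_entry = dict(entry)  # shallow copy: the input list is not modified
        for key in ("directory", "file", "output"):
            if key in new_entry:
                new_entry[key] = rewrite_prefix(new_entry[key], old_root, new_root)
        remapped.append(new_entry)
    return remapped


if __name__ == "__main__":
    # Hypothetical entry recorded on a remote build worker.
    db = [{"directory": "/build/worker3/proj",
           "file": "src/a.c",
           "arguments": ["clang", "-c", "src/a.c", "-o", "a.o"]}]
    print(json.dumps(remap_database(db, "/build/worker3/proj", "/home/me/proj"), indent=2))
```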

I'm looking at usability from the standpoint of typical open-source 
projects (such as what you might find in your local Linux distro's 
package list) using off-the-shelf build systems (which is, for the most 
part, make). The reason that this feature request crops up from time 
to time is the difficulty of massaging non-declarative build systems like 
these into producing the compilation database; projects that have 
highly-specialized build systems probably have a lower barrier to 
getting the build system itself to spit out the compilation database, so 
they do not need this flag as much.
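For the make -jN case, the write-locking difficulty mentioned earlier in this thread is manageable on a single machine. The following is a minimal sketch, assuming a hypothetical compiler flag whose implementation calls something like `add_entry` once per translation unit. It serializes parallel writers with POSIX advisory `flock` locking, which is only reliable on a local filesystem, so it does nothing for the distributed case. It also assumes no writer crashes mid-write.

```python
import fcntl
import json
import os


def add_entry(db_path, entry):
    """Insert or replace one entry in a shared JSON compilation database.

    Parallel compiler processes (make -jN) serialize on an exclusive advisory
    lock, so no process reads a half-written file or loses another's update.
    """
    # O_CREAT without O_TRUNC: the first writer creates the file, later ones keep it.
    fd = os.open(db_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other writer holds the lock
        try:
            text = f.read()
            entries = json.loads(text) if text.strip() else []
            # One entry per (directory, file): recompiling replaces the old command.
            key = (entry["directory"], entry["file"])
            entries = [e for e in entries if (e["directory"], e["file"]) != key]
            entries.append(entry)
            f.seek(0)
            f.truncate()
            json.dump(entries, f, indent=2)
            f.flush()  # push the new contents out before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```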
>
> Also, I believe that should someone want to integrate the compilation 
> database into an automated build process, the "update a random 
> database file specified on command line" approach has serious issues 
> since the nature of its dependency on other parts of the build process 
> is difficult to describe within the semantics of existing build 
> systems. For example, Ninja is fundamentally based on "run this 
> command with these files as input to get these files as output", and 
> it's not clear how to express "this file is updated by running these 
> commands, with such and such invalidation/staleness semantics etc." 
> within that semantic model. Embedding the compilation database 
> information in build products integrates well with build systems since 
> the compilation database information is simply another aspect of the 
> compiler's output and follows the same dependency rules.

Even if you embedded the database in the objects, the final step ("feed 
the output files to the extra tool") would still be partially and subtly 
broken on incremental builds. Suppose I delete an output executable (or, 
say, rename it) from the build commands. In most build systems I've 
worked with, this won't delete the stale executable on the disk. How 
would you propose to have your tool distinguish between the stale unused 
executable and one that wasn't touched because it wasn't updated?

Without cooperation from the build system, incremental builds are going 
to be at risk of having stale or incorrect information no matter what you 
do. I don't see how making the UX strictly worse for 95% of the 
potential users is worth what little benefit you're getting out of it.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor
