[cfe-dev] [RFC] Embedding compilation database info in object files.

Sean Silva silvas at purdue.edu
Mon Jul 22 14:30:49 PDT 2013


On Fri, Jul 19, 2013 at 10:30 PM, Joshua Cranmer <pidgeot18 at gmail.com> wrote:

>  On 7/19/2013 11:44 PM, Sean Silva wrote:
>
> On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧 <Pidgeot18 at gmail.com> wrote:
>
>> On 7/17/2013 11:36 PM, Sean Silva wrote:
>>
>>> tl;dr: compiler embeds compilation db info in object files; you can then
>>> collect it afterwards with simple tools. Good idea?
>>>
>>> It seems like for a while now, we have been looking for a way that clang
>>> can assist users in creating JSON compilation databases, but solutions seem
>>> limited to specific build systems or platforms. I came up with a neat
>>> little hack that may be a viable way for clang to help create compilation
>>> databases "everywhere clang runs", with fairly good user experience.
>>>
>>> I believe the following user experience is achievable "everywhere clang
>>> runs":
>>> 1. Add some option to the compiler command line.
>>> 2. Rebuild.
>>> 3. Feed all of your built object files/executables to a small tool we
>>> ship and out comes a compilation database.
>>>
>>
>>  Quite frankly, I don't see this as being a superior user-experience to
>> the following scenario:
>> 1. Add an option to the compiler command line to dump the database to a
>> file.
>> 2. Rebuild.
>> 3. You have your compilation database automatically!
>>
>> The primary difficulty of this approach from an implementation
>> perspective is locking the database for writing when compiling with make
>> -jN.
>
>
>  That approach may simply not be achievable in the case where the
> compilations are happening on physically distinct machines that do not
> share an underlying filesystem. I consider 'achievable "everywhere clang
> runs"' to be an important property; it's easy to come up with many
> far-superior user experiences by restricting consideration to particular
> build environments.
>
>
> Quite frankly, there is an inherent contradiction in your use-case. If
> physically distinct machines are not sharing an underlying filesystem, then
> the paths listed in the compilation database are going to be incorrect in
> the first place, and the file requires postprocessing anyway.
>

I agree that this could be an issue, and I never said that it wouldn't need
postprocessing (this whole proposal presupposes some sort of (simple, I
claim) postprocessing). However, this particular issue can be addressed
with scripting: going through a JSON file and rebasing paths is a 50-line
Python script. I actually did this sort of path rebasing with a single sed
command in the build I mentioned in the OP (although I was rebasing paths
for a different reason). This is certainly *much* easier than modifying a
distributed build system (let alone one that is proprietary and simply not
changeable) to emit the paths; you would at least have to write the same
rebasing logic, but do so within a possibly very large and complex piece of
software that you don't maintain.
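To illustrate the kind of postprocessing I have in mind, here is a minimal
sketch of such a rebasing script. The function name and the example paths
are hypothetical; the "directory"/"file"/"command" keys are the standard
compile_commands.json fields:

```python
import json

def rebase_paths(entries, old_root, new_root):
    """Rewrite 'directory' and 'file' fields that start with old_root
    so they point under new_root instead; other paths are left alone."""
    def rebase(p):
        return new_root + p[len(old_root):] if p.startswith(old_root) else p
    return [{**e,
             "directory": rebase(e["directory"]),
             "file": rebase(e["file"])}
            for e in entries]

# Hypothetical example: entries produced on a build worker, rebased for
# a developer checkout on a machine that never saw the worker's filesystem.
db = [{"directory": "/build/worker1/src",
       "file": "/build/worker1/src/a.c",
       "command": "clang -c a.c"}]
rebased = rebase_paths(db, "/build/worker1", "/home/me/project")
print(json.dumps(rebased, indent=2))
```

A real script would read compile_commands.json, rebase, and write it back;
the point is only that the postprocessing is a small, self-contained step
outside the build system.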

Note that the case of an unmodifiable distributed build system on
physically distinct machines that do not share an underlying filesystem is
a scenario that no proposal I have yet heard can handle *in any way, at
all* (and I know of at least one such real-world scenario, in fact). I
think it speaks very highly to the robustness and viability of embedding
the information in existing build products that rebasing paths is the
largest concern in this scenario.


>
> I'm looking at usability from the standpoint of typical open-source
> projects (such as what you might find in your local Linux distro's package
> list) using off-the-shelf build systems (which is, for the most part,
> make). The reason that this feature request crops up from time to time is
> the difficulty of massaging non-declarative buildsystems like this into
> producing the compilation database;
>


> projects that have highly-specialized build systems probably have a lower
> barrier to getting the build system itself to spit out the compilation
> database, so they do not need this flag as much.
>
>
On the contrary, in my experience "specialized" means "complex and
difficult (maybe impossible) to change". For example, consider a project
that uses make as a top-level build system and then calls into multiple
specialized (possibly proprietary/out of your control) build systems for
different parts of the project.

>
>  Also, I believe that should someone want to integrate the compilation
> database into an automated build process, the "update a random database
> file specified on command line" approach has serious issues since the
> nature of its dependency on other parts of the build process is difficult
> to describe within the semantics of existing build systems. For example,
> Ninja is fundamentally based on "run this command with these files as input
> to get these files as output", and it's not clear how to express "this file
> is updated by running these commands, with such and such
> invalidation/staleness semantics etc." within that semantic model.
> Embedding the compilation database information in build products integrates
> well with build systems since the compilation database information is
> simply another aspect of the compiler's output and follows the same
> dependency rules.
>
>
> Even if you embedded the database in the objects, the final step ("feed
> the output files to the extra tool") would still be partially and subtly
> broken on incremental builds. Suppose I delete an output executable (or,
> say, rename it) from the build commands. In most build systems I've worked
> with, this won't delete the stale executable on the disk. How would you
> propose to have your tool distinguish between the stale unused executable
> and one that wasn't touched because it wasn't updated?
>

That paragraph was specifically discussing the case where the usage of this
feature is being integrated into an automated build process (for
clarification, a scenario where modifying the build program is off-limits,
but adding new build rules is a possibility), rather than the
"uncooperative build system" case; hence the build itself ensures that the
right files are fed to any tool. The issue of staleness in the
"uncooperative" case is, I think, adequately handled by the following points:

1) I believe that this is a well-defined error case that we can accurately
detect and for which we can provide an actionable diagnostic.
2) We could add an option to embed a unique ID with the compilation
database entry, marking a particular build if this is a serious concern.
3) We could document this particular caveat.
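As a sketch of point 2, the collection tool could merge the per-object
entries and filter on an embedded build ID. Everything here is hypothetical
(the `build_id` key, the function name); it only illustrates the idea:

```python
def merge_entries(fragments, current_build_id):
    """Merge compilation-db fragments extracted from object files.
    Entries tagged with a build ID other than current_build_id must come
    from stale objects left over by an earlier build, so drop them.
    Also deduplicate on (directory, file)."""
    merged, seen = [], set()
    for fragment in fragments:
        for entry in fragment:
            if entry.get("build_id") != current_build_id:
                continue  # stale object from a previous build
            key = (entry["directory"], entry["file"])
            if key in seen:
                continue  # same translation unit seen twice
            seen.add(key)
            # Strip the internal tag before emitting the database.
            merged.append({k: v for k, v in entry.items()
                           if k != "build_id"})
    return merged
```

Instead of silently dropping a stale entry, the tool could just as easily
report it, which is the actionable diagnostic point 1 asks for.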

Regardless, this is an issue that users would only run into after they have
become "active users" of the feature, in which case I think they would be
willing to make the small concessions above to get the tool working
(transitioning out of the "uncooperative" case is not necessarily going to
be feasible).

-- Sean Silva