<div dir="ltr">On Fri, Jul 19, 2013 at 10:30 PM, Joshua Cranmer <span dir="ltr"><<a href="mailto:pidgeot18@gmail.com" target="_blank">pidgeot18@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 7/19/2013 11:44 PM, Sean Silva
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧
<span dir="ltr"><<a href="mailto:Pidgeot18@gmail.com" target="_blank">Pidgeot18@gmail.com</a>></span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>On 7/17/2013 11:36 PM, Sean Silva wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">tl;dr:
compiler embeds compilation db info in object files;
you can then collect it afterwards with simple tools.
Good idea?<br>
<br>
It seems like for a while now, we have been looking
for a way that clang can assist users in creating JSON
compilation databases, but solutions seem limited to
specific build systems or platforms. I came up with a
neat little hack that may be a viable way for clang to
help create compilation databases "everywhere clang
runs", with fairly good user experience.<br>
<br>
I believe the following user experience is achievable
"everywhere clang runs":<br>
1. Add some option to the compiler command line.<br>
2. Rebuild.<br>
3. Feed all of your built object files/executables to
a small tool we ship and out comes a compilation
database.<br>
</blockquote>
<br>
</div>
>>> Quite frankly, I don't see this as being a superior user experience to
>>> the following scenario:
>>> 1. Add an option to the compiler command line to dump the database to
>>> a file.
>>> 2. Rebuild.
>>> 3. You have your compilation database automatically!
>>>
>>> The primary difficulty of this approach from an implementation
>>> perspective is locking the database for writing when compiling with
>>> make -jN.
>>
>> That approach may simply not be achievable in the case where the
>> compilations are happening on physically distinct machines that do not
>> share an underlying filesystem. I consider 'achievable "everywhere
>> clang runs"' to be an important property; it's easy to come up with
>> many far-superior user experiences by restricting consideration to
>> particular build environments.
>
> Quite frankly, there is an inherent contradiction in your use case
> there. If physically distinct machines are not sharing an underlying
> filesystem, then the paths listed in the compilation database are going
> to be incorrect in the first place, and the file requires
> postprocessing anyway.

I agree that this could be an issue, and I never said that it wouldn't
need postprocessing (this whole proposal presupposes some sort of
postprocessing, which I claim is simple). However, this particular issue
can be addressed with scripting: going through a JSON file and rebasing
paths is a 50-line Python script. I actually did this sort of path
rebasing with a single sed command in the build I mentioned in the OP
(although I was rebasing paths for a different reason). Certainly this
is *much* easier than modifying a distributed build system (let alone
the case where the distributed build system is proprietary and simply
not changeable) to emit the paths: you would have to write at least the
same rebasing logic, but do so inside a possibly very large and complex
piece of software that you don't maintain.
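To make the "simple postprocessing" claim a bit more concrete, here is
roughly the kind of rebasing script I have in mind; the prefix arguments
and the choice of which fields to rewrite are only illustrative, not a
finished tool:

  #!/usr/bin/env python
  # Sketch: rewrite paths in a compile_commands.json so that paths recorded
  # on the build machines are valid on the machine where the tools run.
  import json
  import sys

  def rebase(path, old_prefix, new_prefix):
      if path.startswith(old_prefix):
          return new_prefix + path[len(old_prefix):]
      return path

  def main():
      old_prefix, new_prefix, db_path = sys.argv[1:4]
      with open(db_path) as f:
          entries = json.load(f)
      for entry in entries:
          entry["directory"] = rebase(entry["directory"], old_prefix, new_prefix)
          entry["file"] = rebase(entry["file"], old_prefix, new_prefix)
          if "command" in entry:
              # Paths can also appear inside the command line itself (-I etc.).
              entry["command"] = entry["command"].replace(old_prefix, new_prefix)
      json.dump(entries, sys.stdout, indent=2)

  if __name__ == "__main__":
      main()

The single-sed-command version is essentially just
sed 's|/old/prefix|/new/prefix|g' compile_commands.json, which is what I
did in the build mentioned above.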
Note that the case of an unmodifiable distributed build system on
physically distinct machines that do not share an underlying filesystem
is a scenario that no proposal I have yet heard can handle *in any way,
at all* (and I know of at least one such real-world scenario, in fact).
I think it speaks very highly to the robustness and viability of
embedding the information in existing build products that rebasing
paths is the largest concern in this scenario.
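For concreteness, the "small tool" in step 3 could be something as
simple as the sketch below. The section name ".clang.compdb", the
one-JSON-entry-per-TU layout, and the use of pyelftools are purely
illustrative assumptions about how the embedding might work, not a
proposal for the actual format:

  #!/usr/bin/env python
  # Sketch: collect compilation database entries that the compiler
  # hypothetically embedded in a ".clang.compdb" section of each object
  # file/executable, and emit a merged compile_commands.json.
  import json
  import sys

  from elftools.elf.elffile import ELFFile  # pip install pyelftools

  def entries_from_object(path):
      with open(path, "rb") as f:
          section = ELFFile(f).get_section_by_name(".clang.compdb")
          if section is None:
              return []  # Not built with the (hypothetical) embedding option.
          # Assume newline-separated JSON entries, one per translation unit
          # (an executable would accumulate one entry per linked object).
          data = section.data().decode("utf-8")
          return [json.loads(line) for line in data.splitlines() if line.strip()]

  def main():
      database = []
      for obj in sys.argv[1:]:
          database.extend(entries_from_object(obj))
      json.dump(database, sys.stdout, indent=2)

  if __name__ == "__main__":
      main()

Usage would then be something like
find . -name '*.o' | xargs collect_compdb.py > compile_commands.json.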
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<br>
I'm looking at usability from the standpoint of typical open-source
projects (such as what you might find in your local Linux distro's
package list) using off-the-shelf build systems (which is, for the
most part, make). The reason that this feature request crops up from
time to time is the difficulty of massaging non-declarative
buildsystems like this into producing the compilation database;</div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
projects that have highly-specialized build systems probably have a
lower barrier to getting the build system itself to spit out the
compilation database, so they do not need this flag as much.<div class="im"><br></div></div></blockquote><div><br></div><div>On the contrary, in my experience "specialized" means "complex and difficult (maybe impossible) to change". For example consider one that uses make as a top-level build system that then calls into multiple specialized (possibly proprietary/out of your control) build systems for different parts of the project.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Also, I believe that should someone
want to integrate the compilation database into an automated
build process, the "update a random database file specified
on command line" approach has serious issues since the
nature of its dependency on other parts of the build process
is difficult to describe within the semantics of existing
build systems. For example, Ninja is fundamentally based on
"run this command with these files as input to get these
files as output", and it's not clear how to express "this
file is updated by running these commands, with such and
such invalidation/staleness semantics etc." within that
semantic model. Embedding the compilation database
information in build products integrates well with build
systems since the compilation database information is simply
another aspect of the compiler's output and follows the same
dependency rules.</div>
</div>
</div>
</blockquote>
> Even if you embedded the database in the objects, the final step ("feed
> the output files to the extra tool") would still be partially and
> subtly broken on incremental builds. Suppose I delete an output
> executable (or, say, rename it) from the build commands. In most build
> systems I've worked with, this won't delete the stale executable on
> disk. How would you propose to have your tool distinguish between the
> stale unused executable and one that wasn't touched because it wasn't
> updated?

That paragraph was specifically discussing the case where the use of
this feature is integrated into an automated build process (for
clarification, this is a scenario where modifying the build program is
off-limits, but adding new build rules is a possibility), rather than
the "uncooperative build system" case; hence the build itself ensures
that the right files are fed to any tool. The issue of staleness in the
"uncooperative" case is, I think, adequately handled by the following
points:

1) I believe that this is a well-defined error case that we can
accurately detect and for which we can provide an actionable diagnostic.
2) We could add an option to embed a unique ID with each compilation
database entry, marking a particular build, if this is a serious concern
(see the sketch below).
3) We could document this particular caveat.

Regardless, this is an issue that users would only run into after they
have become "active users" of the feature, in which case I think they
would be willing to make the small concessions above to have the tool
work (transitioning out of the "uncooperative" case is not necessarily
going to be feasible).
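To illustrate point 2, the collection tool could use such an ID roughly
as follows; the "build_id" field name, the way it reaches the entries,
and the diagnostic wording are all invented for the sake of the example:

  # Sketch of point 2: each embedded entry carries an ID identifying the
  # build that produced it; the collection tool drops entries from other
  # builds and emits an actionable diagnostic (point 1) for each one.
  import json
  import sys

  def filter_stale(entries, current_build_id):
      fresh, stale = [], []
      for entry in entries:
          (fresh if entry.get("build_id") == current_build_id else stale).append(entry)
      return fresh, stale

  def main():
      current_build_id = sys.argv[1]
      entries = json.load(sys.stdin)  # entries collected from build products
      fresh, stale = filter_stale(entries, current_build_id)
      for entry in stale:
          sys.stderr.write("warning: stale entry for %s (from build %s); "
                           "the output it came from may be left over from an "
                           "earlier build\n"
                           % (entry.get("file"), entry.get("build_id")))
      json.dump(fresh, sys.stdout, indent=2)

  if __name__ == "__main__":
      main()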
-- Sean Silva