<div dir="ltr">On Fri, Jul 19, 2013 at 10:30 PM, Joshua Cranmer <span dir="ltr"><<a href="mailto:pidgeot18@gmail.com" target="_blank">pidgeot18@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><div class="im">

    <div>On 7/19/2013 11:44 PM, Sean Silva

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧

        <span dir="ltr"><<a href="mailto:Pidgeot18@gmail.com" target="_blank">Pidgeot18@gmail.com</a>></span>

        wrote:<br>

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div>On 7/17/2013 11:36 PM, Sean Silva wrote:<br>

                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">tl;dr:

                  compiler embeds compilation db info in object files;

                  you can then collect it afterwards with simple tools.

                  Good idea?<br>

                  <br>

                  It seems like for a while now, we have been looking

                  for a way that clang can assist users in creating JSON

                  compilation databases, but solutions seem limited to

                  specific build systems or platforms. I came up with a

                  neat little hack that may be a viable way for clang to

                  help create compilation databases "everywhere clang

                  runs", with fairly good user experience.<br>

                  <br>

                  I believe the following user experience is achievable

                  "everywhere clang runs":<br>

                  1. Add some option to the compiler command line.<br>

                  2. Rebuild.<br>

                  3. Feed all of your built object files/executables to

                  a small tool we ship and out comes a compilation

                  database.<br>

                </blockquote>

                <br>

              </div>

              Quite frankly, I don't see this as being a superior

              user-experience to the following scenario:<br>

              1. Add an option to the compiler command line to dump the

              database to a file.<br>

              2. Rebuild.<br>

              3. You have your compilation database automatically!<br>

              <br>

              The primary difficulty of this approach from an

              implementation perspective is locking the database for

              writing when compiling with make -jN.</blockquote>

            <div><br>

            </div>

            That approach may simply not be achievable in the case where

            the compilations are happening on physically distinct

            machines that do not share an underlying filesystem. I

            consider 'achievable "everywhere clang runs"' to be an

            important property; it's easy to come up with many

            far-superior user experiences by restricting consideration

            to particular build environments.</div>

        </div>

      </div>

    </blockquote>

    <br></div>

    Quite frankly, there is an inherent contradiction in your use-case

    there. If physically distinct machines are not sharing an underlying

    filesystem, then the paths listed in the compilation database are

    going to be incorrect in the first place, and the file requires

    postprocessing anyways.<br></div></blockquote><div><br></div><div>I agree that this could be an issue, and I never said that it wouldn't need postprocessing (this whole proposal presupposes some sort of (simple, I claim) postprocessing). However, this particular issue can be addressed with scripting (going through a JSON file and rebasing paths is a 50-line Python script). I actually did this sort of path rebasing with a single sed command in the build I mentioned in the OP (although I was rebasing paths for a different reason). Certainly this is *much* easier than modifying a distributed build system (let alone the case where the distributed build system is proprietary and simply not changeable) to emit the paths (you would have to at least write the same rebasing logic, but do so within a possibly very large and complex piece of software that you don't maintain).</div>
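<div><br></div><div>To make the "rebasing paths" step concrete, here is a rough sketch of what such a script might look like. It assumes the usual JSON compilation database layout (a list of entries with "directory", "command" and "file" keys); the function names and the example roots are made up for illustration, and the textual substitution in "command" is a simplification that ignores shell quoting.</div>

```python
import json


def rebase_path(path, old_root, new_root):
    """Return path with a leading old_root replaced by new_root."""
    old_root = old_root.rstrip("/")
    new_root = new_root.rstrip("/")
    # Only rebase when old_root is a whole leading path component.
    if path == old_root or path.startswith(old_root + "/"):
        return new_root + path[len(old_root):]
    return path


def rebase_entry(entry, old_root, new_root):
    """Rebase the path-carrying fields of one compilation database entry."""
    rebased = dict(entry)
    for key in ("directory", "file"):
        if key in rebased:
            rebased[key] = rebase_path(rebased[key], old_root, new_root)
    if "command" in rebased:
        # Simplification: plain substring replacement, no shell parsing.
        rebased["command"] = rebased["command"].replace(
            old_root.rstrip("/"), new_root.rstrip("/"))
    return rebased


def rebase_database(text, old_root, new_root):
    """Take the JSON text of a database and return the rebased JSON text."""
    entries = json.loads(text)
    rebased = [rebase_entry(e, old_root, new_root) for e in entries]
    return json.dumps(rebased, indent=2)
```

<div>For instance, rebase_database(text, "/remote/build", "/home/me/src") would turn an entry whose "file" is /remote/build/a.c into one pointing at /home/me/src/a.c, while leaving paths outside /remote/build alone.</div>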

<div><br></div><div>Note that the case of an unmodifiable distributed build system on physically distinct machines that do not share an underlying filesystem is a scenario that no proposal I have yet heard can handle *in any way, at all* (and I know of at least one such real-world scenario, in fact). I think it speaks very highly to the robustness and viability of embedding the information in existing build products that rebasing paths is the largest concern in this scenario.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    I'm looking at usability from the standpoint of typical open-source

    projects (such as what you might find in your local Linux distro's

    package list) using off-the-shelf build systems (which is, for the

    most part, make). The reason that this feature request crops up from

    time to time is the difficulty of massaging non-declarative

    buildsystems like this into producing the compilation database;</div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

projects that have highly-specialized build systems probably have a

    lower barrier to getting the build system itself to spit out the

    compilation database, so they do not need this flag as much.<div class="im"><br></div></div></blockquote><div><br></div><div>On the contrary, in my experience "specialized" means "complex and difficult (maybe impossible) to change". For example, consider a project that uses make as its top-level build system, which then calls into multiple specialized (possibly proprietary/out of your control) build systems for different parts of the project.</div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div class="im">

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote"><br>

          </div>

          <div class="gmail_quote">Also, I believe that should someone

            want to integrate the compilation database into an automated

            build process, the "update a random database file specified

            on command line" approach has serious issues since the

            nature of its dependency on other parts of the build process

            is difficult to describe within the semantics of existing

            build systems. For example, Ninja is fundamentally based on

            "run this command with these files as input to get these

            files as output", and it's not clear how to express "this

            file is updated by running these commands, with such and

            such invalidation/staleness semantics etc." within that

            semantic model. Embedding the compilation database

            information in build products integrates well with build

            systems since the compilation database information is simply

            another aspect of the compiler's output and follows the same

            dependency rules.</div>

        </div>

      </div>

    </blockquote>

    <br></div>

    Even if you embedded the database in the objects, the final step

    ("feed the output files to the extra tool") would still be partially

    and subtly broken on incremental builds. Suppose I delete an output

    executable (or, say, rename it) from the build commands. In most

    build systems I've worked with, this won't delete the stale

    executable on the disk. How would you propose to have your tool

    distinguish between the stale unused executable and one that wasn't

    touched because it wasn't updated?<br></div></blockquote><div><br></div><div>That paragraph was specifically discussing the case where the usage of this feature is being integrated into an automated build process (for clarification, this is a scenario where modifying the build program is off-limits, but adding new build rules is a possibility), rather than the "uncooperative build system" case; hence the build itself ensures that the right files are fed to any tool. The issue of staleness in the "uncooperative" case is, I think, adequately handled by the following points:</div>

<div><br></div><div>1) I believe that this is a well-defined error case that we can accurately detect and for which we can provide an actionable diagnostic.</div><div>2) We could add an option to embed a unique ID with the compilation database entry, marking a particular build if this is a serious concern.</div>

<div>3) We could document this particular caveat.</div><div><br></div><div>Regardless, this is an issue that users would only run into after they have become "active users" of the feature, in which case I think they would be willing to make the above small concessions to get the tool working (transitioning out of the "uncooperative" case is not necessarily going to be feasible).</div>
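<div><br></div><div>As a rough illustration of point 2, the collection tool could separate entries by build ID along these lines. The "build_id" key and the function name are purely hypothetical (nothing like this exists today); the sketch only shows that, once an ID is embedded, filtering out leftovers from an older build is straightforward.</div>

```python
def split_by_build_id(entries, current_build_id):
    """Partition entries into (fresh, stale) lists.

    Each entry is a dict; the 'build_id' key is a hypothetical field that
    the compiler would embed alongside the compilation database entry.
    Entries without the key are treated as stale.
    """
    fresh, stale = [], []
    for entry in entries:
        if entry.get("build_id") == current_build_id:
            fresh.append(entry)
        else:
            stale.append(entry)
    return fresh, stale
```

<div>The stale list is what the tool would report in its diagnostic (point 1), rather than silently dropping it.</div>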

<div><br></div><div>-- Sean Silva</div></div></div></div>