[cfe-dev] [RFC] Embedding compilation database info in object files.

Mon Jul 22 14:54:41 PDT 2013

On 7/22/2013 4:30 PM, Sean Silva wrote:
> On Fri, Jul 19, 2013 at 10:30 PM, Joshua Cranmer <pidgeot18 at gmail.com>
> wrote:
>
>     On 7/19/2013 11:44 PM, Sean Silva wrote:
>>     On Fri, Jul 19, 2013 at 6:36 PM, Joshua Cranmer 🐧
>>     <Pidgeot18 at gmail.com> wrote:
>>
>>         On 7/17/2013 11:36 PM, Sean Silva wrote:
>>
>>             tl;dr: compiler embeds compilation db info in object
>>             files; you can then collect it afterwards with simple
>>             tools. Good idea?
>>
>>             It seems like for a while now, we have been looking for a
>>             way that clang can assist users in creating JSON
>>             compilation databases, but solutions seem limited to
>>             specific build systems or platforms. I came up with a
>>             neat little hack that may be a viable way for clang to
>>             help create compilation databases "everywhere clang
>>             runs", with fairly good user experience.
>>
>>             I believe the following user experience is achievable
>>             "everywhere clang runs":
>>             1. Add some option to the compiler command line.
>>             2. Rebuild.
>>             3. Feed all of your built object files/executables to a
>>             small tool we ship and out comes a compilation database.
>>
>>
>>         Quite frankly, I don't see this as being a superior
>>         user-experience to the following scenario:
>>         1. Add an option to the compiler command line to dump the
>>         database to a file.
>>         2. Rebuild.
>>         3. You have your compilation database automatically!
>>
>>         The primary difficulty of this approach from an
>>         implementation perspective is locking the database for
>>         writing when compiling with make -jN.
>>
>>
>>     That approach may simply not be achievable in the case where the
>>     compilations are happening on physically distinct machines that
>>     do not share an underlying filesystem. I consider 'achievable
>>     "everywhere clang runs"' to be an important property; it's easy
>>     to come up with many far-superior user experiences by restricting
>>     consideration to particular build environments.
>
>     Quite frankly, there is an inherent contradiction in your use case
>     there. If physically distinct machines are not sharing an
>     underlying filesystem, then the paths listed in the compilation
>     database are going to be incorrect in the first place, and the
>     file requires postprocessing anyway.
>
>
> I agree that this could be an issue, and never said that it wouldn't 
> need postprocessing (this whole proposal presupposes some sort of 
> (simple, I claim) postprocessing). However, this particular issue can 
> be addressed with scripting (going through a JSON file and rebasing 
> paths is a 50-line Python script). I actually did this sort of path 
> rebasing with a single sed command in the build I mentioned in the OP 
> (although I was rebasing paths for a different reason). Certainly this 
> is *much* easier than modifying a distributed build system (let alone 
> the case where the distributed build system is proprietary and simply 
> not changeable) to emit the paths (you would have to at least write 
> the same rebasing logic, but do so within a possibly very large and 
> complex piece of software that you don't maintain).
>
> Note that the case of an unmodifiable distributed build system on 
> physically distinct machines that do not share an underlying 
> filesystem is a scenario that no proposal I have yet heard can handle 
> *in any way, at all* (and I know of at least one such real-world 
> scenario, in fact). I think it speaks very highly to the robustness 
> and viability of embedding the information in existing build products 
> that rebasing paths is the largest concern in this scenario.
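
The alternative quoted above, having each compiler invocation write its own entry straight into a shared compile_commands.json, comes down to serializing concurrent writers under make -jN. A minimal sketch, assuming a POSIX system where Python's `fcntl.flock` is available; the sidecar `.lock` file and the function name `append_entry` are illustrative assumptions, not part of any clang interface:

```python
import fcntl
import json
import os
import tempfile


def append_entry(db_path: str, entry: dict) -> None:
    """Add one compile command to a JSON compilation database.

    Concurrent callers (e.g. compiler processes spawned by make -jN) are
    serialized with an exclusive flock() on a sidecar lock file, so each
    read-modify-write of the JSON array is atomic with respect to the others.
    """
    lock_path = db_path + ".lock"  # hypothetical sidecar lock file name
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            try:
                with open(db_path) as f:
                    entries = json.load(f)
            except FileNotFoundError:
                entries = []
            # Drop any stale entry for the same (directory, file) pair.
            key = (entry["directory"], entry["file"])
            entries = [e for e in entries if (e["directory"], e["file"]) != key]
            entries.append(entry)
            # Write a temp file in the same directory, then rename it over
            # the database so readers never see a half-written file.
            dir_name = os.path.dirname(os.path.abspath(db_path))
            fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
            with os.fdopen(fd, "w") as tmp:
                json.dump(entries, tmp, indent=2)
            os.replace(tmp_path, db_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because `flock()` only coordinates processes that see the same filesystem (and may be unreliable on some network filesystems), this does nothing for build machines that share no filesystem, which is exactly the limitation raised in the thread.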

Rebasing paths isn't actually the largest concern I have; it's just the most 
obvious one. See my earlier reply for a sample of the others.
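
For reference, the path rebasing Sean describes (a short Python script, or a single sed command) could look roughly like the sketch below. It assumes a plain textual prefix substitution, the equivalent of `sed 's|OLD|NEW|g'`, applied to the standard `directory`, `file`, `command`, `output`, and `arguments` fields; the prefixes and the command-line interface are illustrative assumptions, and naive substitution can over-match if the old prefix appears elsewhere in a command.

```python
import json
import sys


def rebase_database(entries, old_prefix, new_prefix):
    """Return a copy of the entries with old_prefix replaced by new_prefix
    everywhere it appears in the path-bearing fields (like sed s|old|new|g)."""
    rebased = []
    for entry in entries:
        new_entry = dict(entry)  # shallow copy; the input is left untouched
        for key in ("directory", "file", "command", "output"):
            if key in new_entry:
                new_entry[key] = new_entry[key].replace(old_prefix, new_prefix)
        if "arguments" in new_entry:
            new_entry["arguments"] = [
                arg.replace(old_prefix, new_prefix) for arg in new_entry["arguments"]
            ]
        rebased.append(new_entry)
    return rebased


def main(argv):
    if len(argv) != 4:
        sys.exit(f"usage: {argv[0]} compile_commands.json OLD_PREFIX NEW_PREFIX")
    db_path, old_prefix, new_prefix = argv[1:]
    with open(db_path) as f:
        entries = json.load(f)
    json.dump(rebase_database(entries, old_prefix, new_prefix), sys.stdout, indent=2)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main(sys.argv)
```

Note that rebasing alone does nothing about generated files that exist only on the build machines, which is the concern below.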

One other concern that I don't think I explicitly mentioned is the fact 
that build systems can produce generated code (headers being the most 
pernicious case) that would need to be available to be able to 
compile the rest of the code. For example, configure generates a 
config.h that is included by every file. For a build system that only 
allows you access to the final product, there is no way you can make 
that compilation database work.

When you're dealing with distributed build systems that don't share 
filesystems, compilation databases are much more likely to break than 
they are to work without major tuning. By trying to explicitly 
handle those use cases, you are generating distinctly worse UX: things 
which are claimed to work won't, much of the time, without a 
degree of finessing. To me, all of this feels like trying to bend over 
backwards to support a use case which is relatively rare while ignoring 
the fact that it's going to break other use cases, all to produce a UX 
which is distinctly worse for the vast majority of cases.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor
