[cfe-dev] [RFC] Embedding compilation database info in object files.

Joshua Cranmer pidgeot18 at gmail.com
Mon Jul 22 16:39:03 PDT 2013


On 7/22/2013 5:12 PM, Sean Silva wrote:
> On Mon, Jul 22, 2013 at 2:27 PM, Joshua Cranmer
> <pidgeot18 at gmail.com> wrote:
>
>     On 7/22/2013 3:26 PM, Sean Silva wrote:
>
>         In dealing with game teams, each one may use a different
>         (possibly custom/private) build system/mashup of build
>         systems, many of which are closed source/proprietary (e.g.
>         MSBuild.exe from Visual Studio). I'm trying to come up with a
>         solution that will work independently of the build system, or
>         at least with as few assumptions as possible (things like
>         "they have access to their final build products, since
>         otherwise how would they run them" and "they can modify the
>         compiler flags"). Like I said in the OP, I was able to rapidly
>         extract a compilation database from a completely unfamiliar
>         (closed-source, proprietary) build system (that I still don't
>         understand!).
>
>
>     The implicit assumptions for your approach amount to the following:
>     1. The user can make their build system use clang.
>     2. The user can make their build system add compiler flags to clang.
>     3. The user can find all of the final build products.
>     4. The build system does not mutilate binaries for the final build
>     products in a way that would render this unnecessary, or if this
>     is false, the build system retains an intermediate copy of the
>     products that has not yet been mutilated, and these intermediate
>     copies can be checked.
>     5. The binary targets are capable of carrying this information,
>     and it can be extracted from them easily.
>     6. Adding this extra information would not cause the build system
>     to fail.
>     7. The user is willing to add all of this extra information to
>     their final build products, or to apply a post-processing step to
>     extract all of this extra information.
>     8. The set of all build steps may be found in the union of all
>     final build products.
>
>     Number 3 can be less trivial than it seems, particularly if you
>     don't think to add de-duplication steps.
>
>
> I think I have already adequately addressed this issue. See my 
> responses to Manuel and David Blaikie.
>
>     Number 4 is definitely not universally true (I've used some build
>     systems which mutilate the final product into a custom binary
>     format)--and may be generally false in the embedded world.
>
>
> I have already presented one scheme where the data is embedded as a 
> single, otherwise-unremarkable string literal. Consider the absurdity 
> of a scenario where the build system can fail to carry a string 
> literal through into the final build product (for all it knows, it is 
> being used as the argument to printf). The only case I can think of 
> that would make this difficult is one where the final build product is 
> being e.g. compressed/encrypted, in which case an 
> uncompressed/unencrypted version is highly likely to be around in a 
> well-defined location (do you know of any real setup where this is not 
> the case?).

It depends on what you mean and what you consider "valid" as a build 
system. In one scenario, what I'd have access to is this: 
<http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-x86/1374500809/>. 
The main binaries are compressed in some format that the file utility 
is not able to identify.
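
To make the extraction side of the embedding scheme concrete, here is a
minimal sketch of scanning a build product for embedded records and
de-duplicating them. The BEGIN/END marker strings and the JSON payload
are assumptions made purely for illustration; nothing in this thread
fixes a record format. The scan only works on bytes that survive
verbatim, so a compressed or otherwise transformed product would have
to be unpacked first.

```python
import json

# Hypothetical framing (an assumption, not a format proposed in the
# thread): each record is BEGIN + JSON payload + END.
BEGIN = b"@@CDB-BEGIN@@"
END = b"@@CDB-END@@"


def make_record(directory, command, source):
    """Build the bytes a compiler might embed for one translation unit."""
    entry = {"directory": directory, "command": command, "file": source}
    return BEGIN + json.dumps(entry, sort_keys=True).encode("utf-8") + END


def extract_records(blob):
    """Return the de-duplicated records found verbatim in a byte string."""
    seen = set()
    records = []
    pos = 0
    while True:
        start = blob.find(BEGIN, pos)
        if start == -1:
            break
        payload_start = start + len(BEGIN)
        end = blob.find(END, payload_start)
        if end == -1:
            break  # truncated record; nothing more to recover
        payload = blob[payload_start:end]
        pos = end + len(END)
        if payload in seen:
            continue  # same object linked into several build products
        seen.add(payload)
        try:
            records.append(json.loads(payload.decode("utf-8")))
        except (UnicodeDecodeError, ValueError):
            pass  # bytes that only look like a record
    return records
```

Running extract_records over the concatenated bytes of every final
build product would give the union of build steps, with duplicates
from shared objects dropped.
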
>
>     Number 5 I think may be iffy, and I can think of situations to
>     make number 6 not true.
>
> Again, I'm not aware of any build system that will not respect a 
> string literal that the compiler embeds as being necessary. The number 
> of scenarios where the scheme will work is therefore a superset of the 
> cases where printf is available, for example.

I think there are tools that strip unused symbols from binaries, and I 
suspect these would be fairly widely used in embedded toolchains. My 
limited experience with such toolchains leads me to conclude that 
mucking with binaries in any fashion is not going to be a reliable 
solution.

>
> I know of at least one real use case where 3 and 4 are not met (I can 
> elaborate somewhat, but it is internal so the description will have to be 
> made in appropriately broad strokes). If you want to deliberately 
> exclude that use case from consideration, please state that 
> explicitly. It may be that these different ideas cover different 
> subsets of the possible build configurations (although I have yet to 
> be presented with a real scenario where embedding the info in build 
> products will simply not work, but the "write to a file on the side" 
> one will).

In my experience, it is relatively easy to get even a hostile build 
system you have little control over to give you the contents of an extra 
file at the end (this has included such steps as "cat a tarball to 
stdout" and using a tool to extract the data from uploaded log files). 
Indeed, if you *can* build locally, then log-to-file *will* work. 
Emitting data into binaries, on the other hand, assumes that build 
systems won't munge binaries, or will at least leave unmunged binaries 
lying around (I have very little faith in build systems), and that this 
data can be reliably extracted from those binaries. All of that is in 
the hope that it will work in the limited set of cases where the build 
system is completely unable to build locally.
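
For comparison, the log-to-file approach can be sketched in a few
lines. This assumes a hypothetical compiler wrapper that calls
log_invocation once per compile. The function names, the
one-JSON-object-per-line log layout, and the suffix-based guess at the
source file are all illustrative assumptions. The output entries use
the directory/arguments/file fields of Clang's JSON compilation
database format.

```python
import json

# Heuristic list of source suffixes; an assumption for this sketch.
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx", ".m", ".mm")


def log_invocation(log_path, directory, argv):
    """Append one compiler invocation to the side file as a JSON line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"directory": directory, "arguments": argv}))
        log.write("\n")


def build_database(log_path):
    """Turn the side file into compilation-database style entries."""
    entries = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Skip argv[0] (the compiler) and look for source files.
            for arg in record["arguments"][1:]:
                if arg.startswith("-") or not arg.endswith(SOURCE_SUFFIXES):
                    continue
                # Key on (directory, file) so a rebuilt file is listed
                # once, with the most recent command line.
                entries[(record["directory"], arg)] = {
                    "directory": record["directory"],
                    "arguments": record["arguments"],
                    "file": arg,
                }
    return list(entries.values())
```

The side file is the only artifact that has to come back from the
build, so getting it out is a matter of retrieving one extra file
rather than locating and parsing every binary.
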


-- 
Joshua Cranmer
News submodule owner
DXR coauthor
