[cfe-dev] [RFC] Embedding compilation database info in object files.

Fri Jul 19 16:05:31 PDT 2013

On Fri, Jul 19, 2013 at 3:58 PM, David Blaikie <dblaikie at gmail.com> wrote:

> On Fri, Jul 19, 2013 at 3:50 PM, Sean Silva <silvas at purdue.edu> wrote:
> >
> >
> >
> > On Fri, Jul 19, 2013 at 12:33 AM, Chandler Carruth <chandlerc at google.com> wrote:
> >>
> >>
> >> On Thu, Jul 18, 2013 at 2:20 PM, Sean Silva <silvas at purdue.edu> wrote:
> >>>>
> >>>> So I'm not completely opposed to the idea. I'd be curious what
> >>>> Chandler thinks, he usually happens to have strong opinions about
> >>>> things like this :)
> >>>
> >>>
> >>> Yeah, I'd love to hear any ideas he has about this.
> >>
> >>
> >> I'm summoned. =D
> >>
> >> So, I'm moderately opposed to the idea. The reason is that we've tried
> >> this (as Manuel mentions) and it creates a really huge new problem:
> >> where do you look for the object file. Worse, the *right* object file.
> >>
> >> The primary benefit of writing out to a single compilation database is
> >> *precisely* that: it is a *single* compilation database. You can place
> >> it in a common, predictable location and have clang-based tools look
> >> there. We had huge, never-ending problems with this in practice. We
> >> would spend more time looking for the .o file than we would running the
> >> clang tool, or we would find the wrong .o file and end up not
> >> reproducing the compile developers actually cared about.
> >
> >
> > Perhaps the commandline flag that enables emitting this information could
> > also accept a unique ID to embed along with the information, which would
> > allow you to easily decide which files correspond to which build. Would
> > that address these issues you cite? If not, could you elaborate?
> >
> >>
> >>
> >> Even if you build aggregate databases as you say for "final build
> >> products", I agree with Manuel: this just moves the problem. Now you
> >> need to know which build product to look into.
> >
> >
> > I think that "Moving the problem" is actually an important part of this
> > proposal. The primary issue in my eyes is getting information from A to B,
> > where
> >
> > A = a place that knows what information should be in the compilation
> > database
> > B = somewhere that a tool can access the compilation database.
> >
> > I can think of two places that qualify for A: 1) The build system and 2)
> > the compiler itself. Changing the build system (either the build program
> > itself, or even the "project files") is a non-starter in my environment.
> > So that leaves us with using A=the compiler itself (I'm open to other
> > values of A, but at the moment these two are the only ones I can think
> > of). Now, the question is how do you transport the information to
> > somewhere the tool can access it.
> >
> > Basically by definition, a build system allows access to build products,
> > and so without any additional assumption, build products are a viable
> > option for transporting the information in all build systems. The
> > constraint of not modifying existing build configurations leads to the
> > conclusion that these build products that we use for transport must be
> > *existing* build products.
> >
> > Simply embedding this information in build products doesn't get us all
> > the way from A to B. However, "Moving the problem" to be "given a set of
> > build products known to contain all the compilation database information
> > for a project, extract a compilation database" is I think a useful
> > simplification of the problem "given an arbitrary C/C++ project, extract
> > a compilation database". Would you agree?
>
> But that's not where you've moved the problem, because you still have
> the problem of "find the set of build products". If a build system
> produces 3 binaries - there's no consistent/trivial/generic way to
> discover those 3 binaries that I know of.
>
>
Typically it is possible to establish some set of files/directories that
are a superset of all the build products. Then the following
"consistent/trivial/generic" algorithm is sufficient:

for all files in the set of files/directories that is a superset of all
build products:
    check if it contains a compilation database entry

This relies on an easily identifiable mark that is present only in the
relevant build products, which I think is feasible to come up with; my
proposed mark
"@ClangCompilationDatabaseEntryMD5JSON<hex md5sum of $JSON>$JSON" fits this
role adequately, I think.
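
To make the scan concrete, here is a rough Python sketch of it. Several things
in it are assumptions for illustration and not part of the proposal: that
$JSON is a single JSON object in the usual compilation database entry shape
(directory/command/file), that the md5sum is 32 lowercase hex digits computed
over the raw JSON bytes, and the helper names make_mark, entries_in_blob and
collect_entries, which are hypothetical.

```python
import hashlib
import json
import os
import re

# Mark layout taken from the proposal: the literal tag, then the hex MD5 of
# the payload, then the JSON payload itself. The 32-lowercase-hex-digit form
# is an assumption.
TAG = b"@ClangCompilationDatabaseEntryMD5JSON"
MARK = re.compile(re.escape(TAG) + rb"([0-9a-f]{32})")


def make_mark(entry):
    """Build the bytes a compiler might embed (hypothetical helper)."""
    payload = json.dumps(entry).encode("utf-8")
    return TAG + hashlib.md5(payload).hexdigest().encode("ascii") + payload


def entries_in_blob(blob):
    """Return every checksum-verified entry found in one file's bytes."""
    # latin-1 maps each byte to exactly one character, so offsets into the
    # decoded text are also valid byte offsets into blob.
    text = blob.decode("latin-1")
    decoder = json.JSONDecoder()
    found = []
    for match in MARK.finditer(blob):
        start = match.end()
        try:
            # Parse one JSON value starting right after the checksum; the
            # returned index tells us where the payload ends.
            _, end = decoder.raw_decode(text, start)
            payload = blob[start:end]
            digest = hashlib.md5(payload).hexdigest().encode("ascii")
            if digest != match.group(1):
                continue  # tag appeared by chance, or the payload is damaged
            found.append(json.loads(payload))
        except ValueError:
            continue  # nothing parseable after the tag
    return found


def collect_entries(root):
    """Scan every file under root (a superset of the build products)."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # Reads each file whole; fine for a sketch, not for huge files.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    entries.extend(entries_in_blob(handle.read()))
            except OSError:
                continue  # unreadable file: skip it
    return entries
```

Calling json.dump(collect_entries("build"), out_file) on the result would then
write something shaped like a compile_commands.json. The checksum is what keeps
a stray occurrence of the tag (for example in a string table) from being
mistaken for a real entry.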

-- Sean Silva