[cfe-dev] [RFC] Embedding compilation database info in object files.

Thu Jul 18 23:39:04 PDT 2013

On Thu, Jul 18, 2013 at 11:20 PM, Sean Silva <silvas at purdue.edu> wrote:

>
> On Thu, Jul 18, 2013 at 5:08 AM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Thu, Jul 18, 2013 at 12:28 PM, Sean Silva <silvas at purdue.edu> wrote:
>>
>>> On Wed, Jul 17, 2013 at 9:44 PM, Manuel Klimek <klimek at google.com> wrote:
>>>>
>>>> We have done similar things before internally, but considered it more
>>>> to be a hack ;)
>>>>
>>>> I think the direction we want to go in is to have an option in clang
>>>> to append to a compilation database while running. That way no
>>>> post-processing step is required, which would itself have to be wired
>>>> into the build flow. The only part missing is somebody with enough time
>>>> on their hands for whom this is a high enough priority.
>>>>
>>>
>>> Wouldn't this approach (appending to a compilation database) have issues
>>> with filesystem contention and/or write atomicity in multicore/distributed
>>> builds (without involving a "real database" for the database storage)?
>>>
>>
>> On Unix systems we can handle that via file locks. On Windows we'd need a
>> Windows expert :P
>>
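A minimal sketch of what the file-lock approach could look like, assuming the
database is kept as one JSON object per line (an assumption made here so that
appends stay simple; it is not the JSON-array layout of a regular compilation
database). The function name `append_entry` and the entry fields are
hypothetical, and `fcntl.flock` is only available on Unix-like systems:

```python
import fcntl
import json
import os


def append_entry(db_path, entry):
    """Append one compile-command entry to db_path under an exclusive lock."""
    line = (json.dumps(entry) + "\n").encode("utf-8")
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(db_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Block until no other process holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(fd, line)
    finally:
        # Closing the descriptor also releases the flock.
        os.close(fd)
```

Each compiler invocation would call something like
`append_entry("compdb.jsonl", {"directory": "/src", "command": "clang -c a.c", "file": "a.c"})`.
This serializes writers on one machine; it does not address stale entries or
builds distributed across hosts without a shared lock-capable filesystem.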
>>
>>> Also, wouldn't a post-processing step be needed in order to remove
>>> outdated entries appended from a previous incremental build (consider:
>>> `make; <rename some file in the project>; make`)?
>>>
>>
>> Well, we could require a rebuild to update the database (basically rm the
>> compilation database, make clean && rebuild).
>>
>>
>>> The approach I proposed has two extremely desirable properties that I
>>> think would be hard to achieve with an approach that carries the
>>> information in an external "side channel", as in the approach you suggested:
>>> 1. The compilation database info is always up to date as long as the
>>> build products are up to date, since the information follows the "causal
>>> chain" leading to the final programs/libraries.
>>>
>>
>> Wouldn't it have exactly the same "delete" problem? When I rename a .cc
>> file, won't most build systems leave the .o just lying around?
>>
>
> The use case I primarily envision is sourcing the compdb info in the usual
> case from "final" build products, like executables and libraries. In that
> case, the old .o would not be linked into the final build product and hence
> its compilation database info would not be included; there would be issues
> if one of the final build products is renamed though, but I think that is
> relatively rare, and we can document this particular caveat. In other cases
> (even when sourcing .o's), I think a useful, actionable diagnostic can be
> emitted ("compilation database entry found in file foo.o doesn't seem to
> correspond to any source file; skip it? delete it?").
>

Normally a project has multiple "final build products". The reason we have
the compilation database is that given a source file, you want to be able
to parse it. If I give you a source file, how do you know which of the
final build products to look into for that information? All of them?
Or do you keep yet another database?

>
>>> 2. It works "everywhere clang does" since it makes no assumptions about
>>> build systems, filesystems, or anything else; the data is carried along a
>>> datapath that already works (namely, that information emitted by the
>>> compiler will end up in build products).
>>>
>>
>> Putting it into a special section in the object file is definitely better
>> than what we did (just appending it to the object file, as no tool we know
>> fails with trailing bytes on a .o file).
>>
>> So I'm not completely opposed to the idea. I'd be curious what Chandler
>> thinks, he usually happens to have strong opinions about things like this :)
>>
>
> Yeah, I'd love to hear any ideas he has about this.
>
>
>>
>> Cheers,
>> /Manuel
>>
>>
>>> Also, the format of the embedded entry could be streamlined to make it
>>> utterly trivial to extract, e.g. a simple string
>>> "@ClangCompilationDatabaseEntryMD5JSON<hex md5sum of $JSON>$JSON", and then
>>> you could reliably extract the compdb entries with a single linear scan of
>>> arbitrary binary files; with that it seems like it would be feasible for
>>> most use cases (possibly adding an optional caching step) to have clang
>>> tools directly accept binaries containing such data as the compilation
>>> database itself!
>>>
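To make the proposed format concrete, here is a rough sketch of the
"single linear scan" extraction. It assumes the marker string is followed
directly by 32 lowercase hex digits (the MD5 of the JSON bytes) and then the
JSON text itself, with no literal angle brackets; the proposal above does not
pin those details down. The helper names `make_entry` and `extract_entries`
are hypothetical:

```python
import hashlib
import json

MARKER = b"@ClangCompilationDatabaseEntryMD5JSON"


def make_entry(entry):
    """Serialize one entry as MARKER + hex md5 of the JSON + the JSON bytes."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(payload).hexdigest().encode("ascii")
    return MARKER + digest + payload


def extract_entries(blob):
    """Scan arbitrary bytes and return every entry whose checksum matches."""
    # latin-1 maps each byte to one character, so string and byte offsets agree.
    text = blob.decode("latin-1")
    decoder = json.JSONDecoder()
    entries = []
    pos = blob.find(MARKER)
    while pos != -1:
        digest_start = pos + len(MARKER)
        json_start = digest_start + 32  # length of a hex MD5 digest
        digest = blob[digest_start:json_start]
        try:
            # raw_decode parses one JSON value and reports where it ends.
            _, json_end = decoder.raw_decode(text, json_start)
            payload = blob[json_start:json_end]
            if hashlib.md5(payload).hexdigest().encode("ascii") == digest:
                entries.append(json.loads(payload.decode("utf-8")))
        except ValueError:
            pass  # not valid JSON or not valid UTF-8: treat as a false match
        pos = blob.find(MARKER, pos + 1)
    return entries
```

The checksum is what lets the scanner reject accidental occurrences of the
marker in unrelated data, and it also tells the reader where trustworthy
entries are without knowing anything about the container format.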
>>> -- Sean Silva
>>>
>>
>>
> -- Sean Silva
>