[PATCH] llvm-cov: Updated file checksum to be timestamp.

Eric Christopher echristo at gmail.com
Fri Nov 15 17:47:22 PST 2013


On Fri, Nov 15, 2013 at 5:38 PM, Robinson, Paul
<Paul_Robinson at playstation.sony.com> wrote:
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu]
>> On Behalf Of Yuchen Wu
>> Sent: Friday, November 15, 2013 4:59 PM
>> To: Nick Lewycky; Bob Wilson
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: RE: [PATCH] llvm-cov: Updated file checksum to be timestamp.
>>
>> >> 2. Using the output file itself to seed hash function, which makes it
>> >> deterministic. I've tried implementing this using the size of the
>> >> output buffer and it was pretty simple. The problem with it, however,
>> >> is that there's a lot more chance for a change to the GCNO file to go
>> >> unnoticed. I also think that even if the source hadn't changed between
>> >> compiles, the new binary files shouldn't be compatible with the old.
>> >
>> > This is obviously the correct approach. In general, it's important to
>> > be able to have reproducible builds so that we can reproduce the same
>> > binaries from source, builds where outputs can be cached (for instance
>> > by modern non-make build systems that use the md5 of the output files),
>> > etc. GCC's behaviour is silly and there's no need to replicate it.
>> >
>> >> "The problem with it, however, is that there's a lot more chance for a
>> >> change to the GCNO file to go unnoticed."
>> >
>> > What do you mean by this? Are you worried that things could go into the
>> > GCNO file without being an input to the hash function? The checksum is
>> > a safety measure to help people avoid accidentally putting mismatching
>> > GCNO and GCDA files together. Not having something be input to the hash
>> > is the safe failure. We don't want the checksum to change if other
>> > parts of the GCNO file weren't modified.
>>
>> What I meant by the last statement was that if you are doing something
>> like hashing the size of the file to compute a checksum, there is a much
>> higher chance that you may be using a GCNO file generated from a
>> different source that just happens to be the same size. Obviously that
>> was just an example, so if you guys came across a better way to seed the
>> hash for Google's gcc checksum, I'd be happy to hear it :)
>
> Can we use an MD5 of the source file here? (Not having looked at the
> patch, sorry...) The only reason I ask is that there's a DWARF 5 feature
> to use MD5 instead of timestamps in the debug-line info, so computing an
> MD5 of the source files is something we'll want to do anyway, eventually.
>

Slight editing here:

"... there's a DWARF 5 feature to use MD5 in the debug line info
instead of the timestamp of the file..."

That said, I'm assuming you want the MD5 of the preprocessed file?
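
For what it's worth, here is a rough sketch (Python, not taken from the
patch) of why the file contents make a safer seed than the file size: two
different sources of the same length collide on a size-based checksum,
while an MD5 of the bytes is deterministic and still tells them apart. The
function names and the folding of the digest into 32 bits are assumptions
made purely for illustration.

```python
import hashlib


def size_checksum(data: bytes) -> int:
    """Checksum derived only from the buffer size (illustrative name)."""
    return len(data) & 0xFFFFFFFF


def md5_checksum(data: bytes) -> int:
    """Fold an MD5 of the contents into 32 bits (assumed width, illustrative)."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:4], "little")


if __name__ == "__main__":
    old_src = b"int f() { return 1; }\n"
    new_src = b"int f() { return 2; }\n"
    # Same length, so a size-based checksum cannot tell them apart.
    print(size_checksum(old_src) == size_checksum(new_src))
    # Same bytes always give the same value, unlike a timestamp.
    print(md5_checksum(old_src) == md5_checksum(old_src))
    # Different bytes give a different value (barring a 32-bit collision).
    print(md5_checksum(old_src) != md5_checksum(new_src))
```

Whether the bytes fed in are the raw source or the preprocessed output is
a separate choice; the sketch works the same way for either.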

-eric

>>
>> Anyway, I've taken your arguments to heart and here is a new patch that
>> uses a deterministic approach. Feedback is greatly welcome.
>>


