[PATCH] llvm-cov: Updated file checksum to be timestamp.

Fri Nov 15 23:03:39 PST 2013

On Fri, Nov 15, 2013 at 6:08 PM, Nick Lewycky <nlewycky at google.com> wrote:
> On 15 November 2013 17:38, Robinson, Paul
> <Paul_Robinson at playstation.sony.com> wrote:
>>
>> > -----Original Message-----
>> > From: llvm-commits-bounces at cs.uiuc.edu
>> > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Yuchen Wu
>> > Sent: Friday, November 15, 2013 4:59 PM
>> > To: Nick Lewycky; Bob Wilson
>> > Cc: llvm-commits at cs.uiuc.edu
>> > Subject: RE: [PATCH] llvm-cov: Updated file checksum to be timestamp.
>> >
>> > >> 2. Using the output file itself to seed hash function, which makes it
>> > >> deterministic. I've tried implementing this using the size of the
>> > >> output buffer and it was pretty simple. The problem with it, however,
>> > >> is that there's a lot more chance for a change to the GCNO file to go
>> > >> unnoticed. I also think that even if the source hadn't changed between
>> > >> compiles, the new binary files shouldn't be compatible with the old.
>> > >
>> > > This is obviously the correct approach. In general, it's important to
>> > > be able to have reproducible builds so that we can reproduce the same
>> > > binaries from source, builds where outputs can be cached (for instance
>> > > by modern non-make build systems that use the md5 of the output files),
>> > > etc. GCC's behaviour is silly and there's no need to replicate it.
>> > >
>> > >> "The problem with it, however, is that there's a lot more chance for a
>> > >> change to the GCNO file to go unnoticed."
>> > >
>> > > What do you mean by this? Are you worried that things could go into the
>> > > GCNO file without being an input to the hash function? The checksum is
>> > > a safety measure to help people avoid accidentally putting mismatching
>> > > GCNO and GCDA files together. Not having something be input to the hash
>> > > is the safe failure. We don't want the checksum to change if other
>> > > parts of the GCNO file weren't modified.
>> >
>> > What I meant by the last statement was that if you are doing something
>> > like hashing the size of the file to compute a checksum, there is a much
>> > higher chance that you may be using a GCNO file generated from a
>> > different source that just happens to be the same size. Obviously that
>> > was just an example, so if you guys came across a better way to seed the
>> > hash for Google's gcc checksum, I'd be happy to hear it :)
>>
>> Can we use an MD5 of the source file here? (Not having looked at the
>> patch, sorry...) The only reason I ask is that there's a DWARF 5 feature
>> to use MD5 instead of timestamps in the debug-line info, so computing an
>> MD5 of the source files is something we'll want to do anyway, eventually.
>
>
> That's reasonable, but I'd prefer to have the .gcno's checksum not depend on
> things which aren't in the .gcno. Fixing a typo in a comment for instance
> produces the same .o file and I'd like it to produce the same .gcno file.
>
> That applies to DWARF too. I hope we haven't standardized something that
> requires us to emit a different .o file just because of a typo fix in a
> comment.
>

It's not finalized yet :)

-eric
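
For readers of the archive, the trade-off argued over in the quoted thread can be sketched roughly as follows. This is only an illustration in Python, not code from the patch: `size_checksum` and `content_checksum` are hypothetical names, CRC32 is a stand-in for whatever hash the patch or GCC actually uses, and the two payloads are made-up bytes rather than real GCNO data. It shows that a checksum seeded only from the buffer size is deterministic but cannot tell apart two different buffers of equal length, while one computed over the emitted bytes is deterministic and changes when the contents change.

```python
import zlib


def size_checksum(gcno_body: bytes) -> int:
    """Hash only the buffer length (hypothetical helper).

    Deterministic, but any two buffers of equal size collide.
    """
    return zlib.crc32(len(gcno_body).to_bytes(8, "little"))


def content_checksum(gcno_body: bytes) -> int:
    """Hash the emitted bytes themselves (hypothetical helper).

    Deterministic, and it depends only on what ends up in the file.
    """
    return zlib.crc32(gcno_body)


# Made-up payloads: same length, one differing byte.
old_body = b"func=foo;blocks=3;arcs=4"
new_body = b"func=foo;blocks=5;arcs=4"

# The size-based seed misses the change.
same_size_hash = size_checksum(old_body) == size_checksum(new_body)

# The content-based checksum notices it.
content_differs = content_checksum(old_body) != content_checksum(new_body)

# Rebuilding identical output yields an identical checksum, so the
# build stays reproducible (unlike a timestamp-based value).
reproducible = content_checksum(old_body) == content_checksum(bytes(old_body))

print(same_size_hash, content_differs, reproducible)
```

Under this sketch, a source edit that does not alter the emitted bytes (a comment typo fix, say) leaves the checksum unchanged, which is the property Nick asks for above.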