<div dir="ltr">On 15 November 2013 17:38, Robinson, Paul <span dir="ltr"><<a href="mailto:Paul_Robinson@playstation.sony.com" target="_blank">Paul_Robinson@playstation.sony.com</a>></span> wrote:<br><div class="gmail_extra">


<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>> -----Original Message-----<br>

> From: <a href="mailto:llvm-commits-bounces@cs.uiuc.edu" target="_blank">llvm-commits-bounces@cs.uiuc.edu</a> [mailto:<a href="mailto:llvm-commits-" target="_blank">llvm-commits-</a><br>

> <a href="mailto:bounces@cs.uiuc.edu" target="_blank">bounces@cs.uiuc.edu</a>] On Behalf Of Yuchen Wu<br>

> Sent: Friday, November 15, 2013 4:59 PM<br>

> To: Nick Lewycky; Bob Wilson<br>

> Cc: <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

> Subject: RE: [PATCH] llvm-cov: Updated file checksum to be timestamp.<br>

><br>

> >> 2. Using the output file itself to seed hash function, which makes<br>

> it<br>

> >> deterministic. I've tried implementing this using the size of the<br>

> >> output buffer and it was pretty simple. The problem with it, however,<br>

> >> is that there's a lot more chance for a change to the GCNO file to go<br>

> >> unnoticed. I also think that even if the source hadn't changed<br>

> between<br>

> >> compiles, the new binary files shouldn't be compatible with the old.<br>

> ><br>

> > This is obviously the correct approach. In general, it's important to<br>

> > be able to have reproducible builds so that we can reproduce the same<br>

> > binaries from source, builds where outputs can be cached (for instance<br>

> > by modern non-make build systems that use the md5 of the output<br>

> files),<br>

> > etc. GCC's behaviour is silly and there's no need to replicate it.<br>

> ><br>

> >> "The problem with it, however, is that there's a lot more chance for<br>

> a<br>

> >> change to the GCNO file to go unnoticed."<br>

> ><br>

> > What do you mean by this? Are you worried that things could go into<br>

> the<br>

> > GCNO file without being an input to the hash function? The checksum is<br>

> > a safety measure to help people avoid accidentally putting mismatching<br>

> > GCNO and GCDA files together. Not having something be input to the<br>

> hash<br>

> > is the safe failure. We don't want the checksum to change if other<br>

> > parts of the GCNO file weren't modified.<br>

><br>

> What I meant by the last statement was that if you are doing something<br>

> like hashing the size of the file to compute a checksum, there is a much<br>

> higher chance that you may be using a GCNO file generated from a<br>

> different source that just happens to be the same size. Obviously that<br>

> was just an example, so if you guys came across a better way to seed the<br>

> hash for Google's gcc checksum, I'd be happy to hear it :)<br>

<br>

</div></div>Can we use an MD5 of the source file here? (Not having looked at the<br>

patch, sorry...) The only reason I ask is that there's a DWARF 5 feature<br>

to use MD5 instead of timestamps in the debug-line info, so computing an<br>

MD5 of the source files is something we'll want to do anyway, eventually.<br></blockquote><div><br></div><div>That's reasonable, but I'd prefer to have the .gcno's checksum not depend on things which aren't in the .gcno. Fixing a typo in a comment for instance produces the same .o file and I'd like it to produce the same .gcno file.</div>


<div><br></div><div>That applies to DWARF too. I hope we haven't standardized something that requires us to emit a different .o file just because of a typo fix in a comment.</div><div><br></div><div>Nick</div><div><br>


</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>><br>

> Anyway, I've taken your arguments to heart and here is a new patch that<br>

> uses a deterministic approach. Feedback is greatly welcome.<br>

><br>

<br>

</div></div></blockquote></div><br></div></div>