<div dir="ltr">On 15 November 2013 16:58, Yuchen Wu <span dir="ltr"><<a href="mailto:yuchenericwu@hotmail.com" target="_blank">yuchenericwu@hotmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>>> 2. Using the output file itself to seed hash function, which makes it <br>
>> deterministic. I've tried implementing this using the size of the<br>
>> output buffer and it was pretty simple. The problem with it, however,<br>
>> is that there's a lot more chance for a change to the GCNO file to go<br>
>> unnoticed. I also think that even if the source hadn't changed between<br>
>> compiles, the new binary files shouldn't be compatible with the old.<br>
><br>
> This is obviously the correct approach. In general, it's important to<br>
> be able to have reproducible builds so that we can reproduce the same<br>
> binaries from source, builds where outputs can be cached (for instance<br>
> by modern non-make build systems that use the md5 of the output files),<br>
> etc. GCC's behaviour is silly and there's no need to replicate it.<br>
><br>
>> "The problem with it, however, is that there's a lot more chance for a<br>
>> change to the GCNO file to go unnoticed."<br>
><br>
> What do you mean by this? Are you worried that things could go into the<br>
> GCNO file without being an input to the hash function? The checksum is<br>
> a safety measure to help people avoid accidentally putting mismatching<br>
> GCNO and GCDA files together. Not having something be input to the hash<br>
> is the safe failure. We don't want the checksum to change if other<br>
> parts of the GCNO file weren't modified.<br>
<br>
</div>What I meant by the last statement was that if you are doing something like hashing the size of the file to compute a checksum, there is a much higher chance that you may be using a GCNO file generated from a different source that just happens to be the same size. Obviously that was just an example, so if you guys came across a better way to seed the hash for Google's gcc checksum, I'd be happy to hear it :)<br>
</blockquote><div><br></div><div>Heh. I was just looking up the answer your question, and found this comment:</div><div><br></div><div><div> "The checksum is calculated carefully so that</div><div> source code changes that doesn't affect the control flow graph</div>
<div> won't change the checksum.</div><div> This is to make the profile data useable across source code change."</div><div><br></div></div><div>The CFG checksum appears to be a crc32 over the index numbers of the destination blocks for each basic block edge. I guess the blocks are numbered consistently run to run.</div>
<div><br></div><div>The line number checksum is a checksum over the *assembler* name of the function (C++ mangled or including __asm__ annotations), and the first line number of the function.</div><div><br></div><div>The function checksum appears to be a counter.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Anyway, I've taken your arguments to heart and here is a new patch that uses a deterministic approach. Feedback is greatly welcome. </blockquote>
</div><br></div><div class="gmail_extra">Thanks!</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">@@ -45,7 +45,9 @@ bool GCOVFile::read(GCOVBuffer &Buffer) {</div><div class="gmail_extra">
if (Format == GCOV::InvalidGCOV)</div><div class="gmail_extra"> return false;</div><div class="gmail_extra"> </div><div class="gmail_extra">+ static uint32_t Checksum;</div><div class="gmail_extra"><br></div><div class="gmail_extra">
This should be a member on GCOVFile instead? Imagine if we read a .gcda file first, or if we create ten GCOVFiles for ten .gcno files?</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">
-; RUN: head -c12 %T/version.gcno | grep '^oncg\*204MVLL$'</div><div class="gmail_extra">+; RUN: head -c12 %T/version.gcno | grep '^oncg\*204....$'</div><div class="gmail_extra"><br></div><div class="gmail_extra">
Use head -c8 and removing the trailing "...."?</div><div class="gmail_extra"><br></div><div class="gmail_extra">Nick</div><div class="gmail_extra"><br></div></div></div></div>