[PATCH] llvm-cov: Updated file checksum to be timestamp.

Nick Lewycky nlewycky at google.com
Fri Nov 15 16:07:15 PST 2013


On 12 November 2013 16:08, Bob Wilson <bob.wilson at apple.com> wrote:

>
> On Nov 12, 2013, at 4:01 PM, Yuchen Wu <yuchenericwu at hotmail.com> wrote:
>
> > Hi all,
> >
> > So the reason I changed the file checksum to be a timestamp was because
> > gcov was doing the same thing.
>
> Actually it is GCC that puts the timestamp into the gcno files (at least
> with the gcc-4.2 that Apple used in the past).
>
> > I understand that makes it non-deterministic but the alternatives aren't
> > really that much better. The way I see it there are three options:
> >
> > 1. Using the timestamp as a seed to a hash function which will determine
> > the checksum. This means that two checksums will never have the same value.
> > This can be viewed as a bad thing since it's non-deterministic, but the
> > advantage being there won't be any collision problems and makes it
> > impossible for the user to use out of sync GCNO and GCDA files. An addendum
> > to this could be that the user could opt to specify a seed instead of the
> > timestamp if the output must be deterministic, which is what gcov does.
>
> I prefer this option.  I've never heard anyone complain about GCC's
> timestamps causing problems here.
>

On the contrary, the gcc google branch has replaced timestamps in gcov
files because of the problems it causes us. However, since our gcc team
didn't bother to push it to mainline gcc I really can't insist we do the
right thing to match gcc.

We run the compiler with --coverage and throw away the .gcno files --
they're in a cache from which they may expire. When viewing coverage data,
we may still have the .gcda data in the cache but not the .gcno files, in
which case we run the compiler to regenerate them. Naturally, if you use
timestamps this will cause a mismatch ... once you're unlucky enough. I
discovered this when I tried using timestamps in this field when adding
LLVM's gcov support in the first place.
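
A minimal sketch of that failure mode, not the real file formats or tools: the function names, the dictionary layout and the clock values below are all hypothetical stand-ins. The only thing it models is that the .gcda inherits whatever stamp the .gcno had when the binary was built.

```python
def compile_with_coverage(source, clock):
    """Hypothetical stand-in for a --coverage compile.

    Returns a fake .gcno record whose stamp is taken from the clock.
    """
    return {"stamp": clock & 0xFFFFFFFF, "notes": "notes for " + source}


def run_instrumented(gcno):
    """Hypothetical stand-in for running the binary: the .gcda copies the stamp."""
    return {"stamp": gcno["stamp"], "counts": [1, 0, 3]}


def stamps_match(gcno, gcda):
    """The consistency check a coverage viewer would apply."""
    return gcno["stamp"] == gcda["stamp"]


source = "int main() { return 0; }"

# First build: the .gcno is written, the program runs and leaves a .gcda.
original_gcno = compile_with_coverage(source, clock=1384560435)
gcda = run_instrumented(original_gcno)

# The .gcno expires from the cache and is regenerated an hour later
# from the same, unchanged source.
regenerated_gcno = compile_with_coverage(source, clock=1384560435 + 3600)

print(stamps_match(original_gcno, gcda))     # True
print(stamps_match(regenerated_gcno, gcda))  # False
```

The notes themselves are identical in both builds; only the stamp differs, which is why the regenerated file gets rejected.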

If you insist on timestamps, please add a flag so I can turn them off.

> > 2. Using the output file itself to seed hash function, which makes it
> > deterministic. I've tried implementing this using the size of the output
> > buffer and it was pretty simple. The problem with it, however, is that
> > there's a lot more chance for a change to the GCNO file to go unnoticed. I
> > also think that even if the source hadn't changed between compiles, the new
> > binary files shouldn't be compatible with the old.
>

This is obviously the correct approach. In general, it's important to have
reproducible builds: builds where we can reproduce the same binaries from
source, builds where outputs can be cached (for instance by modern non-make
build systems that use the md5 of the output files), etc. GCC's behaviour is
silly and there's no need to replicate it.

"The problem with it, however, is that there's a lot more chance for a
change to the GCNO file to go unnoticed."

What do you mean by this? Are you worried that things could go into the
GCNO file without being an input to the hash function? The checksum is a
safety measure to help people avoid accidentally putting mismatching GCNO
and GCDA files together. Not having something be input to the hash is the
safe failure. We don't want the checksum to change if other parts of the
GCNO file weren't modified.
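
To make the point concrete, here is a rough sketch of a content-derived checksum. It is an illustration only: the choice of CRC-32, the record fields and the `producer` field are assumptions made for the example, not what the patch or LLVM actually hashes.

```python
import zlib


def gcno_checksum(functions):
    """32-bit checksum over the fields chosen as hash inputs.

    Which fields are hashed is a hypothetical choice for this sketch;
    the "producer" field is deliberately left out of the hash.
    """
    checksum = 0
    for fn in functions:
        hashed = "{}:{}:{};".format(fn["name"], fn["blocks"], fn["arcs"])
        # Feed the running value back in so the result covers all records.
        checksum = zlib.crc32(hashed.encode("utf-8"), checksum)
    return checksum & 0xFFFFFFFF


first_build = [{"name": "main", "blocks": 3, "arcs": 4, "producer": "build-1"}]
rebuild = [{"name": "main", "blocks": 3, "arcs": 4, "producer": "build-2"}]
edited = [{"name": "main", "blocks": 3, "arcs": 5, "producer": "build-1"}]

# Same hashed inputs, different un-hashed field: the checksum is unchanged,
# so a regenerated GCNO still pairs with the old GCDA.
same_after_rebuild = gcno_checksum(first_build) == gcno_checksum(rebuild)

# A change to a hashed input changes the checksum.
changed_after_edit = gcno_checksum(first_build) != gcno_checksum(edited)

print(same_after_rebuild, changed_after_edit)
```

A field that is left out of the hash can change without invalidating existing GCDA files, which is the safe direction to fail in.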

Nick

> >
> > 3. Keep using "LLVM" as the checksum. The only benefit from this is that
> > it's easy to tell the version was generated from clang.
> >
> > Given these reasons, I think the option 1 is best. What do you think?
> >
> > -Yuchen
> >
> > ----------------------------------------
> >> Date: Mon, 11 Nov 2013 14:48:54 -0600
> >> From: meadori at codesourcery.com
> >> To: silvas at purdue.edu; yuchenericwu at hotmail.com
> >> CC: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [PATCH] llvm-cov: Updated file checksum to be timestamp.
> >>
> >> On 11/11/2013 02:13 PM, Sean Silva wrote:
> >>
> >>> Yes. In general all tool output must be completely deterministic (a
> >>> timestamp is "nondeterministic" in the sense that it depends on the
> >>> (nondeterministic) time that the tool was built).
> >>
> >> There is always our sneaky friends __TIME__, __DATE__, and
> >> __TIMESTAMP__ :-)
> >> Incidentally there is a GCC patch just sent out last week [1] to issue
> >> a warning when using those macros:
> >> http://gcc.gnu.org/ml/gcc/2013-11/msg00066.html
> >>
> >> --
> >> Meador Inge
> >> CodeSourcery / Mentor Embedded