[PATCH] llvm-cov: Updated file checksum to be timestamp.

Nick Lewycky nicholas at mxc.ca
Mon Nov 18 02:18:41 PST 2013


Eric Christopher wrote:
>
> On Nov 16, 2013 8:37 PM, "David Blaikie" <dblaikie at gmail.com
> <mailto:dblaikie at gmail.com>> wrote:
>  >
>  >
>  >
>  >
>  > On Fri, Nov 15, 2013 at 11:03 PM, Eric Christopher
> <echristo at gmail.com <mailto:echristo at gmail.com>> wrote:
>  >>
>  >> On Fri, Nov 15, 2013 at 6:08 PM, Nick Lewycky <nlewycky at google.com
> <mailto:nlewycky at google.com>> wrote:
>  >> > On 15 November 2013 17:38, Robinson, Paul
>  >> > <Paul_Robinson at playstation.sony.com
> <mailto:Paul_Robinson at playstation.sony.com>> wrote:
>  >> >>
>  >> >> > -----Original Message-----
>  >> >> > From: llvm-commits-bounces at cs.uiuc.edu
> <mailto:llvm-commits-bounces at cs.uiuc.edu> [mailto:llvm-commits-
> <mailto:llvm-commits->
>  >> >> > bounces at cs.uiuc.edu <mailto:bounces at cs.uiuc.edu>] On Behalf Of
> Yuchen Wu
>  >> >> > Sent: Friday, November 15, 2013 4:59 PM
>  >> >> > To: Nick Lewycky; Bob Wilson
>  >> >> > Cc: llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>  >> >> > Subject: RE: [PATCH] llvm-cov: Updated file checksum to be
> timestamp.
>  >> >> >
>  >> >> > >> 2. Using the output file itself to seed hash function, which
> makes
>  >> >> > it
>  >> >> > >> deterministic. I've tried implementing this using the size
> of the
>  >> >> > >> output buffer and it was pretty simple. The problem with it,
> however,
>  >> >> > >> is that there's a lot more chance for a change to the GCNO
> file to go
>  >> >> > >> unnoticed. I also think that even if the source hadn't changed
>  >> >> > between
>  >> >> > >> compiles, the new binary files shouldn't be compatible with
> the old.
>  >> >> > >
>  >> >> > > This is obviously the correct approach. In general, it's
> important to
>  >> >> > > be able to have reproducible builds so that we can reproduce
> the same
>  >> >> > > binaries from source, builds where outputs can be cached (for
> instance
>  >> >> > > by modern non-make build systems that use the md5 of the output
>  >> >> > files),
>  >> >> > > etc. GCC's behaviour is silly and there's no need to
> replicate it.
>  >> >> > >
>  >> >> > >> "The problem with it, however, is that there's a lot more
> chance for
>  >> >> > a
>  >> >> > >> change to the GCNO file to go unnoticed."
>  >> >> > >
>  >> >> > > What do you mean by this? Are you worried that things could
> go into
>  >> >> > the
>  >> >> > > GCNO file without being an input to the hash function? The
> checksum is
>  >> >> > > a safety measure to help people avoid accidentally putting
> mismatching
>  >> >> > > GCNO and GCDA files together. Not having something be input
> to the
>  >> >> > hash
>  >> >> > > is the safe failure. We don't want the checksum to change if
> other
>  >> >> > > parts of the GCNO file weren't modified.
>  >> >> >
>  >> >> > What I meant by the last statement was that if you are doing
> something
>  >> >> > like hashing the size of the file to compute a checksum, there
> is a much
>  >> >> > higher chance that you may be using a GCNO file generated from a
>  >> >> > different source that just happens to be the same size.
> Obviously that
>  >> >> > was just an example, so if you guys came across a better way to
> seed the
>  >> >> > hash for Google's gcc checksum, I'd be happy to hear it :)
>  >> >>
>  >> >> Can we use an MD5 of the source file here? (Not having looked at the
>  >> >> patch, sorry...) The only reason I ask is that there's a DWARF 5
> feature
>  >> >> to use MD5 instead of timestamps in the debug-line info, so
> computing an
>  >> >> MD5 of the source files is something we'll want to do anyway,
> eventually.
>  >> >
>  >> >
>  >> > That's reasonable, but I'd prefer to have the .gcno's checksum not
> depend on
>  >> > things which aren't in the .gcno. Fixing a typo in a comment for
> instance
>  >> > produces the same .o file and I'd like it to produce the same
> .gcno file.
>  >> >
>  >> > That applies to DWARF too. I hope we haven't standardized
> something that
>  >> > requires us to emit a different .o file just because of a typo fix
> in a
>  >> > comment.
>  >> >
>  >>
>  >> It's not finalized yet :)
>  >
>  >
>  > But the current spec (defining package hashing in terms of type
> hashing) doesn't include line numbers and a bunch of other stuff in the
> hash - so no, fixing a typo in a comment won't change the package hash.
>  >
>
> Yeah, this is a dwarf 5 proposal from Paul that he was bringing up.
>
> http://www.dwarfstd.org/ShowIssue.php?issue=130701.1
>
> Which looks like it is going to have this issue. Paul?

Thanks for the link Eric. The issue includes a rationale which is 
sufficient to continue discussion. Fundamentally they need to integrate 
with an immovable object that wants to know the MD5 of the source. That 
part's not negotiable.

The reason I want to minimize changes to the .o files is that I'm 
optimizing for build systems which look at the md5's of the .o's before 
performing a link, and skip the link if the inputs haven't changed. Make 
doesn't do this, but cmake, ninja and others do. (Also, the rebuilt 
binary might be 'tblgen' which in turn produces other source files. We 
want to cut this tree if at all possible.)

VS decided to put the md5 of each source file in the debug info, and 
Paul's proposal is to do the same with llvm-produced DWARF. Bad news, 
this would break minimal rebuilding build systems. Good news, fission. 
With fission we don't need to relink the .text objects if only the debug 
objects changed. Is that sufficient?

If not, we may need to include it just to make VS integration folks 
happy, but I will insist that it stay off for my builds.

Nick

>  > (I'm not sure it's right not to include line info, though - that
> seems sort of important to regenerate the debug info)
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list