[llvm-bugs] [Bug 35914] New: lld needs to set the link timeStamp on Windows builds, probably to a hash of the binary

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Jan 11 11:36:56 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=35914

            Bug ID: 35914
           Summary: lld needs to set the link timeStamp on Windows builds,
                    probably to a hash of the binary
           Product: lld
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: release blocker
          Priority: P
         Component: COFF
          Assignee: unassignedbugs at nondot.org
          Reporter: brucedawson at chromium.org
                CC: llvm-bugs at lists.llvm.org

lld currently sets the link-time-stamp to zero, rather than to the link time,
in order to support reproducible builds (builds where the results depend only
on the inputs, not on the time or machine used to do the build).

This is a worthy goal, but this solution to the reproducible build problem is
*not* practical. It will completely break symbol servers.

Symbol servers are a vital tool for Windows developers and they are used not
just to archive PDB files but also to archive PE files (.exe and .dll). The
format of the paths used is documented here:

https://randomascii.wordpress.com/2013/03/09/symbols-the-microsoft-way/

In particular, note that the path for a PE file on a symbol server is generated
like this:

    "%s\%s\%s%s\%s" % (serverName, peName, timeStamp, imageSize, peName)

The peName and serverName can be considered unchanging, which leaves the
timeStamp and imageSize to identify a particular binary. The imageSize is often
rounded up to a multiple of the page size, so there are likely to be many
similar builds that have exactly the same image size.

So that leaves timeStamp as often the *only* differentiator between builds.
Therefore, setting the timeStamp field to a constant (zero or anything else) is
simply not tenable.
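
To make the collision concrete, here is a small Python sketch of the PE path
computation. It assumes the common symstore convention of printing timeStamp as
eight uppercase hex digits and imageSize as lowercase hex (the format string
above just says %s), a 4 KiB page size, and a hypothetical foo.dll on a
hypothetical \\server\symbols share. None of these names come from lld.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB rounding granularity


def round_up_to_page(size: int, page: int = PAGE_SIZE) -> int:
    """Round a raw size up to the next multiple of the page size."""
    return (size + page - 1) // page * page


def pe_symbol_path(server_name: str, pe_name: str,
                   time_stamp: int, image_size: int) -> str:
    """Build serverName\\peName\\<timeStamp><imageSize>\\peName.

    Assumption: timeStamp is formatted as %08X and imageSize as %x.
    """
    key = "%08X%x" % (time_stamp, image_size)
    return "\\".join([server_name, pe_name, key, pe_name])


# Two hypothetical builds of foo.dll whose raw sizes differ slightly.
size_a = round_up_to_page(0x1B7B010)
size_b = round_up_to_page(0x1B7BF00)

# With a constant timeStamp of zero, both builds map to the same path...
path_a = pe_symbol_path(r"\\server\symbols", "foo.dll", 0, size_a)
path_b = pe_symbol_path(r"\\server\symbols", "foo.dll", 0, size_b)
print(path_a)
print(path_a == path_b)  # True: the two builds collide

# ...whereas distinct per-build timeStamps keep them apart.
path_c = pe_symbol_path(r"\\server\symbols", "foo.dll", 0x5A57B3C1, size_b)
print(path_a == path_c)  # False
```

Both raw sizes round up to 0x1B7C000, so with a zero timeStamp nothing in the
path distinguishes the two builds.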

Starting with Windows 10, Microsoft has been creating reproducible builds, which
is why the timeStamp field in Windows 10 binaries shows dates from seemingly
random years. The new implementation of this field is discussed here:

https://blogs.msdn.microsoft.com/oldnewthing/20180103-00/?p=97705

Roughly speaking, the timeStamp field now contains some sort of hash of the
binary. This needs to be a reasonably good hash to keep the odds of collisions
low. A cryptographically secure hash would be overkill, especially given that
the result has to be packed down into a 32-bit value.
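
A minimal Python sketch of the idea, under stated assumptions: the hash covers
the whole image with the COFF header's TimeDateStamp field zeroed (so stamping
is repeatable), and BLAKE2b truncated to 4 bytes stands in for whatever hash a
linker would really use. It is chosen only because it ships with Python; a fast
non-cryptographic hash would serve equally well. A real linker would also have
to exclude other fields written after the hash is computed, such as the
optional header CheckSum, which this sketch ignores. make_fake_pe is a
hypothetical helper that builds just enough of a PE header for the example.

```python
import hashlib
import struct

PE_SIGNATURE = b"PE\0\0"


def timestamp_offset(image: bytes) -> int:
    """Return the file offset of the COFF header's TimeDateStamp field."""
    # The DOS header stores the offset of the PE signature (e_lfanew) at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("not a PE image")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    return pe_offset + 4 + 2 + 2


def content_timestamp(image: bytes) -> int:
    """Hash the image with TimeDateStamp zeroed and fold it to 32 bits."""
    off = timestamp_offset(image)
    buf = bytearray(image)
    buf[off:off + 4] = bytes(4)  # exclude the field from its own hash
    digest = hashlib.blake2b(bytes(buf), digest_size=4).digest()
    return int.from_bytes(digest, "little")


def stamp_image(image: bytes) -> bytes:
    """Return a copy of the image with TimeDateStamp set to the content hash."""
    buf = bytearray(image)
    struct.pack_into("<I", buf, timestamp_offset(image), content_timestamp(image))
    return bytes(buf)


def make_fake_pe(payload: bytes) -> bytes:
    """Hypothetical helper: a 0x100-byte stub with e_lfanew and a PE signature."""
    header = bytearray(0x100)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x80)
    header[0x80:0x84] = PE_SIGNATURE
    return bytes(header) + payload


a = stamp_image(make_fake_pe(b"build one"))
b = stamp_image(make_fake_pe(b"build two"))
print(hex(content_timestamp(a)))
print(stamp_image(a) == a)  # True: re-stamping does not change the result
```

Because the field is zeroed before hashing, the same inputs always produce the
same stamp, while different outputs get different stamps except for the
roughly 1-in-2^32 chance of a collision.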

It should be possible to use the same idea to generate the GUID/age field for
identifying the PDB, in order to avoid this source of non-determinism in
builds.
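
The same approach could be sketched for the PDB identity. In this Python sketch
the caller is assumed to pass bytes that already exclude the GUID/age fields
themselves, the age is held at a constant 1 (an assumption; the report does not
say what the age should be), and the 16-byte GUID is taken from a BLAKE2b
digest. The key format (GUID as 32 uppercase hex digits followed by the age in
hex) follows the usual symbol server convention for PDBs but is likewise an
assumption here. guid_age_from_content and pdb_symbol_key are hypothetical names.

```python
import hashlib
import uuid

FIXED_AGE = 1  # assumption: age stays constant when the GUID is content-derived


def guid_age_from_content(content: bytes) -> tuple[uuid.UUID, int]:
    """Derive a deterministic PDB GUID/age from content that excludes those fields."""
    digest = hashlib.blake2b(content, digest_size=16).digest()
    # GUIDs are stored with little-endian Data1/Data2/Data3 fields, hence bytes_le.
    return uuid.UUID(bytes_le=digest), FIXED_AGE


def pdb_symbol_key(guid: uuid.UUID, age: int) -> str:
    """Format the GUID as 32 uppercase hex digits followed by the age in hex."""
    return guid.hex.upper() + "%x" % age


guid, age = guid_age_from_content(b"linked image contents")
print(pdb_symbol_key(guid, age))
```

Relinking identical inputs would then yield an identical GUID and therefore the
same PDB path on the symbol server.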

Using a hash for the timeStamp and GUID/age would be compatible with the
buildID concept on Linux.



Any versions of lld that ignore this are incompatible with symbol servers and
therefore incompatible with "real" Windows development.
