[PATCH] D92736: [lld/mac] Use xxhash instead of MD5 for computing the UUID

Greg Clayton via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 7 16:02:20 PST 2020


clayborg added a comment.

In D92736#2438432 <https://reviews.llvm.org/D92736#2438432>, @thakis wrote:

> Thanks! There's two mostly independent questions here.
>
> 1. Choice of hash function. The patch changes the hash function but doesn't modify what parts of the binary get hashed:
>
>> So if we change the UUID to be unique on each build, we can probably guarantee that Apple won't ever use lld for their builds. Not sure how important this is. This also means that if a UUID is generated by ld64 for a stripped or non-stripped binary, it will end up being the same, which is nice.
>
> That's not what this patch is doing. It changes only which hash function we use, not which ranges of the binary we hash.

Gotcha. So we are hashing only what ld64 is hashing?  Do the RFC 4122 changes make us differ from ld64 if we don't switch hashing algorithms?
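
For reference, here is a minimal sketch of what "RFC 4122 changes" typically means for a hash-derived UUID: the version nibble and the variant bits are stamped over six bits of the digest. The function name `digest_to_uuid` is hypothetical, and this is not lld's or ld64's actual code. Whether ld64 applies the same stamping is exactly the open question above.

```python
import hashlib
import uuid


def digest_to_uuid(digest: bytes, version: int) -> uuid.UUID:
    """Turn the first 16 bytes of a hash digest into an RFC 4122 UUID.

    Hypothetical helper for illustration only; not taken from lld or ld64.
    """
    if len(digest) < 16:
        raise ValueError("need at least 16 bytes of digest")
    if not 1 <= version <= 5:
        raise ValueError("RFC 4122 defines versions 1 through 5")
    b = bytearray(digest[:16])
    # Byte 6: the high nibble holds the UUID version (3 = MD5 name-based).
    b[6] = (b[6] & 0x0F) | (version << 4)
    # Byte 8: the top two bits are 0b10 for the RFC 4122 variant.
    b[8] = (b[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(b))


# Example input: any byte range of the output binary would go here.
raw = hashlib.md5(b"example contents").digest()
stamped = digest_to_uuid(raw, version=3)
```

Because six bits are overwritten, two linkers that hash the same bytes with the same function still produce different UUIDs if only one of them stamps the version and variant bits.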

> 2. Output binary determinism.
>
> I agree that this is an important goal. It is independent of this patch.
>
> You can get the benefit of build-dir-independence without hacking up the linker, see "Getting to local determinism" on https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html , in particular the `-fdebug-compilation-dir` bits. That's important to get `.o` files that are independent of the build dir as well, which in turn is important for distcc-like systems that can reuse `.o` caches across users / bots, independent of build dir.

Using this means debuggers need to have source remapping settings set correctly or we can't show sources correctly. So it does fix determinism, but at the cost of debugging not working without user intervention. Most people will try to set a breakpoint, it won't work, and then they turn to printf debugging. But this kind of thing is needed for caches across users / bots like you mentioned.

> Is stripped builds and non-stripped builds having the same UUID something that's useful in practice? I would've expected that everyone builds a debug binary with symbols, and then strips it after building. What's the use case for links with and without debug info and wanting the same UUID?

We have cases where the ELF build ID changes on us for binaries when we have stripped debug info. Debuggers really want to match a stripped binary to a non-stripped binary. The UUID in the file (LC_UUID for Mach-O, ELF build ID for others) is what we try to use. When these don't match, we end up not associating the symbol file with the stripped executable, and users don't get debug info in their debug session. We have scripts at Facebook that know how to rewrite the ELF build ID to make things match up in these cases, but this again requires the user to know why this is happening (most don't) and how to work around the issue (run a script to update the ELF build ID). Many debugger users turn to debugging with non-stripped binaries to get things working, which means the binaries that are cached and sent around end up being huge with all of the debug info.
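
The matching step described above can be sketched as a plain lookup keyed on the ID. This is a toy model under stated assumptions: `build_symbol_index` and `find_symbol_file` are hypothetical names, the IDs and paths are made up, and real debuggers do considerably more than this.

```python
from typing import Dict, Iterable, Optional, Tuple


def normalize_id(raw_id: str) -> str:
    """Compare IDs case-insensitively and without UUID-style dashes."""
    return raw_id.replace("-", "").lower()


def build_symbol_index(symbol_files: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each symbol file's ID (LC_UUID or ELF build ID) to its path."""
    return {normalize_id(file_id): path for file_id, path in symbol_files}


def find_symbol_file(binary_id: str, index: Dict[str, str]) -> Optional[str]:
    """Return the symbol file whose ID matches the binary's, or None."""
    return index.get(normalize_id(binary_id))


# Made-up IDs and paths, for illustration only.
index = build_symbol_index([
    ("0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0", "/symbols/app.dSYM"),
])

# The stripped binary kept the same ID: symbols are found.
same_id = find_symbol_file("0f1e2d3c4b5a69788796a5b4c3d2e1f0", index)

# Stripping changed the ID: the lookup silently finds nothing.
changed_id = find_symbol_file("ffffffffffffffffffffffffffffffff", index)
```

The failure mode is silent: nothing distinguishes "ID changed during stripping" from "no symbols exist", which is why users rarely know a build ID rewrite would fix their session.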


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92736/new/

https://reviews.llvm.org/D92736


