[PATCH] D99279: [lld-macho] Parallelize UUID hash computation

Tue Mar 30 10:13:53 PDT 2021

oontvoo accepted this revision.
oontvoo added a comment.
This revision is now accepted and ready to land.

LGTM

================
Comment at: lld/MachO/Writer.cpp:929
+  ArrayRef<uint8_t> data{buffer->getBufferStart(), buffer->getBufferEnd()};
+  unsigned chunkCount = parallel::strategy.compute_thread_count() * 10;
+  // Round-up integer division
----------------
It seems `strategy` is only initted to hardware_concurrency (in Driver.cpp) if `--threads` is set.
Do we care to set set it unconditionally? 

================
Comment at: lld/MachO/Writer.cpp:921-925
+  std::vector<uint64_t> hashes(chunks.size());
+  parallelForEachN(0, chunks.size(),
+                   [&](size_t i) { hashes[i] = xxHash64(chunks[i]); });
+  uint64_t digest = xxHash64({reinterpret_cast<uint8_t *>(hashes.data()),
+                              hashes.size() * sizeof(uint64_t)});
----------------
int3 wrote:
> oontvoo wrote:
> > int3 wrote:
> > > oontvoo wrote:
> > > > Looking at the implementation of xxHash64, I think the `reinterpret_cast<uint8_t *>` is UB. 
> > > > Specifically, it tried to iterate the array by `+8`.
> > > > 
> > > > How about this?
> > > > Looking at the implementation of xxHash64, I think the reinterpret_cast<uint8_t *> is UB.
> > > > Specifically, it tried to iterate the array by +8.
> > > 
> > > Can you elaborate? I'm not sure I follow.
> > > 
> > > As for using `write64le` to write the intermediate values, I know LLD-ELF does it, but seems pretty superfluous to me. I guess it means that we'll get the same hash values if we run LLD on a big-endian host machine, but ... when will that ever happen? And if someone tried that, I'm pretty sure lots of things would break anyway, since we don't have any contbuilds covering this.
> > 
> > > Can you elaborate? I'm not sure I follow.
> > > As for using `write64le` <..... >
> > 
> > I guess `write64le`  is not strictly required :)
> > 
> >  I was getting more at the `reinterpret_cast`.  There is no *guarantee* that casting uint64_t* to a uint8_t* won't cause aliasing violations. In practice, it's probably fine. If possible, though, it'd be nice to avoid it. (And it *is* possible in this case)
> Hmm interesting. I did some reading up on aliasing rules, and if I understand things correctly, casting to a `char` type is fine.
> 
> From https://github.com/cplusplus/draft/raw/master/papers/n4659.pdf, section 6.10 [basic.lval] point 8:
> 
> > If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined
> > ...
> > a char, unsigned char, or std::byte type.
Not the hill to die on but I'll add that `uint8_t` isn't *guaranteed* to be `unsigned char`. Yes, in practice code in LLVM seems to make this kind of assumption.  so .... ¯\_(ツ)_/¯

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99279/new/

https://reviews.llvm.org/D99279