[lld] [lld-macho,BalancedPartition] Simplify relocation hash and avoid xxHash (PR #121729)

Ellis Hoag via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 7 11:53:44 PST 2025


================
@@ -90,23 +93,24 @@ class BPSectionMacho : public BPSectionBase {
                             &sectionToIdx) const override {
     constexpr unsigned windowSize = 4;
 
-    // Calculate content hashes
-    size_t dataSize = isec->data.size();
-    for (size_t i = 0; i < dataSize; i++) {
-      auto window = isec->data.drop_front(i).take_front(windowSize);
-      hashes.push_back(xxHash64(window));
-    }
+    // Calculate content hashes: k-mers and the last k-1 bytes.
+    ArrayRef<uint8_t> data = isec->data;
+    if (data.size() >= windowSize)
+      for (size_t i = 0; i <= data.size() - windowSize; ++i)
+        hashes.push_back(llvm::support::endian::read32le(data.data() + i));
----------------
ellishg wrote:

The uncompressed size could change due to alignment and changes in the [unwind info section](https://discourse.llvm.org/t/some-questions-about-profile-guided-function-order-via-temporal-profiling-such-as-binary-size-regression/80513/6?u=ellishg). You can use `bloaty` to verify this. IIRC `[TEXT]` will show alignment changes, but that isn't well documented.


> Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

I settled on the current implementation by trying many different hashing strategies. I got the best results by hashing a sliding window for relocations and the section data. I'm open to changing this if we run experiments to confirm there is no regression. For now, I think those more aggressive changes should be a separate PR.
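
For readers following along, here is a minimal standalone sketch of the sliding-window idea as it appears in the hunk above: one value per 4-byte window, read as a little-endian 32-bit integer instead of going through xxHash. It does not use LLVM headers; `readLE32`, `windowHashes`, and `relocSizeInBytes` are hypothetical names for illustration, and the handling of the trailing k-1 bytes mentioned in the diff comment is omitted because that part of the hunk is not shown here. The last helper only illustrates the "logarithm field" remark from the quote, assuming the usual Mach-O encoding where the byte width is `1 << length`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::support::endian::read32le:
// assemble 4 bytes into a uint32_t, independent of host endianness.
static uint32_t readLE32(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// One "hash" per 4-byte window (k-mer) of the section data.
// Sections shorter than the window produce no values in this sketch.
std::vector<uint64_t> windowHashes(const std::vector<uint8_t> &data) {
  constexpr size_t windowSize = 4;
  std::vector<uint64_t> hashes;
  if (data.size() < windowSize)
    return hashes;
  for (size_t i = 0; i + windowSize <= data.size(); ++i)
    hashes.push_back(readLE32(data.data() + i));
  return hashes;
}

// Assumption: the relocation length field stores log2 of the width,
// so 0, 1, 2, 3 map to 1, 2, 4, 8 bytes.
size_t relocSizeInBytes(uint8_t log2Length) {
  assert(log2Length <= 3 && "unexpected relocation length encoding");
  return size_t(1) << log2Length;
}
```

With a 5-byte input this yields two windows, so adjacent values share three of their four source bytes. That overlap is what lets sections with similar content end up with similar hash sets.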

https://github.com/llvm/llvm-project/pull/121729

