[PATCH] D45571: [ELF] - Speedup MergeInputSection::splitStrings

George Rimar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 25 05:32:50 PDT 2018


grimar added inline comments.


================
Comment at: ELF/InputSection.cpp:886
+  while (DataSize > sizeof(uint32_t)) {
+    uint32_t Word = *reinterpret_cast<const uint32_t *>(Data);
+    // This checks if at least one byte of a word is a null byte.
----------------
espindola wrote:
> This will produce different results on a big endian host, no?
You are right.

To fix this, I first tried rewriting the hashing code right below to read byte by byte in a loop, but that hurts performance too much.

Then I tried both `read32be()` and `read32le()` (my host is LE). The average time was the same for both and did not differ from `*reinterpret_cast<const uint32_t *>`, so it seems we can use them instead.
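
For illustration, a minimal sketch of a host-independent 32-bit load combined with a "does this word contain a null byte" test. The names `loadLE32` and `hasZeroByte` are hypothetical: `loadLE32` only stands in for a `read32le()`-style accessor, and the bit trick shown is the well-known one, which may not be the exact expression used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper standing in for a read32le()-style accessor.
// It assembles the value from individual bytes, so the result is the same
// on little-endian and big-endian hosts.
inline uint32_t loadLE32(const unsigned char *P) {
  assert(P && "caller must pass at least 4 readable bytes");
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Classic bit trick: returns true if at least one of the four bytes of
// Word is zero. Assumption: the patch may use a different formulation.
inline bool hasZeroByte(uint32_t Word) {
  return ((Word - 0x01010101u) & ~Word & 0x80808080u) != 0;
}
```

Compilers typically lower the shift-and-or pattern to a single load on little-endian targets, which would be consistent with the timings above showing no difference.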





================
Comment at: ELF/InputSection.cpp:894
+    DataSize -= sizeof(uint32_t);
+    Hash = (Hash << 5) + Hash + Word;
+  }
----------------
espindola wrote:
> I don't know enough about hashing to judge if this is a reasonable extension of the djb hash for using 4 bytes at a time, but we can probably start with it.
I experimented here: any bad hashing increases hash collisions, which instantly shows up in the profile. My approach seems to work well, so I think it is OK. It is simpler than taking single bytes, and the loop reading bytes was also slower for me.
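
As a rough sketch of the idea, the function below applies the djb-style step `Hash = (Hash << 5) + Hash + Word` to four bytes at a time and falls back to single bytes for the tail. It is a simplified illustration, not the patch itself: the name `hashWordAtATime` is hypothetical, the buffer length is assumed to be known up front, the seed 5381 is the conventional djb2 starting value, and the null-byte scan from the patch is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical word-at-a-time variant of the djb hash over a buffer of
// known size. Words are assembled in little-endian order so the result
// does not depend on the host byte order.
inline uint32_t hashWordAtATime(const unsigned char *Data, size_t Size) {
  assert((Data || Size == 0) && "non-empty input needs a valid pointer");
  uint32_t Hash = 5381; // conventional djb2 seed (assumption for this sketch)
  while (Size >= sizeof(uint32_t)) {
    uint32_t Word = uint32_t(Data[0]) | (uint32_t(Data[1]) << 8) |
                    (uint32_t(Data[2]) << 16) | (uint32_t(Data[3]) << 24);
    Hash = (Hash << 5) + Hash + Word; // Hash * 33 + Word
    Data += sizeof(uint32_t);
    Size -= sizeof(uint32_t);
  }
  // Remaining 0..3 bytes: the usual byte-wise djb step.
  for (; Size != 0; --Size)
    Hash = (Hash << 5) + Hash + *Data++;
  return Hash;
}
```

Mixing in a whole word per step does a quarter of the multiply-add steps of the byte-wise loop. Whether the collision rate stays acceptable has to be checked by profiling, as described above.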


https://reviews.llvm.org/D45571




