[PATCH] D45571: [ELF] - Speedup MergeInputSection::splitStrings

Fri Apr 27 14:13:06 PDT 2018

ruiu added inline comments.

================
Comment at: ELF/InputSection.cpp:872
+  // calculated as the same time as we read the string bytes for speedup.
+  uint32_t Hash = 5381;
+
----------------
I'd name this `H`.

================
Comment at: ELF/InputSection.cpp:875
+  // Load a word at a time and test if any of bytes is 0-byte.
+  while (DataSize > sizeof(uint32_t)) {
+    uint32_t Word = read32(Data);
----------------
sizeof(uint32_t) is always 4, so please write just 4.

================
Comment at: ELF/InputSection.cpp:880
+    // testing the single bytes to find the exact null byte position.
+    if (((Word - 0x01010101) & ~Word & 0x80808080) != 0)
+      break;
----------------
Omit `!= 0` because it is always implied.

================
Comment at: ELF/InputSection.cpp:883
+    Data += sizeof(uint32_t);
+    DataSize -= sizeof(uint32_t);
+    Hash = (Hash << 5) + Hash + Word;
----------------
I don't think you need to update DataSize. You know the end position of S, so you can compare Data with it to see if you've reached end of a string.

================
Comment at: ELF/InputSection.cpp:884
+    DataSize -= sizeof(uint32_t);
+    Hash = (Hash << 5) + Hash + Word;
+  }
----------------
`* 33` is perhaps better for brevity. 

================
Comment at: ELF/InputSection.cpp:889
+  // update the hash value too.
+  while (DataSize--) {
+    Hash = (Hash << 5) + Hash + *Data;
----------------
Likewise, maintaining both DataSize and Data doesn't seem to make sense to me.

================
Comment at: ELF/InputSection.cpp:896
+
+  llvm_unreachable("string is not null terminated");
+}
----------------
You should be able to handle this case gracefully.

================
Comment at: ELF/InputSection.cpp:901
 // null-terminated strings.
 void MergeInputSection::splitStrings(ArrayRef<uint8_t> Data, size_t EntSize) {
+  if (Data.size() < EntSize || Data.back() != 0)
----------------
Split this entire function into two; one for '\0'-terminated string and the other for multi-word string. That's better than doing it inside the while loop for readability.

================
Comment at: ELF/InputSection.cpp:914-916
+      uint64_t SizeHash = findSizeAndHash(S);
+      Size = SizeHash & 0xFFFFFFFF;
+      Hash = SizeHash >> 32;
----------------
This is very odd, and we should avoid returning two values in one word. Since `findSizeAndHash` is inlined, I don't think you need this.

https://reviews.llvm.org/D45571