[PATCH] D147890: Fix sysvhash function

Nathan Sidwell via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 9 12:38:45 PDT 2023


urnathan created this revision.
Herald added a subscriber: pengfei.
Herald added a project: All.
urnathan requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

I had reason to look at hashSysV, the dynlib hash function.  It's a rather awkward formulation -- taken almost straight from the SysV docs.  But I noticed it is implemented incorrectly.  Specifically it treats the symbol name as a set of plain chars, rather than unsigned chars, as the documented SysV hash treats them.  For traditional names formed from a subset of 7-bit ASCII that's not a material difference.  But for host ABIs with plain char signed (like x86), this will fail for high-bit chars.  And we're in unicode UTF8 world now.[*]

So, use uint8_t, as the hashGnu function does.

Now, the loop is hard to optimize, and this reformulation removes 6 insns from the 16 insn loop on x86.  There are 3 changes (all of which are too hard for clang and gcc to spot right now)

1. the 'if (g)' test is meaningless.  We're xoring with g, so if it's zero, that's a nop anyway. As it happens, on x86 we compute the xor anyway and then cmov it.  So, just drop it.

2. the ending 'h ^= g' clears the top 4 bits.  But those 4 bits will get shifted out at the start of the next iteration.  We can remove sink this mask out of the loop.

3. the final change is to mask wth 0xf0 after shifting by 24 bits, rather than before by 0xf0<<24, as that increases the likelihood of the ISA being able to use a short immediate mask operand.  That's not the case for x86, but plenty of risc ISAs would need to materialize the larger constant somewhere.

I've added unit tests, and verified the hash values against binutil's implementation.

[X] Elf itself does not talk about character sets for symbol names -- they're merely NUL-terminated sequences of bytes.


https://reviews.llvm.org/D147890

Files:
  llvm/include/llvm/Object/ELF.h
  llvm/unittests/Object/ELFTest.cpp


Index: llvm/unittests/Object/ELFTest.cpp
===================================================================
--- llvm/unittests/Object/ELFTest.cpp
+++ llvm/unittests/Object/ELFTest.cpp
@@ -271,3 +271,14 @@
   EXPECT_THAT_ERROR(Region[3].takeError(), FailedWithMessage(ErrMsg2));
   EXPECT_THAT_ERROR(Region[4].takeError(), FailedWithMessage(ErrMsg2));
 }
+
+// Test the sysV and the gnu hash functions, particularly with utf8 unicode
+TEST(ELFTest, Hash) {
+  // Symbols long enough to have feedback
+  EXPECT_EQ(hashSysV("FooBarBazToto"), 0x5ec3e8fU);
+  EXPECT_EQ(hashGnu("FooBarBazToto"), 0x5478be61U);
+  EXPECT_EQ(hashSysV(u8"boom💥pants"), 0x5a0cf53U);
+  EXPECT_EQ(hashGnu(u8"boom💥pants"), 0xf5dda2deU);
+  EXPECT_EQ(hashSysV(u8"woot!🧙 💑 🌈"), 0x3522e38U);
+  EXPECT_EQ(hashGnu(u8"woot!🧙 💑 🌈"), 0xf7603f3U);
+}
Index: llvm/include/llvm/Object/ELF.h
===================================================================
--- llvm/include/llvm/Object/ELF.h
+++ llvm/include/llvm/Object/ELF.h
@@ -1237,15 +1237,12 @@
 /// Name of the API remains consistent as specified in the libelf
 /// REF : http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash
 inline unsigned hashSysV(StringRef SymbolName) {
-  unsigned h = 0, g;
-  for (char C : SymbolName) {
-    h = (h << 4) + C;
-    g = h & 0xf0000000L;
-    if (g != 0)
-      h ^= g >> 24;
-    h &= ~g;
+  uint32_t H = 0;
+  for (uint8_t C : SymbolName) {
+    H = (H << 4) + C;
+    H ^= (H >> 24) & 0xf0;
   }
-  return h;
+  return H & 0xfffffff;
 }
 
 /// This function returns the hash value for a symbol in the .dynsym section


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D147890.512031.patch
Type: text/x-patch
Size: 1616 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230409/c28097c7/attachment.bin>


More information about the llvm-commits mailing list