[llvm-branch-commits] [Hashing] Use a non-deterministic seed (PR #96282)

via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Thu Jun 20 23:47:06 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-adt

Author: Fangrui Song (MaskRay)

<details>
<summary>Changes</summary>

Hashing.h provides hash_value/hash_combine/hash_combine_range, which are
primarily used by

* `DenseMap<StringRef, X>`
* `FoldingSetNodeIDRef::ComputeHash` (will be fixed by #<!-- -->96136)

Users shouldn't rely on specific hash values due to potential algorithm
changes. `set_fixed_execution_hash_seed` is provided but it has never
been used.

Take the the address of a static storage duration variable as the seed
like absl/hash/internal/hash.h `kSeed`.
(See https://reviews.llvm.org/D93931 for workaround for older Clang.
Mach-O x86-64 forces PIC, so absl's `__apple_build_version__` check is
unnecessary.)

A few users relying on the iteration order of `DenseMap<StringRef, X>`
have been fixed (e.g., f8f4235612b9 c025bd1fdbbd 89e8e63f47ff
86eb6bf6715c eb8d03656549 0ea6b8e476c2 58d7a6e0e636 8ea31db27211
255986e27fcf).
>From my experience fixing `DenseMap<A *, X>` and
[`StringMap`](https://discourse.llvm.org/t/reverse-iteration-bots/72224)
iteration order issues, the scale of breakage is smaller.


---
Full diff: https://github.com/llvm/llvm-project/pull/96282.diff


5 Files Affected:

- (modified) llvm/include/llvm/ADT/Hashing.h (+11-34) 
- (modified) llvm/lib/Support/CMakeLists.txt (-1) 
- (removed) llvm/lib/Support/Hashing.cpp (-28) 
- (modified) llvm/unittests/ADT/HashingTest.cpp (-72) 
- (modified) llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn (-1) 


``````````diff
diff --git a/llvm/include/llvm/ADT/Hashing.h b/llvm/include/llvm/ADT/Hashing.h
index a5477362a5079..3c87894635787 100644
--- a/llvm/include/llvm/ADT/Hashing.h
+++ b/llvm/include/llvm/ADT/Hashing.h
@@ -126,23 +126,6 @@ hash_code hash_value(const std::basic_string<T> &arg);
 /// Compute a hash_code for a standard string.
 template <typename T> hash_code hash_value(const std::optional<T> &arg);
 
-/// Override the execution seed with a fixed value.
-///
-/// This hashing library uses a per-execution seed designed to change on each
-/// run with high probability in order to ensure that the hash codes are not
-/// attackable and to ensure that output which is intended to be stable does
-/// not rely on the particulars of the hash codes produced.
-///
-/// That said, there are use cases where it is important to be able to
-/// reproduce *exactly* a specific behavior. To that end, we provide a function
-/// which will forcibly set the seed to a fixed value. This must be done at the
-/// start of the program, before any hashes are computed. Also, it cannot be
-/// undone. This makes it thread-hostile and very hard to use outside of
-/// immediately on start of a simple program designed for reproducible
-/// behavior.
-void set_fixed_execution_hash_seed(uint64_t fixed_value);
-
-
 // All of the implementation details of actually computing the various hash
 // code values are held within this namespace. These routines are included in
 // the header file mainly to allow inlining and constant propagation.
@@ -322,24 +305,18 @@ struct hash_state {
   }
 };
 
-
-/// A global, fixed seed-override variable.
-///
-/// This variable can be set using the \see llvm::set_fixed_execution_seed
-/// function. See that function for details. Do not, under any circumstances,
-/// set or read this variable.
-extern uint64_t fixed_seed_override;
-
+/// The seed is non-deterministic (address of a variable) to prevent having
+/// users depend on the particular hash values. On platforms without ASLR, this
+/// is still likely non-deterministic per build.
 inline uint64_t get_execution_seed() {
-  // FIXME: This needs to be a per-execution seed. This is just a placeholder
-  // implementation. Switching to a per-execution seed is likely to flush out
-  // instability bugs and so will happen as its own commit.
-  //
-  // However, if there is a fixed seed override set the first time this is
-  // called, return that instead of the per-execution seed.
-  const uint64_t seed_prime = 0xff51afd7ed558ccdULL;
-  static uint64_t seed = fixed_seed_override ? fixed_seed_override : seed_prime;
-  return seed;
+  static const char seed = 0;
+  // Work around x86-64 negative offset folding for old Clang -fno-pic
+  // https://reviews.llvm.org/D93931
+#if !defined(__clang__) || __clang_major__ > 11
+  return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&seed));
+#else
+  return 0xff51afd7ed558ccdULL;
+#endif
 }
 
 
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 0c69ac99f5bc6..31343df4d8b8b 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -187,7 +187,6 @@ add_llvm_component_library(LLVMSupport
   FormatVariadic.cpp
   GlobPattern.cpp
   GraphWriter.cpp
-  Hashing.cpp
   HexagonAttributeParser.cpp
   HexagonAttributes.cpp
   InitLLVM.cpp
diff --git a/llvm/lib/Support/Hashing.cpp b/llvm/lib/Support/Hashing.cpp
deleted file mode 100644
index 1b20a670434f1..0000000000000
--- a/llvm/lib/Support/Hashing.cpp
+++ /dev/null
@@ -1,28 +0,0 @@
-//===-------------- lib/Support/Hashing.cpp -------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file provides implementation bits for the LLVM common hashing
-// infrastructure. Documentation and most of the other information is in the
-// header file.
-//
-//===----------------------------------------------------------------------===//
-
-#include "llvm/ADT/Hashing.h"
-
-using namespace llvm;
-
-// Provide a definition and static initializer for the fixed seed. This
-// initializer should always be zero to ensure its value can never appear to be
-// non-zero, even during dynamic initialization.
-uint64_t llvm::hashing::detail::fixed_seed_override = 0;
-
-// Implement the function for forced setting of the fixed seed.
-// FIXME: Use atomic operations here so that there is no data race.
-void llvm::set_fixed_execution_hash_seed(uint64_t fixed_value) {
-  hashing::detail::fixed_seed_override = fixed_value;
-}
diff --git a/llvm/unittests/ADT/HashingTest.cpp b/llvm/unittests/ADT/HashingTest.cpp
index ab13ee833ce55..c28356e229e66 100644
--- a/llvm/unittests/ADT/HashingTest.cpp
+++ b/llvm/unittests/ADT/HashingTest.cpp
@@ -235,78 +235,6 @@ TEST(HashingTest, HashCombineRangeLengthDiff) {
   }
 }
 
-TEST(HashingTest, HashCombineRangeGoldenTest) {
-  struct { const char *s; uint64_t hash; } golden_data[] = {
-#if SIZE_MAX == UINT64_MAX || SIZE_MAX == UINT32_MAX
-    { "a",                                0xaeb6f9d5517c61f8ULL },
-    { "ab",                               0x7ab1edb96be496b4ULL },
-    { "abc",                              0xe38e60bf19c71a3fULL },
-    { "abcde",                            0xd24461a66de97f6eULL },
-    { "abcdefgh",                         0x4ef872ec411dec9dULL },
-    { "abcdefghijklm",                    0xe8a865539f4eadfeULL },
-    { "abcdefghijklmnopqrstu",            0x261cdf85faaf4e79ULL },
-    { "abcdefghijklmnopqrstuvwxyzabcdef", 0x43ba70e4198e3b2aULL },
-    { "abcdefghijklmnopqrstuvwxyzabcdef"
-      "abcdefghijklmnopqrstuvwxyzghijkl"
-      "abcdefghijklmnopqrstuvwxyzmnopqr"
-      "abcdefghijklmnopqrstuvwxyzstuvwx"
-      "abcdefghijklmnopqrstuvwxyzyzabcd", 0xdcd57fb2afdf72beULL },
-    { "a",                                0xaeb6f9d5517c61f8ULL },
-    { "aa",                               0xf2b3b69a9736a1ebULL },
-    { "aaa",                              0xf752eb6f07b1cafeULL },
-    { "aaaaa",                            0x812bd21e1236954cULL },
-    { "aaaaaaaa",                         0xff07a2cff08ac587ULL },
-    { "aaaaaaaaaaaaa",                    0x84ac949d54d704ecULL },
-    { "aaaaaaaaaaaaaaaaaaaaa",            0xcb2c8fb6be8f5648ULL },
-    { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xcc40ab7f164091b6ULL },
-    { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
-      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
-      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
-      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
-      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xc58e174c1e78ffe9ULL },
-    { "z",                                0x1ba160d7e8f8785cULL },
-    { "zz",                               0x2c5c03172f1285d7ULL },
-    { "zzz",                              0x9d2c4f4b507a2ac3ULL },
-    { "zzzzz",                            0x0f03b9031735693aULL },
-    { "zzzzzzzz",                         0xe674147c8582c08eULL },
-    { "zzzzzzzzzzzzz",                    0x3162d9fa6938db83ULL },
-    { "zzzzzzzzzzzzzzzzzzzzz",            0x37b9a549e013620cULL },
-    { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0x8921470aff885016ULL },
-    { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
-      "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
-      "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
-      "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
-      "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0xf60fdcd9beb08441ULL },
-    { "a",                                0xaeb6f9d5517c61f8ULL },
-    { "ab",                               0x7ab1edb96be496b4ULL },
-    { "aba",                              0x3edb049950884d0aULL },
-    { "ababa",                            0x8f2de9e73a97714bULL },
-    { "abababab",                         0xee14a29ddf0ce54cULL },
-    { "ababababababa",                    0x38b3ddaada2d52b4ULL },
-    { "ababababababababababa",            0xd3665364219f2b85ULL },
-    { "abababababababababababababababab", 0xa75cd6afbf1bc972ULL },
-    { "abababababababababababababababab"
-      "abababababababababababababababab"
-      "abababababababababababababababab"
-      "abababababababababababababababab"
-      "abababababababababababababababab", 0x840192d129f7a22bULL }
-#else
-#error This test only supports 64-bit and 32-bit systems.
-#endif
-  };
-  for (unsigned i = 0; i < sizeof(golden_data)/sizeof(*golden_data); ++i) {
-    StringRef str = golden_data[i].s;
-    hash_code hash = hash_combine_range(str.begin(), str.end());
-#if 0 // Enable this to generate paste-able text for the above structure.
-    std::string member_str = "\"" + str.str() + "\",";
-    fprintf(stderr, " { %-35s 0x%016llxULL },\n",
-            member_str.c_str(), static_cast<uint64_t>(hash));
-#endif
-    EXPECT_EQ(static_cast<size_t>(golden_data[i].hash),
-              static_cast<size_t>(hash));
-  }
-}
-
 TEST(HashingTest, HashCombineBasicTest) {
   // Hashing a sequence of homogenous types matches range hashing.
   const int i1 = 42, i2 = 43, i3 = 123, i4 = 999, i5 = 0, i6 = 79;
diff --git a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
index af4a6993cbc79..d6b6b71a1e59b 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
@@ -90,7 +90,6 @@ static_library("Support") {
     "FormattedStream.cpp",
     "GlobPattern.cpp",
     "GraphWriter.cpp",
-    "Hashing.cpp",
     "HexagonAttributeParser.cpp",
     "HexagonAttributes.cpp",
     "InitLLVM.cpp",

``````````

</details>


https://github.com/llvm/llvm-project/pull/96282


More information about the llvm-branch-commits mailing list