[llvm-branch-commits] [Hashing] Use a non-deterministic seed (PR #96282)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Jun 20 23:47:06 PDT 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-adt
Author: Fangrui Song (MaskRay)
<details>
<summary>Changes</summary>
Hashing.h provides hash_value/hash_combine/hash_combine_range, which are
primarily used by
* `DenseMap<StringRef, X>`
* `FoldingSetNodeIDRef::ComputeHash` (will be fixed by #<!-- -->96136)
Users shouldn't rely on specific hash values due to potential algorithm
changes. `set_fixed_execution_hash_seed` is provided but it has never
been used.
Take the the address of a static storage duration variable as the seed
like absl/hash/internal/hash.h `kSeed`.
(See https://reviews.llvm.org/D93931 for workaround for older Clang.
Mach-O x86-64 forces PIC, so absl's `__apple_build_version__` check is
unnecessary.)
A few users relying on the iteration order of `DenseMap<StringRef, X>`
have been fixed (e.g., f8f4235612b9 c025bd1fdbbd 89e8e63f47ff
86eb6bf6715c eb8d03656549 0ea6b8e476c2 58d7a6e0e636 8ea31db27211
255986e27fcf).
>From my experience fixing `DenseMap<A *, X>` and
[`StringMap`](https://discourse.llvm.org/t/reverse-iteration-bots/72224)
iteration order issues, the scale of breakage is smaller.
---
Full diff: https://github.com/llvm/llvm-project/pull/96282.diff
5 Files Affected:
- (modified) llvm/include/llvm/ADT/Hashing.h (+11-34)
- (modified) llvm/lib/Support/CMakeLists.txt (-1)
- (removed) llvm/lib/Support/Hashing.cpp (-28)
- (modified) llvm/unittests/ADT/HashingTest.cpp (-72)
- (modified) llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn (-1)
``````````diff
diff --git a/llvm/include/llvm/ADT/Hashing.h b/llvm/include/llvm/ADT/Hashing.h
index a5477362a5079..3c87894635787 100644
--- a/llvm/include/llvm/ADT/Hashing.h
+++ b/llvm/include/llvm/ADT/Hashing.h
@@ -126,23 +126,6 @@ hash_code hash_value(const std::basic_string<T> &arg);
/// Compute a hash_code for a standard string.
template <typename T> hash_code hash_value(const std::optional<T> &arg);
-/// Override the execution seed with a fixed value.
-///
-/// This hashing library uses a per-execution seed designed to change on each
-/// run with high probability in order to ensure that the hash codes are not
-/// attackable and to ensure that output which is intended to be stable does
-/// not rely on the particulars of the hash codes produced.
-///
-/// That said, there are use cases where it is important to be able to
-/// reproduce *exactly* a specific behavior. To that end, we provide a function
-/// which will forcibly set the seed to a fixed value. This must be done at the
-/// start of the program, before any hashes are computed. Also, it cannot be
-/// undone. This makes it thread-hostile and very hard to use outside of
-/// immediately on start of a simple program designed for reproducible
-/// behavior.
-void set_fixed_execution_hash_seed(uint64_t fixed_value);
-
-
// All of the implementation details of actually computing the various hash
// code values are held within this namespace. These routines are included in
// the header file mainly to allow inlining and constant propagation.
@@ -322,24 +305,18 @@ struct hash_state {
}
};
-
-/// A global, fixed seed-override variable.
-///
-/// This variable can be set using the \see llvm::set_fixed_execution_seed
-/// function. See that function for details. Do not, under any circumstances,
-/// set or read this variable.
-extern uint64_t fixed_seed_override;
-
+/// The seed is non-deterministic (address of a variable) to prevent having
+/// users depend on the particular hash values. On platforms without ASLR, this
+/// is still likely non-deterministic per build.
inline uint64_t get_execution_seed() {
- // FIXME: This needs to be a per-execution seed. This is just a placeholder
- // implementation. Switching to a per-execution seed is likely to flush out
- // instability bugs and so will happen as its own commit.
- //
- // However, if there is a fixed seed override set the first time this is
- // called, return that instead of the per-execution seed.
- const uint64_t seed_prime = 0xff51afd7ed558ccdULL;
- static uint64_t seed = fixed_seed_override ? fixed_seed_override : seed_prime;
- return seed;
+ static const char seed = 0;
+ // Work around x86-64 negative offset folding for old Clang -fno-pic
+ // https://reviews.llvm.org/D93931
+#if !defined(__clang__) || __clang_major__ > 11
+ return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&seed));
+#else
+ return 0xff51afd7ed558ccdULL;
+#endif
}
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 0c69ac99f5bc6..31343df4d8b8b 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -187,7 +187,6 @@ add_llvm_component_library(LLVMSupport
FormatVariadic.cpp
GlobPattern.cpp
GraphWriter.cpp
- Hashing.cpp
HexagonAttributeParser.cpp
HexagonAttributes.cpp
InitLLVM.cpp
diff --git a/llvm/lib/Support/Hashing.cpp b/llvm/lib/Support/Hashing.cpp
deleted file mode 100644
index 1b20a670434f1..0000000000000
--- a/llvm/lib/Support/Hashing.cpp
+++ /dev/null
@@ -1,28 +0,0 @@
-//===-------------- lib/Support/Hashing.cpp -------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file provides implementation bits for the LLVM common hashing
-// infrastructure. Documentation and most of the other information is in the
-// header file.
-//
-//===----------------------------------------------------------------------===//
-
-#include "llvm/ADT/Hashing.h"
-
-using namespace llvm;
-
-// Provide a definition and static initializer for the fixed seed. This
-// initializer should always be zero to ensure its value can never appear to be
-// non-zero, even during dynamic initialization.
-uint64_t llvm::hashing::detail::fixed_seed_override = 0;
-
-// Implement the function for forced setting of the fixed seed.
-// FIXME: Use atomic operations here so that there is no data race.
-void llvm::set_fixed_execution_hash_seed(uint64_t fixed_value) {
- hashing::detail::fixed_seed_override = fixed_value;
-}
diff --git a/llvm/unittests/ADT/HashingTest.cpp b/llvm/unittests/ADT/HashingTest.cpp
index ab13ee833ce55..c28356e229e66 100644
--- a/llvm/unittests/ADT/HashingTest.cpp
+++ b/llvm/unittests/ADT/HashingTest.cpp
@@ -235,78 +235,6 @@ TEST(HashingTest, HashCombineRangeLengthDiff) {
}
}
-TEST(HashingTest, HashCombineRangeGoldenTest) {
- struct { const char *s; uint64_t hash; } golden_data[] = {
-#if SIZE_MAX == UINT64_MAX || SIZE_MAX == UINT32_MAX
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "abc", 0xe38e60bf19c71a3fULL },
- { "abcde", 0xd24461a66de97f6eULL },
- { "abcdefgh", 0x4ef872ec411dec9dULL },
- { "abcdefghijklm", 0xe8a865539f4eadfeULL },
- { "abcdefghijklmnopqrstu", 0x261cdf85faaf4e79ULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef", 0x43ba70e4198e3b2aULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef"
- "abcdefghijklmnopqrstuvwxyzghijkl"
- "abcdefghijklmnopqrstuvwxyzmnopqr"
- "abcdefghijklmnopqrstuvwxyzstuvwx"
- "abcdefghijklmnopqrstuvwxyzyzabcd", 0xdcd57fb2afdf72beULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "aa", 0xf2b3b69a9736a1ebULL },
- { "aaa", 0xf752eb6f07b1cafeULL },
- { "aaaaa", 0x812bd21e1236954cULL },
- { "aaaaaaaa", 0xff07a2cff08ac587ULL },
- { "aaaaaaaaaaaaa", 0x84ac949d54d704ecULL },
- { "aaaaaaaaaaaaaaaaaaaaa", 0xcb2c8fb6be8f5648ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xcc40ab7f164091b6ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xc58e174c1e78ffe9ULL },
- { "z", 0x1ba160d7e8f8785cULL },
- { "zz", 0x2c5c03172f1285d7ULL },
- { "zzz", 0x9d2c4f4b507a2ac3ULL },
- { "zzzzz", 0x0f03b9031735693aULL },
- { "zzzzzzzz", 0xe674147c8582c08eULL },
- { "zzzzzzzzzzzzz", 0x3162d9fa6938db83ULL },
- { "zzzzzzzzzzzzzzzzzzzzz", 0x37b9a549e013620cULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0x8921470aff885016ULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0xf60fdcd9beb08441ULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "aba", 0x3edb049950884d0aULL },
- { "ababa", 0x8f2de9e73a97714bULL },
- { "abababab", 0xee14a29ddf0ce54cULL },
- { "ababababababa", 0x38b3ddaada2d52b4ULL },
- { "ababababababababababa", 0xd3665364219f2b85ULL },
- { "abababababababababababababababab", 0xa75cd6afbf1bc972ULL },
- { "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab", 0x840192d129f7a22bULL }
-#else
-#error This test only supports 64-bit and 32-bit systems.
-#endif
- };
- for (unsigned i = 0; i < sizeof(golden_data)/sizeof(*golden_data); ++i) {
- StringRef str = golden_data[i].s;
- hash_code hash = hash_combine_range(str.begin(), str.end());
-#if 0 // Enable this to generate paste-able text for the above structure.
- std::string member_str = "\"" + str.str() + "\",";
- fprintf(stderr, " { %-35s 0x%016llxULL },\n",
- member_str.c_str(), static_cast<uint64_t>(hash));
-#endif
- EXPECT_EQ(static_cast<size_t>(golden_data[i].hash),
- static_cast<size_t>(hash));
- }
-}
-
TEST(HashingTest, HashCombineBasicTest) {
// Hashing a sequence of homogenous types matches range hashing.
const int i1 = 42, i2 = 43, i3 = 123, i4 = 999, i5 = 0, i6 = 79;
diff --git a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
index af4a6993cbc79..d6b6b71a1e59b 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
@@ -90,7 +90,6 @@ static_library("Support") {
"FormattedStream.cpp",
"GlobPattern.cpp",
"GraphWriter.cpp",
- "Hashing.cpp",
"HexagonAttributeParser.cpp",
"HexagonAttributes.cpp",
"InitLLVM.cpp",
``````````
</details>
https://github.com/llvm/llvm-project/pull/96282
More information about the llvm-branch-commits
mailing list