[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.
Duncan P. N. Exon Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 3 15:08:19 PDT 2021
dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.
Updates look great; a few more comments inline.
Since the ultimate goal is to make it easy for higher-level logic to use a stable hash across different architectures, I realize that endianness could be important too for the hashable-data overloads. Have you considered what to do there?
One idea: add an endianness template parameter to HashBuilder:
template <class HasherT, support::endianness Endianness = support::native>
class HashBuilder;
then change the `update` overloads for hashable data to hash the result of `support::endian::byte_swap(V, Endianness)`. In the common case (`native`) the copy/etc should get optimized out. WDYT?
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:86-94
+ /// Implement hashing for hashable data types, e.g. integral, enum, or
+ /// floating-point values.
+ ///
+ /// The value is simply treated like a bag of bytes.
+ template <class T>
+ std::enable_if_t<hash_builder::detail::IsHashableData<T>::value>
+ update(T Value) {
----------------
Please use `makeArrayRef()` to avoid splitting the pointer from the size.
Also, I wonder if this should take into account endianness.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:195
+ typename std::enable_if<(sizeof...(Ts) > 1)>::type update(const Ts &...Args) {
+ std::tuple<const Ts &...> Unused{(update(Args), Args)...};
+ }
----------------
I think you can drop the name here too.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:205-207
+ template <typename RangeT> void updateRange(const RangeT &Range) {
+ updateRange(adl_begin(Range), adl_end(Range));
+ }
----------------
I think it's important to have `updateRange` overloads for `ArrayRef` (at least for `uint8_t`) and `StringRef` that directly call `updateBytes` rather than iterating.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:220-224
+ /// Update the hash with a "bag of bytes". This does not rely on `HasherT` to
+ /// take `Size` into account.
+ void updateBytes(const uint8_t *Ptr, size_t Size) {
+ updateBytes(ArrayRef<uint8_t>(Ptr, Size));
+ }
----------------
I don't think we should add this overload. The rare call site can call `makeArrayRef` itself, encouraging it to push ArrayRef further up the stack.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:26-36
+template <> auto finalHash<MD5>(MD5 &Hasher) {
+ MD5::MD5Result MD5Res;
+ Hasher.final(MD5Res);
+ return MD5Res.words();
+}
+
+template <> auto finalHash<SHA1>(SHA1 &Hasher) { return Hasher.final().str(); }
----------------
Can MD5 be updated in a prep patch to have an overload of `final` like this so this bridge code isn't needed?
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:38-44
+template <typename HasherT, typename... Ts>
+static auto computeHash(const Ts &...Args) {
+ HasherT Hasher;
+ HashBuilder<HasherT> HBuilder(Hasher);
+ HBuilder.update(Args...);
+ return finalHash(HBuilder.Hasher);
+}
----------------
Seeing this boilerplate makes me wonder if the builder API could be improved with something like:
- Add an `Optional<HasherT>` data member and a default constructor for `HashBuilder` that constructs it.
- Make `update*` return `HashBuilder&`.
- Add a `HashBuilder::final()` that returns `HasherT::final()` (after adding it to MD5 as above).
Then the above logic becomes a one-liner:
```
lang=c++
HashBuilder<HasherT>().update(Args...).final()
```
(maybe still worth encapsulating in a function here for testing, but maybe not; not sure)
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:81-82
+ using MD5Words = std::pair<uint64_t, uint64_t>;
+ EXPECT_EQ(finalHash(Hasher),
+ MD5Words(0x7FE040B0B5311CE2ULL, 0xBE6766F347F32726ULL));
+}
----------------
It's not ideal to need to spell this out for each `HasherT`, and hardcoding the magic numbers doesn't seem quite right either. I suggest instead:
- Unify the `HasherT` interfaces as necessary in prep commits to allow using them generically (maybe my other suggestions above are sufficient for that?).
- Test each of these types in a standalone test in a unified `HashBuilderTest` below, dropping BasicTest/etc.
- Add a helper function to the test fixture below to avoid boilerplate, something like:
```
lang=c++
template <class T> void checkHashableData(T Data) {
HasherT RawHasher;
auto Bytes = makeArrayRef(reinterpret_cast<uint8_t *>(&Data),...);
RawHasher.update(Bytes);
CHECK_EQ(HashBuilder<HasherT>().update(Data).final(),
RawHasher.final());
}
```
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:102
+template <typename HasherT>
+class TypedHashBuilderTest : public testing::Test {};
+using HasherTypes = ::testing::Types<MD5, SHA1, SHA256>;
----------------
I suggest dropping the `Typed` and calling this simply `HashBuilderTest`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106910/new/
https://reviews.llvm.org/D106910
More information about the llvm-commits
mailing list