[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.
Alexandre Rames via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 16 11:30:27 PDT 2021
arames marked 2 inline comments as done.
arames added inline comments.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:86-94
+ /// Implement hashing for hashable data types, e.g. integral, enum, or
+ /// floating-point values.
+ ///
+ /// The value is simply treated like a bag of bytes.
+ template <class T>
+ std::enable_if_t<hash_builder::detail::IsHashableData<T>::value>
+ update(T Value) {
----------------
dexonsmith wrote:
> Please use `makeArrayRef()` to avoid splitting the pointer from the size.
>
> Also, I wonder if this should take into account endianness.
Updated to use `makeArrayRef()`.
Great point. I didn't pay attention to it, but taking it into account will significantly improve the value of this class.
Supporting endianness looks pretty straight forward thanks to `Support/Endian.h`. Please take a look.
A couple questions:
1. I kept the default as `native` instead of `little` to avoid penalizing big-endian platforms. On the other hand defaulting to a stable hash may be safer. What do you think ?
2. To add support for user types, we end up with
```
template <typename HasherT, support::endianness Endianness>
void updateHash(HashBuilder<HasherT, Endianness> &HBuilder, const UserType &Value)
```
I think that is ok. Do you prefer "reserving" the `updateHash` name and going with
```
template <typename HashBuilderT>
void updateHash(HashBuilderT &HBuilder, const UserType &Value)
```
?
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:205-207
+ template <typename RangeT> void updateRange(const RangeT &Range) {
+ updateRange(adl_begin(Range), adl_end(Range));
+ }
----------------
dexonsmith wrote:
> I think it's important to have `updateRange` overloads for `ArrayRef` (at least for `uint8_t`) and `StringRef` that directly call `updateBytes` rather than iterating.
Both `ArrayRef` and `StringRef` do end up calling the version of `updateRangeImpl` with `updateBytes`, due to how their `begin` and `end` are declared.
Do you mean to do it explicitly, in case the `ArrayRef` implementation changes ?
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:101-102
+ enum TestEnumeration { TE_One = 1, TE_Two = 2 };
+ volatile int VI = 71;
+ const volatile int CVI = 72;
+ double D = 123.0;
----------------
dexonsmith wrote:
> It's not clear to me why you're including volatile ints here. Is there any reason to expect them to have different behaviour?
I had borrowed the basic tests for `hash_code`, which did include it. I agree they bring no value here. Removed.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:26-36
+template <> auto finalHash<MD5>(MD5 &Hasher) {
+ MD5::MD5Result MD5Res;
+ Hasher.final(MD5Res);
+ return MD5Res.words();
+}
+
+template <> auto finalHash<SHA1>(SHA1 &Hasher) { return Hasher.final().str(); }
----------------
dexonsmith wrote:
> Can MD5 be updated in a prep patch to have an overload of `final` like this so this bridge code isn't needed?
https://reviews.llvm.org/D107781
This PR will have to be rebased once the prep is reviewed and lands.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:38-44
+template <typename HasherT, typename... Ts>
+static auto computeHash(const Ts &...Args) {
+ HasherT Hasher;
+ HashBuilder<HasherT> HBuilder(Hasher);
+ HBuilder.update(Args...);
+ return finalHash(HBuilder.Hasher);
+}
----------------
dexonsmith wrote:
> Seeing this boilerplate makes me wonder if the builder API could be improved with something like:
> - Add an `Optional<HasherT>` data member and a default constructor for `HashBuilder` that constructs it.
> - Make `update*` return `HashBuilder&`.
> - Add a `HashBuilder::final()` that returns `HasherT::final()` (after adding it to MD5 as above).
>
> Then the above logic becomes a one-liner:
> ```
> lang=c++
> HashBuilder<HasherT>().update(Args...).final()
> ```
> (maybe still worth encapsulating in a function here for testing, but maybe not; not sure)
Done. That makes it more usable indeed.
I kept the helper to keep the tests concise.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:81-82
+ using MD5Words = std::pair<uint64_t, uint64_t>;
+ EXPECT_EQ(finalHash(Hasher),
+ MD5Words(0x7FE040B0B5311CE2ULL, 0xBE6766F347F32726ULL));
+}
----------------
dexonsmith wrote:
> It's not ideal to need to spell this out for each `HasherT`, and hardcoding the magic numbers doesn't seem quite right either. I suggest instead:
> - Unify the `HasherT` interfaces as necessary in prep commits to allow using them generically (maybe my other suggestions above are sufficient for that?).
> - Test each of these types in a standalone test in a unified `HashBuilderTest` below, dropping BasicTest/etc.
> - Add a helper function to the test fixture below to avoid boilerplate, something like:
> ```
> lang=c++
> template <class T> void checkHashableData(T Data) {
> HasherT RawHasher;
> auto Bytes = makeArrayRef(reinterpret_cast<uint8_t *>(&Data),...);
> RawHasher.update(Bytes);
> CHECK_EQ(HashBuilder<HasherT>().update(Data).final(),
> RawHasher.final());
> }
> ```
I have cleaned this up significantly as part of the endianness support. It does not look like what you suggest, but I think it is much better than before.
There is now a single `ReferenceHashCheck` handling all hasher types, with the three big, little, and native endiannesses.
This is the only test that verifies against a hard-coded reference hash. In my opinion it is valuable to ensure the stability of hashes across platforms.
Other tests rely on equality or difference of different hashes.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106910/new/
https://reviews.llvm.org/D106910
More information about the llvm-commits
mailing list