[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.

Thu Aug 12 18:49:55 PDT 2021

dexonsmith added inline comments.

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:101-102
+  enum TestEnumeration { TE_One = 1, TE_Two = 2 };
+  volatile int VI = 71;
+  const volatile int CVI = 72;
+  double D = 123.0;
----------------
It's not clear to me why you're including volatile ints here. Is there any reason to expect them to have different behaviour?

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:108-109
+  if (std::is_same<H, llvm::MD5>::value) {
+    static char const *const Hashes[2] = {"43a75756bd3479b839815a6c43a3e3f7",
+                                          "e8e522c9723d8e6a100364c8fc58fd1d"};
+    EXPECT_EQ(Hash, Hashes[E]);
----------------
I'm not such a fan of this reference hash / magic value idea. I'd prefer to check each of these data types separately, and rely on comparing against the result of the underlying Hasher (which we can assume is already well-tested elsewhere).

E.g.:
```
lang=c++
// helper for hashers
template <class H, class T> auto hashRawData(T Data) {
  H Hasher;
  Hasher.update(makeRawDataArrayRef(makeArrayRef(
      reinterpret_cast<uint8_t>(&Data), sizeof(Data)));
  return H.final();
};

// then in the test for raw integer data:

int I = 0x12345678;
EXPECT_EQ(HashBuilder<H>.update(I).final(), hashRawData<H>(I));
```
Could be each in their own test (one per type), or just in separate scopes, don't think it matters much.

Or if you think the magic numbers are better, can you walk me through your logic?

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:81-82
+  using MD5Words = std::pair<uint64_t, uint64_t>;
+  EXPECT_EQ(finalHash(Hasher),
+            MD5Words(0x7FE040B0B5311CE2ULL, 0xBE6766F347F32726ULL));
+}
----------------
dexonsmith wrote:
> It's not ideal to need to spell this out for each `HasherT`, and hardcoding the magic numbers doesn't seem quite right either. I suggest instead:
> - Unify the `HasherT` interfaces as necessary in prep commits to allow using them generically (maybe my other suggestions above are sufficient for that?).
> - Test each of these types in a standalone test in a unified `HashBuilderTest` below, dropping BasicTest/etc.
> - Add a helper function to the test fixture below to avoid boilerplate, something like:
> ```
> lang=c++
> template <class T> void checkHashableData(T Data) {
>   HasherT RawHasher;
>   auto Bytes = makeArrayRef(reinterpret_cast<uint8_t *>(&Data),...);
>   RawHasher.update(Bytes);
>   CHECK_EQ(HashBuilder<HasherT>().update(Data).final(),
>            RawHasher.final());
> }
> ```
I'm still seeing magic numbers, although this comment seems to be in the wrong place now. See further down in ReferenceHashCheck.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106910/new/

https://reviews.llvm.org/D106910