[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.

Thu Jul 29 13:05:18 PDT 2021

dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.

Thanks, this looks like a great start to me. Lots of comments inline (many of them nitpicks).

> Expose a raw pointer update interface for hashes.
> The new method update(const uint8_t *Ptr, size_t Size) is an alternative to
> the existing update(ArrayRef<uint8_t> Data). It will allow the incoming
> HashBuilder interface to not depend on ArrayRef.

//If// we add pointer+size APIs to the hashers, I think that should be in a separate prep patch...

... but IMO, ArrayRef is a safe/clean encapsulation of pointer+size for HashBuilder to use. Can you explain why you want to avoid it?

================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:17
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
 #include <algorithm>
----------------
I think HashBuilder will probably be needed in fewer places than ArrayRef. It might be better to invert the includes (and move the new functions to HashBuilder.h).

================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:575-602
+  /// Support hashing via `HashBuilder` for hashable data types, e.g. integral,
+  /// enum, or floating-point values.
+  ///
+  /// `Value.size()` is taken into account to ensure cases like
+  /// ```
+  /// builder.update({1});
+  /// builder.update({2, 3});
----------------
I suggest moving these to member functions of `HashBuilder`.

================
Comment at: llvm/include/llvm/ADT/StringRef.h:15
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
 #include <algorithm>
----------------
Even more so than ArrayRef.h, we try really hard to avoid adding includes to StringRef.h since it's so widely included. I suggest inverting the include relationship.

================
Comment at: llvm/include/llvm/ADT/StringRef.h:966
+  template <typename HasherT>
+  void updateHash(HashBuilder<HasherT> &HBuilder, StringRef Value) {
+    HBuilder.update(Value.size());
----------------
(I suggest an overload in `HashBuilder` rather than a free function)

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:21
+
+#include <string>
+#include <utility>
----------------
You should be able to remove this once you're including StringRef.h

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:139
+
+  template <typename T> void update(const std::basic_string<T> &Value) {
+    return updateRange(Value.begin(), Value.end());
----------------
StringRef would be more generally useful; I suggest that instead of std::string. (I don't think we usually bother handling other character types in LLVM... I suggest leaving it out, unless/until you have a specific use case that needs it?)

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:143-146
+  template <typename T1, typename T2> void update(std::pair<T1, T2> Value) {
+    update(Value.first);
+    update(Value.second);
+  }
----------------
Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a pair.

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:147-152
+
+  template <typename... Ts>
+  typename std::enable_if<(sizeof...(Ts) > 1)>::type
+  update(std::tuple<Ts...> Arg) {
+    updateTupleHelper(Arg, typename std::index_sequence_for<Ts...>());
+  }
----------------
Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a tuple.

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:166
+  typename std::enable_if<(sizeof...(Ts) > 1)>::type update(const Ts &...Args) {
+    std::tuple<const Ts &...> Unused{(update(Args), Args)...};
+  }
----------------
Neat, I hadn't seen this pattern before. Can the variable name be skipped?
```
(void)std::tuple<const Ts &...>{(update(Args), Args)...};
```
If not, please add `(void)Unused;` to suppress diagnostics.

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:169-172
+  template <typename InputIteratorT>
+  void updateRange(InputIteratorT First, InputIteratorT Last) {
+    updateRangeImpl(First, Last);
+  }
----------------
I suggest adding a single-element overload that takes a generic range, calling `adl_begin() should also be a single-element overload that calls `llvm::adl_begin()` and `llvm::adl_end()` to extract iterators.

Also, this isn't going to work for any old input iterator. I suggest:
- Name the parameter "ForwardIteratorT"
- Add a test that confirms std::list iterators will work.
- Add an enable_if to get a nice compile error for non-forward iterators.

Another option is to use random-access iterators and leave it for a future patch to handle forward iterators.

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:176
+  /// take `Size` into account.
+  void updateBytes(const uint8_t *Ptr, size_t Size) {
+    Hasher.update(Ptr, Size);
----------------
I suggest taking / using an `ArrayRef<uint8_t>` here. It'd be a nice convenience to add an overload for `updateBytes(StringRef)` as well -- something I've wanted a few times when using the Hasher interfaces with `MemoryBuffer::getBuffer()` -- which can just cast over to ArrayRef<uint8_t>`.

================
Comment at: llvm/include/llvm/Support/HashBuilder.h:198
+  void updateRangeImpl(InputIteratorT First, InputIteratorT Last) {
+    update(Last - First);
+    for (auto It = First; It != Last; ++It)
----------------
I think this will have to be a `std::distance` call if you handle arbitrary forward iterators (not sure that's necessary). The template parameters could use an update in either case.

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:51
+TEST(HashBuilderTest, BasicTestMD5) {
+  char c = 'c';
+  int i = 1;
----------------
Lots of "invalid case style" warnings coming from clang-tidy in this file. Please follow the style conventions at https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly (e.g., use `C` for this variable instead of `c`), if nothing else to make the patch easier to read.

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:75
+
+TEST(HashBuilderTest, BasicTestSHA1) {
+  char c = 'c';
----------------
Rather than duplicating the tests, I suggest using http://google.github.io/googletest/advanced.html#typed-tests to iterate through MD5, SHA1, and SHA256.

================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:225-228
+TEST(HashBuilderTest, HashStdPairTuple) {
+  EXPECT_EQ(computeMD5Words(std::make_pair(1, "string")),
+            computeMD5Words(std::make_tuple(1, "string")));
+}
----------------
I suggest comparing these both (or at least one of them) against:
```
lang=c++
update(1);
update("string");
```
so that you're confirming the hash is updated correctly, not just the same way as each other (unless you tested that elsewhere and I missed it?)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106910/new/

https://reviews.llvm.org/D106910