[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.

Alexandre Rames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 3 14:00:38 PDT 2021


arames added a comment.

In D106910#2914469 <https://reviews.llvm.org/D106910#2914469>, @dexonsmith wrote:

> Thanks, this looks like a great start to me. Lots of comments inline (many of them nitpicks).
>
>> Expose a raw pointer update interface for hashes.
>> The new method update(const uint8_t *Ptr, size_t Size) is an alternative to
>> the existing update(ArrayRef<uint8_t> Data). It will allow the incoming
>> HashBuilder interface to not depend on ArrayRef.
>
> //If// we add pointer+size APIs to the hashers, I think that should be in a separate prep patch...

Sure. That was already separated in a different commit.

> ... but IMO, ArrayRef is a safe/clean encapsulation of pointer+size for HashBuilder to use. Can you explain why you want to avoid it?

I agree `ArrayRef` is a safe and clean encapsulation. It seemed to me it was cleaner to not have the `HashBuilder` interface depend on any particular type, in a similar way to `hash_code`.
However your arguments about `ArrayRef` and `StringRef` being much more commonly used are on point, and si agree switching the dependency order and having `ArrayRef` and `StringRef` specialized in `HashBuilder.h` will be better.



================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:17
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
 #include <algorithm>
----------------
dexonsmith wrote:
> I think HashBuilder will probably be needed in fewer places than ArrayRef. It might be better to invert the includes (and move the new functions to HashBuilder.h).
Done. (See also global reply.)


================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:575-602
+  /// Support hashing via `HashBuilder` for hashable data types, e.g. integral,
+  /// enum, or floating-point values.
+  ///
+  /// `Value.size()` is taken into account to ensure cases like
+  /// ```
+  /// builder.update({1});
+  /// builder.update({2, 3});
----------------
dexonsmith wrote:
> I suggest moving these to member functions of `HashBuilder`.
Done. (See also global reply.)


================
Comment at: llvm/include/llvm/ADT/StringRef.h:15
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
 #include <algorithm>
----------------
dexonsmith wrote:
> Even more so than ArrayRef.h, we try really hard to avoid adding includes to StringRef.h since it's so widely included. I suggest inverting the include relationship.
Done. (See also global reply.)


================
Comment at: llvm/include/llvm/ADT/StringRef.h:966
+  template <typename HasherT>
+  void updateHash(HashBuilder<HasherT> &HBuilder, StringRef Value) {
+    HBuilder.update(Value.size());
----------------
dexonsmith wrote:
> (I suggest an overload in `HashBuilder` rather than a free function)
Done for `StringRef` and `ArrayRef`, since we now depend on both.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:139
+
+  template <typename T> void update(const std::basic_string<T> &Value) {
+    return updateRange(Value.begin(), Value.end());
----------------
dexonsmith wrote:
> StringRef would be more generally useful; I suggest that instead of std::string. (I don't think we usually bother handling other character types in LLVM... I suggest leaving it out, unless/until you have a specific use case that needs it?)
Nah; this was to have the generic form. Leaving it out.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:143-146
+  template <typename T1, typename T2> void update(std::pair<T1, T2> Value) {
+    update(Value.first);
+    update(Value.second);
+  }
----------------
dexonsmith wrote:
> Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a pair.
Absolutely. Fixed, with a test.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:147-152
+
+  template <typename... Ts>
+  typename std::enable_if<(sizeof...(Ts) > 1)>::type
+  update(std::tuple<Ts...> Arg) {
+    updateTupleHelper(Arg, typename std::index_sequence_for<Ts...>());
+  }
----------------
dexonsmith wrote:
> Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a tuple.
Absolutely. Fixed with a test.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:166
+  typename std::enable_if<(sizeof...(Ts) > 1)>::type update(const Ts &...Args) {
+    std::tuple<const Ts &...> Unused{(update(Args), Args)...};
+  }
----------------
dexonsmith wrote:
> Neat, I hadn't seen this pattern before. Can the variable name be skipped?
> ```
> (void)std::tuple<const Ts &...>{(update(Args), Args)...};
> ```
> If not, please add `(void)Unused;` to suppress diagnostics.
It can be skipped indeed.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:169-172
+  template <typename InputIteratorT>
+  void updateRange(InputIteratorT First, InputIteratorT Last) {
+    updateRangeImpl(First, Last);
+  }
----------------
dexonsmith wrote:
> I suggest adding a single-element overload that takes a generic range, calling `adl_begin() should also be a single-element overload that calls `llvm::adl_begin()` and `llvm::adl_end()` to extract iterators.
> 
> Also, this isn't going to work for any old input iterator. I suggest:
> - Name the parameter "ForwardIteratorT"
> - Add a test that confirms std::list iterators will work.
> - Add an enable_if to get a nice compile error for non-forward iterators.
> 
> Another option is to use random-access iterators and leave it for a future patch to handle forward iterators.
The update should fit what you want, but please take a second look.

Added an overload for a generic range using `adl_begin()` and `adl_end()`.
Cleaned-up `updateRange` and `updateRangeImpl`, with a test for `std::list`. The check for forward iterator does not use `enable_if`, but should be good enough.

Interestingly, with all this, implementations of `update` for `ArrayRef` and `StringRef` can simply forward to `updateRange`.


================
Comment at: llvm/include/llvm/Support/HashBuilder.h:176
+  /// take `Size` into account.
+  void updateBytes(const uint8_t *Ptr, size_t Size) {
+    Hasher.update(Ptr, Size);
----------------
dexonsmith wrote:
> I suggest taking / using an `ArrayRef<uint8_t>` here. It'd be a nice convenience to add an overload for `updateBytes(StringRef)` as well -- something I've wanted a few times when using the Hasher interfaces with `MemoryBuffer::getBuffer()` -- which can just cast over to ArrayRef<uint8_t>`.
Sure thing. I can imagine the three overloads to be helpful in different contexts.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106910/new/

https://reviews.llvm.org/D106910



More information about the llvm-commits mailing list