[PATCH] D106910: [Support]: Introduce the `HashBuilder` interface.
Duncan P. N. Exon Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 29 13:05:18 PDT 2021
dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.
Thanks, this looks like a great start to me. Lots of comments inline (many of them nitpicks).
> Expose a raw pointer update interface for hashes.
> The new method update(const uint8_t *Ptr, size_t Size) is an alternative to
> the existing update(ArrayRef<uint8_t> Data). It will allow the incoming
> HashBuilder interface to not depend on ArrayRef.
//If// we add pointer+size APIs to the hashers, I think that should be in a separate prep patch...
... but IMO, ArrayRef is a safe/clean encapsulation of pointer+size for HashBuilder to use. Can you explain why you want to avoid it?
================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:17
#include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
#include <algorithm>
----------------
I think HashBuilder will probably be needed in fewer places than ArrayRef. It might be better to invert the includes (and move the new functions to HashBuilder.h).
================
Comment at: llvm/include/llvm/ADT/ArrayRef.h:575-602
+ /// Support hashing via `HashBuilder` for hashable data types, e.g. integral,
+ /// enum, or floating-point values.
+ ///
+ /// `Value.size()` is taken into account to ensure cases like
+ /// ```
+ /// builder.update({1});
+ /// builder.update({2, 3});
----------------
I suggest moving these to member functions of `HashBuilder`.
================
Comment at: llvm/include/llvm/ADT/StringRef.h:15
#include "llvm/Support/Compiler.h"
+#include "llvm/Support/HashBuilder.h"
#include <algorithm>
----------------
Even more so than ArrayRef.h, we try really hard to avoid adding includes to StringRef.h since it's so widely included. I suggest inverting the include relationship.
================
Comment at: llvm/include/llvm/ADT/StringRef.h:966
+ template <typename HasherT>
+ void updateHash(HashBuilder<HasherT> &HBuilder, StringRef Value) {
+ HBuilder.update(Value.size());
----------------
(I suggest an overload in `HashBuilder` rather than a free function)
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:21
+
+#include <string>
+#include <utility>
----------------
You should be able to remove this once you're including StringRef.h
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:139
+
+ template <typename T> void update(const std::basic_string<T> &Value) {
+ return updateRange(Value.begin(), Value.end());
----------------
StringRef would be more generally useful; I suggest that instead of std::string. (I don't think we usually bother handling other character types in LLVM... I suggest leaving it out, unless/until you have a specific use case that needs it?)
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:143-146
+ template <typename T1, typename T2> void update(std::pair<T1, T2> Value) {
+ update(Value.first);
+ update(Value.second);
+ }
----------------
Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a pair.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:147-152
+
+ template <typename... Ts>
+ typename std::enable_if<(sizeof...(Ts) > 1)>::type
+ update(std::tuple<Ts...> Arg) {
+ updateTupleHelper(Arg, typename std::index_sequence_for<Ts...>());
+ }
----------------
Should this be by-reference? Might be worth adding a test with a no-move no-copy type in a tuple.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:166
+ typename std::enable_if<(sizeof...(Ts) > 1)>::type update(const Ts &...Args) {
+ std::tuple<const Ts &...> Unused{(update(Args), Args)...};
+ }
----------------
Neat, I hadn't seen this pattern before. Can the variable name be skipped?
```
(void)std::tuple<const Ts &...>{(update(Args), Args)...};
```
If not, please add `(void)Unused;` to suppress diagnostics.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:169-172
+ template <typename InputIteratorT>
+ void updateRange(InputIteratorT First, InputIteratorT Last) {
+ updateRangeImpl(First, Last);
+ }
----------------
I suggest adding a single-element overload that takes a generic range, calling `adl_begin() should also be a single-element overload that calls `llvm::adl_begin()` and `llvm::adl_end()` to extract iterators.
Also, this isn't going to work for any old input iterator. I suggest:
- Name the parameter "ForwardIteratorT"
- Add a test that confirms std::list iterators will work.
- Add an enable_if to get a nice compile error for non-forward iterators.
Another option is to use random-access iterators and leave it for a future patch to handle forward iterators.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:176
+ /// take `Size` into account.
+ void updateBytes(const uint8_t *Ptr, size_t Size) {
+ Hasher.update(Ptr, Size);
----------------
I suggest taking / using an `ArrayRef<uint8_t>` here. It'd be a nice convenience to add an overload for `updateBytes(StringRef)` as well -- something I've wanted a few times when using the Hasher interfaces with `MemoryBuffer::getBuffer()` -- which can just cast over to ArrayRef<uint8_t>`.
================
Comment at: llvm/include/llvm/Support/HashBuilder.h:198
+ void updateRangeImpl(InputIteratorT First, InputIteratorT Last) {
+ update(Last - First);
+ for (auto It = First; It != Last; ++It)
----------------
I think this will have to be a `std::distance` call if you handle arbitrary forward iterators (not sure that's necessary). The template parameters could use an update in either case.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:51
+TEST(HashBuilderTest, BasicTestMD5) {
+ char c = 'c';
+ int i = 1;
----------------
Lots of "invalid case style" warnings coming from clang-tidy in this file. Please follow the style conventions at https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly (e.g., use `C` for this variable instead of `c`), if nothing else to make the patch easier to read.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:75
+
+TEST(HashBuilderTest, BasicTestSHA1) {
+ char c = 'c';
----------------
Rather than duplicating the tests, I suggest using http://google.github.io/googletest/advanced.html#typed-tests to iterate through MD5, SHA1, and SHA256.
================
Comment at: llvm/unittests/Support/HashBuilderTest.cpp:225-228
+TEST(HashBuilderTest, HashStdPairTuple) {
+ EXPECT_EQ(computeMD5Words(std::make_pair(1, "string")),
+ computeMD5Words(std::make_tuple(1, "string")));
+}
----------------
I suggest comparing these both (or at least one of them) against:
```
lang=c++
update(1);
update("string");
```
so that you're confirming the hash is updated correctly, not just the same way as each other (unless you tested that elsewhere and I missed it?)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106910/new/
https://reviews.llvm.org/D106910
More information about the llvm-commits
mailing list