[libcxx-commits] [PATCH] D121074: [libc++] Implement std::boyer_moore{, _horspool}_searcher

Arthur O'Dwyer via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sun Mar 6 11:50:17 PST 2022


Quuxplusone requested changes to this revision.
Quuxplusone added a comment.
This revision now requires changes to proceed.

Thanks for working on this! This will be super useful — not really to working programmers ;) but in terms of checking boxes for C++17 conformance and finally getting to remove the experimental versions!



================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:34-39
+template <class _Key,
+          class _Value,
+          class _Hash,
+          class _BinaryPredicate,
+          bool /*useArray*/>
+class _BMSkipTable;
----------------
Personally I'd prefer to see the first specialization here done as the primary template, and then we just have one specialization for `_BMSkipTable<..., false>`. I see it's less symmetric, but it eliminates a forward declaration (and ~7 LoC).


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:56
+  _LIBCPP_HIDE_FROM_ABI
+  _BMSkipTable(size_t __sz, value_type __default_value, _Hash __hash, _BinaryPredicate __pred)
+      : __default_value_(__default_value),
----------------
Likewise throughout. "All your ctors should be explicit unless they're used for implicit conversions" (or unless the Standard API mandates it :()


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:86
+  _LIBCPP_HIDE_FROM_ABI _BMSkipTable(size_t, value_type __default_value, _Hash, _BinaryPredicate) {
+    std::fill_n(__table_.begin(), __table_.size(), __default_value);
+  }
----------------
`#include <__algorithm/fill_n.h>`
Also, please prefer `__table_.data()` here. `begin+end` and `data+size` always go together as pairs, respectively. (It's also slightly cheaper to compile `data` than `begin` because raw pointers, but that's not the reason it raised the red flag for me.)

Actually, whoa, I thought this was our `<vector>` dependency, but it's not! This specialization should just use a plain old C array `value_type __table_[256];` and cut out all the metaprogramming. This header shouldn't need to depend on `<array>` at all.
You might `static_assert(numeric_limits<value_type>::max() < 256)` just to be on the absolutely safe side.


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:122
+      __pred_(__pred),
+      __pattern_length_(std::distance(__first, __last)),
+      __skip_table_(make_shared<__skip_table_type>(__pattern_length_, -1, __hash, __pred_)),
----------------
Let's use `__last - __first` and save some instantiations. These are random-access iterators.


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:124
+      __skip_table_(make_shared<__skip_table_type>(__pattern_length_, -1, __hash, __pred_)),
+      __suffix_(make_shared<vector<difference_type>>(__pattern_length_ + 1)) {
+    difference_type __i = 0;
----------------
First reaction: Why depend on shared_ptr? Second reaction: Why two dereferences (one for the shared_ptr and another for the vector)? At the very least this should use `make_shared<difference_type[]>(__pattern_length_ + 1)`.

But it would be awesome if we could eliminate the shared_ptr too, somehow. I assume it's in there because std::search wants to copy these things but we don't want the copy to be expensive.

Could we just make std::search //not// copy these things? And then it wouldn't matter if the copy was expensive, and we could just use `std::vector<D>` instead of `std::shared_ptr<D[]>`. (The copy ctor still needs to exist, IIUC, but it doesn't have to be fast, if we can claim it's the user's fault if they ever call it.)


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:186
+  void __compute_bm_prefix(_Iterator __first, _Iterator __last, _BinaryPredicate __pred, _Container& __prefix) {
+    const size_t __count = std::distance(__first, __last);
+
----------------
As above, `__last - __first` — this is instantiated only with `RAI` and `reverse_iterator<RAI>`, both of which are random-access.


================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:192
+    for (size_t __i = 1; __i != __count; ++__i) {
+      while(__k > 0 && !__pred(__first[__k], __first[__i]))
+        __k = __prefix[__k - 1];
----------------



================
Comment at: libcxx/include/__functional/boyer_moore_searcher.h:205-206
+
+    if (__count <= 0)
+      return;
+
----------------
Nit: Should be `if (__first == __last)` or `if (__count == 0)`, I'd think. size_t can't be negative.



================
Comment at: libcxx/include/optional:165-167
+#include <__functional/hash.h>
+#include <__functional/invoke.h>
+#include <__memory/construct_at.h>
----------------
This, and the diff in the <optional> test, looks unrelated to boyer_moore_searcher. Please do this in a separate PR (which I'll rubberstamp if CI is green).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121074/new/

https://reviews.llvm.org/D121074



More information about the libcxx-commits mailing list