[libcxx-commits] [PATCH] D151717: [libc++][PSTL] Add a GCD backend
Louis Dionne via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Jul 5 09:53:50 PDT 2023
ldionne added inline comments.
================
Comment at: libcxx/include/__algorithm/pstl_backends/cpu_backends/libdispatch.h:112-116
+  if (__size1 + __size2 <= __default_chunk_size) {
+    __ranges.emplace_back(
+        std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), std::move(__result));
+    return;
+  }
----------------
We should ensure that this `emplace_back()` never needs to reallocate, otherwise exception safety gets messy (and it's also a bit inefficient). We should add an assertion here: `_LIBCPP_ASSERT_UNCATEGORIZED(__ranges.capacity() >= __ranges.size() + 1);`.
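For illustration, that could look something like this (just a sketch; the exact assertion message is up to you):
```
// Sketch: with a sufficient reserve() done up front, this emplace_back() can never
// reallocate, so it can't throw due to allocation.
_LIBCPP_ASSERT_UNCATEGORIZED(
    __ranges.capacity() >= __ranges.size() + 1, "__ranges was not reserved properly before appending to it");
if (__size1 + __size2 <= __default_chunk_size) {
  __ranges.emplace_back(
      std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), std::move(__result));
  return;
}
```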
================
Comment at: libcxx/include/__algorithm/pstl_backends/cpu_backends/libdispatch.h:145-146
+ _Comp __comp) {
+ pmr::vector<__merge_range<_RandomAccessIterator1, _RandomAccessIterator2, _RandomAccessIteratorOut>> __ranges(
+ __libdispatch::__get_memory_resource());
+
----------------
Here let's `reserve()` in advance. The upper bound on the number of ranges we'll ever allocate is the number of leaves in the call tree of `__calculate_merge_ranges`. In the worst case, each step halves only the biggest range and doesn't touch the other one, so the number of levels in that call tree is bounded by the number of times you can halve each range, summed over both ranges. The number of leaves is then bounded by 2^levels, and since `levels` grows logarithmically this isn't as bad as it seems: it basically ends up being linear. For a single range, that gives us:
```
2^(log_2(size1 / default_chunk_size)) == size1 / default_chunk_size
```
If we sum the bounds for both ranges, we're bounded by:
```
size1 / default_chunk_size + size2 / default_chunk_size == (size1 + size2) / default_chunk_size
```
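Concretely, the reservation would boil down to something like this (a sketch using the names from the snippet above, possibly with a bit of extra slack for rounding):
```
// Reserve the worst-case number of leaves of the splitting call tree up front.
__ranges.reserve((__size1 + __size2) / __default_chunk_size + 1);
```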
But actually, we might be able to ditch this altogether and instead base our chunking on `__partition_chunks` by chunking the largest range (or a combination of both) iteratively. I'm not seeing the solution clearly right now but I think we might be able to come up with something better than what we have.
Potential alternative
----------------------------
We could instead walk the larger range in increments of `chunk_size` and compute the matching location to merge from in the smaller range with `lower_bound` (or `upper_bound`), iteratively. We would pre-reserve `large-range / chunk_size` entries in the vector. That seems simpler, and while there are adversarial inputs for that method, the general case is better behaved. Roughly, see the sketch below.
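Rough sketch of what I have in mind (the `__merge_chunk` helper and all names here are made up for illustration, not taken from the patch; it assumes random-access iterators throughout):
```
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: one independently mergeable chunk.
template <class _Iter1, class _Iter2, class _OutIter>
struct __merge_chunk {
  _Iter1 __first1, __last1;
  _Iter2 __first2, __last2;
  _OutIter __result;
};

// Walk the larger range [__first1, __last1) in steps of __chunk_size and find the
// matching split point in the smaller range with std::lower_bound. The vector is
// reserved with roughly large-range / chunk_size entries up front, so the
// push_back() calls below never reallocate.
template <class _Iter1, class _Iter2, class _OutIter, class _Comp>
std::vector<__merge_chunk<_Iter1, _Iter2, _OutIter>> __chunk_by_larger_range(
    _Iter1 __first1, _Iter1 __last1, _Iter2 __first2, _Iter2 __last2, _OutIter __result, _Comp __comp,
    std::ptrdiff_t __chunk_size) {
  std::vector<__merge_chunk<_Iter1, _Iter2, _OutIter>> __chunks;
  __chunks.reserve((__last1 - __first1) / __chunk_size + 2);
  while (__first1 != __last1) {
    _Iter1 __mid1 = (__last1 - __first1 > __chunk_size) ? __first1 + __chunk_size : __last1;
    // On the last chunk, take whatever is left of the smaller range as well.
    _Iter2 __mid2 = (__mid1 == __last1) ? __last2 : std::lower_bound(__first2, __last2, *__mid1, __comp);
    __chunks.push_back({__first1, __mid1, __first2, __mid2, __result});
    __result += (__mid1 - __first1) + (__mid2 - __first2);
    __first1 = __mid1;
    __first2 = __mid2;
  }
  if (__first2 != __last2) // the larger range was empty, so merging is just copying range 2
    __chunks.push_back({__first1, __last1, __first2, __last2, __result});
  return __chunks;
}
```
Note that with many equal elements most of the smaller range can end up in a single chunk, which is the kind of adversarial input mentioned above; the general case splits much more evenly.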
================
Comment at: libcxx/test/libcxx/algorithms/pstl.libdispatch.chunk_partitions.pass.cpp:16
+#include <__algorithm/pstl_backends/cpu_backends/libdispatch.h>
+#include <cassert>
+
----------------
================
Comment at: libcxx/test/libcxx/algorithms/pstl.libdispatch.chunk_partitions.pass.cpp:19
+int main(int, char**) {
+  for (ptrdiff_t i = 0; i != 2ll << 20; ++i) {
+    auto chunks = std::__par_backend::__libdispatch::__partition_chunks(i);
----------------
================
Comment at: libcxx/test/std/algorithms/alg.modifying.operations/alg.copy/fuzz.pstl.copy.sh.cpp:42-46
+extern "C" int LLVMFuzzerTestOneInput(const NonTrivial* data, std::size_t size) {
+  std::vector<NonTrivial> vec(size);
+  std::copy(std::execution::par, data, data + size, vec.begin());
+  return !std::equal(data, data + size, vec.data());
+}
----------------
================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/fuzz.pstl.merge.sh.cpp:15
+// RUN: %{build} -fsanitize=fuzzer -O3
+// RUN: %{run} -max_total_time=10 -max_len=1073741824
+
----------------
What is that number? Can you document it with a comment and the rationale for using it? (e.g. "we want this to contain at least N + M elements so we can merge large-ish ranges")
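For instance, something along these lines (the actual rationale needs to come from you; the only thing I'm sure of is that 1073741824 == 2^30):
```
// 1073741824 == 2^30 bytes: use a large max_len so the fuzzer can produce inputs
// with enough elements to exercise the parallel code paths, not just tiny inputs.
// RUN: %{run} -max_total_time=10 -max_len=1073741824
```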
================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/pstl.merge.pass.cpp:99
  std::vector<int> out(std::size(a) + std::size(b));
-  std::merge(
-      Iter1(a.data()), Iter1(a.data() + a.size()), Iter2(b.data()), Iter2(b.data() + b.size()), std::begin(out));
+  std::merge(policy,
+             Iter1(a.data()),
----------------
Let's do this as a separate patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D151717/new/
https://reviews.llvm.org/D151717