[libcxx-commits] [PATCH] D59791: [pstl] Add a serial backend for the PSTL

Wed Apr 10 13:57:41 PDT 2019

ldionne added a subscriber: wash.
ldionne marked 3 inline comments as done.
ldionne added a comment.

I'd love to get concrete feedback on whether the implementation of the algorithms is wrong, and if so, why. This would be really helpful in understanding the backend API better and would allow me to make progress on implementing this backend, which is a pre-requisite for many other things.

Also note that I don't expect this backend implementation to be final after this commit. I just want us to get something "correct" so as to nail down the semantics of the backend functions and make progress on other tasks. I strongly suspect that we'll have to make changes to the backend API to get a more straightforward serial implementation, but now's not the time to tackle this.

================
Comment at: pstl/include/pstl/internal/parallel_backend_serial.h:86
+{
+    return __scan(_Index(0), __n, __init);
+}
----------------
ldionne wrote:
> So, I strongly suspect this implementation is incorrect. Can you draft what a correct serial implementation would be?
This is the only thing I strongly suspect is incorrectly implemented. Can you please check this?

================
Comment at: pstl/include/pstl/internal/parallel_backend_serial.h:94
+    // TODO: What to do with __nsort?
+    __leaf_sort(__first, __last, __comp);
+}
----------------
MikeDvorskiy wrote:
> ldionne wrote:
> > Is `__nsort` the number of elements that should be sorted in the resulting range? IOW, this is a partial sort where you want to get the `n` smallest elements of the whole range in sorted order.
> > 
> > It's not clear to me how to achieve this without a `__leaf_sort` that itself accepts a `n` to only partially sort the first `n` elements. I mean, I could probably find a way to do it iteratively by sorting a bit more every time, but what I really want is to just call `std::partial_sort`. And actually, while we're at it, calling `std::partial_sort` isn't enough, since this sort needs to be stable. But we don't have a `std::stable_partial_sort`.
> Actually, __leaf_sort is not only std::sort or std::partial_sort.
> __leaf_sort (as lambda) has already captured information about "sort" or "partial sort" - "__n" variable.
> See, the code of the passed lambda into parallel_stable_sort:
> 
> 
> ```
> [__n](_RandomAccessIterator __begin, _RandomAccessIterator __end, _Compare __comp) {
>                 if (__n < __end - __begin)
>                     std::partial_sort(__begin, __begin + __n, __end, __comp);
>                 else
>                     std::sort(__begin, __end, __comp);
>             },
> ```
> 
> So, in case of serial case here you don't use __nsort parameter, due to you the special lambda which takes into account "sort" or "partial sort" logic.
> 
> In case of parallel implementation __n parameter may be useful - for  merging of  two sorted sub-ranges , just first __n elements....
Ah, I missed the bit in the lambda. So this implementation is correct, then. Thanks!

================
Comment at: pstl/include/pstl/internal/parallel_backend_serial.h:106
+{
+    __leaf_merge(__first1, __last1, __first2, __last2, __out, __comp);
+}
----------------
MikeDvorskiy wrote:
> ldionne wrote:
> > Note that the obvious implementation doesn't work:
> > 
> > ```
> > std::merge(__first1, __last1, __first2, __last2, __out, __comp);
> > ```
> > 
> > because it requires the elements to be copyable, but pstl apparently expects the merge to only move elements around without copying them.
> Why "doesnt work"?
> 
> In special case
> ...
>     if (__n <= __merge_cut_off)
>     {
>         // Fall back on serial merge
>         __leaf_merge(__xs, __xe, __ys, __ye, __zs, __comp);
>     }
> ...
> where __leaf_merge is std::merge
Like I said, `std::merge` requires elements to be copyable, but the PSTL tests call `parallel_merge` with elements that are move-only. It also looks like you guys went through some hoops to make this work, see `__serial_move_merge` in `parallel_backend_utils.h`.

However, since it looks like calling `__leaf_merge` is a valid implementation, let's revisit this copyability issue at another point. I want to land this patch ASAP because other stuff depends on it.

In case you're curious, the error message for copyability looks like this:
```
<toolchain>/usr/include/c++/v1/algorithm:4392:23: error: object of type 'LocalWrapper<float>' cannot be assigned because its copy assignment operator is implicitly deleted
            *__result = *__first2;
                      ^
<toolchain>/usr/include/c++/v1/algorithm:4416:19: note: in instantiation of function template specialization 'std::__1::__merge<std::__1::less<LocalWrapper<float> > &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *>' requested here
    return _VSTD::__merge<_Comp_ref>(__first1, __last1, __first2, __last2, __result, __comp);
                  ^
<pstl-root>/include/pstl/internal/parallel_backend_serial.h:117:10: note: in instantiation of function template specialization 'std::__1::merge<std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> > >' requested here
    std::merge(__first1, __last1, __first2, __last2, __out, __comp);
         ^
<pstl-root>/include/pstl/internal/algorithm_impl.h:2667:24: note: in instantiation of function template specialization '__pstl::__serial::__parallel_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> >, (lambda at <pstl-root>/include/pstl/internal/algorithm_impl.h:2669:13)>' requested here
        __par_backend::__parallel_merge(
                       ^
<pstl-root>/include/pstl/internal/glue_algorithm_impl.h:899:17: note: in instantiation of function template specialization '__pstl::__internal::__pattern_inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> >, std::__1::integral_constant<bool, false> >' requested here
    __internal::__pattern_inplace_merge(
                ^
<pstl-root>/test/std/algorithms/alg.merge/inplace_merge.pass.cpp:63:14: note: in instantiation of function template specialization 'std::inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> > >' requested here
        std::inplace_merge(exec, first2, mid2, last2, comp);
             ^
<pstl-root>/test/support/utils.h:757:9: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        op(std::forward<Rest>(rest)...);
        ^
...
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59791/new/

https://reviews.llvm.org/D59791