[libcxx-commits] [PATCH] D99849: [pstl] Fix a number of bugs with parallel partial sort

Christopher Nelson via Phabricator via libcxx-commits <libcxx-commits at lists.llvm.org>
Sat Apr 3 16:49:26 PDT 2021


nadiasvertex added inline comments.


================
Comment at: pstl/include/pstl/internal/parallel_backend_omp.h:569
 
-template <typename _RandomAccessIterator, typename _Compare>
-struct _MinKOp
-{
-    _MinKItems<_RandomAccessIterator>& __items;
+template <typename _RandomAccessIterator, typename _Compare> struct _MinKOp {
+    _MinKItems<_RandomAccessIterator> &__items;
----------------
Quuxplusone wrote:
> New lines 577, 570, 573: seems like you should undo these whitespace diffs (manually if necessary).
> 
> I don't suppose it would make sense to use a heap instead of linearly scanning the vector in `__update_largest_item` every time one is inserted? I mean, that's the whole point of non-parallel `partial_sort`... but I guess the idea here is that rearranging the elements once would be less cache-friendly than linearly scanning them multiple times?
Yes, exactly. I didn't run timed tests, but when I look at what managing a heap involves, it's hard to see how it could beat a linear scan when K is small. And if K gets large, one wonders why a partial sort is being done at all. Maintaining a heap means a lot of copying and moving, whereas this approach lets me overwrite existing items in place and benefit from the cache during the search.

Also, during the merge operation used to reduce the chunks I have to update the largest item, so I would be doing a lot of heap maintenance. But if you think it would be better, I can look into doing that instead.


================
Comment at: pstl/include/pstl/internal/parallel_backend_omp.h:600-601
+            [this](const auto &l, const auto &r) { return __comp(*l, *r); });
         __items.__largest_item =
-            std::max_element(std::begin(__items.__smallest_k_items), std::end(__items.__smallest_k_items),
-                             [this](const auto& l, const auto& r) { return __comp(*l, *r); });
+            std::distance(std::begin(__items.__smallest_k_items), pos);
     }
----------------
Quuxplusone wrote:
> if you wanted to keep this simple
In general I thought it better to use std::distance, but if the preference is to use explicit + and - operators, I can do that.


================
Comment at: pstl/include/pstl/internal/parallel_backend_omp.h:859
     {
-        __parallel_stable_sort_body(__xs, __pivot, __comp);
+        __parallel_stable_sort_body(__xs, __part_end, __comp);
     }
----------------
Quuxplusone wrote:
> Technically, all of these ADL calls are unsafe and should be qualified, like `__omp_backend::__parallel_stable_sort_body(...)`. You're changing so much in the PR currently that I feel like you should go ahead and ADL-proof it too; but I wouldn't make ADL the //main// point of the PR. ;)
Actually, this is all "new" code. I have another parent revision where most of this code is introduced. However, the parallel partial sort is buggy, so I am working on fixing it here. I will make the ADL fixes you suggest.

The partial sort now succeeds for most of the collections I try, but it still produces incorrect results in some cases, I suspect because of an asymmetry in the reduction. I thought I had fixed it (which is why I posted this), but further testing proved otherwise.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99849/new/

https://reviews.llvm.org/D99849


