[libcxx-commits] [PATCH] D99836: [pstl] Implement OpenMP backend

Mikhail Dvorskiy via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Sep 17 09:14:08 PDT 2021


MikeDvorskiy requested changes to this revision.
MikeDvorskiy added inline comments.
This revision now requires changes to proceed.


================
Comment at: pstl/include/pstl/internal/omp/parallel_for_each.h:28
+    using DifferenceType = typename ::std::iterator_traits<_ForwardIterator>::difference_type;
+    auto __size = ::std::distance(__first, __last);
+
----------------
On this line the iterator is incremented (__last - __first) times in a single thread, because it is a forward iterator.

There is probably an approach that avoids this overhead. At a minimum, I would add a //TODO like "think about how to avoid the overhead".
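One possible shape for such an approach - a hypothetical sketch, not code from the patch - is to advance a single iterator once while recording chunk boundaries, so the whole traversal stays O(n) instead of recomputing each chunk start from __first:

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Hypothetical sketch: compute chunk boundaries in one O(n) pass over a
// forward-iterator range, so no chunk start is recomputed from __first.
template <class _ForwardIterator>
std::vector<_ForwardIterator>
__chunk_bounds(_ForwardIterator __first, _ForwardIterator __last, std::size_t __chunks)
{
    auto __size = static_cast<std::size_t>(std::distance(__first, __last));
    std::vector<_ForwardIterator> __bounds;
    __bounds.push_back(__first);
    for (std::size_t __i = 0; __i < __chunks; ++__i)
    {
        // Distribute the remainder over the first (__size % __chunks) chunks.
        std::size_t __step = __size / __chunks + (__i < __size % __chunks ? 1 : 0);
        std::advance(__first, __step);
        __bounds.push_back(__first);
    }
    return __bounds;
}
```

Each OpenMP task could then iterate from __bounds[i] to __bounds[i + 1], touching every element exactly once overall.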



================
Comment at: pstl/include/pstl/internal/omp/parallel_for_each.h:33
+    {
+        auto __iter = ::std::next(__first, __index);
+        __f(*__iter);
----------------
There is probably overhead here, because each loop iteration advances __first from the beginning of the range. Yes, some of this is done in parallel, but in any case it makes sense to think about whether we can avoid it, if possible. At a minimum, I would add a //TODO like "think about how to avoid the overhead".
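One way this could be avoided - a hypothetical sketch, not the patch's code - is to advance to the chunk start once per task and then increment element by element, instead of calling std::next(__first, __index) on every iteration:

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>

// Hypothetical sketch: apply __f to __n elements starting at offset
// __offset, advancing the iterator incrementally instead of recomputing
// std::next(__first, __index) from the range start for each element.
template <class _ForwardIterator, class _Function>
void
__apply_chunk(_ForwardIterator __first, std::size_t __offset, std::size_t __n, _Function __f)
{
    auto __it = std::next(__first, __offset); // one O(__offset) advance per chunk
    for (std::size_t __i = 0; __i < __n; ++__i, ++__it)
        __f(*__it);                           // O(1) per element afterwards
}
```

Dispatching one such call per chunk (e.g. from an `#pragma omp taskloop` over chunk indices) performs far fewer total increments than one std::next from __first per element.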


================
Comment at: pstl/include/pstl/internal/omp/parallel_invoke.h:39
+    if (omp_in_parallel())
+    {
+        __parallel_invoke_body(std::forward<_F1>(__f1), std::forward<_F2>(__f2));
----------------
Nit: redundant braces.


================
Comment at: pstl/include/pstl/internal/omp/parallel_merge.h:42
+    {
+        __ym = __ys + (__size_y / 2);
+        __xm = ::std::upper_bound(__xs, __xe, *__ym, __comp);
----------------
We have redundant parentheses here, according to C++ operator precedence.


================
Comment at: pstl/include/pstl/internal/omp/parallel_merge.h:47
+    {
+        __xm = __xs + (__size_x / 2);
+        __ym = ::std::lower_bound(__ys, __ye, *__xm, __comp);
----------------
We have redundant parentheses here, according to C++ operator precedence.


================
Comment at: pstl/include/pstl/internal/omp/parallel_merge.h:74
+{
+    std::size_t __size_x = std::distance(__xs, __xe);
+    std::size_t __size_y = std::distance(__ys, __ye);
----------------
The "parallel merge" pattern accepts only random access iterator types, so we can simply use the difference operator here, and everywhere else in the patch.

It probably also makes sense to add an assertion at the beginning of the function that the iterator type is random access.
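Such an assertion could look like the following hypothetical sketch (the trait check is standard; the helper names are illustrative, not from the patch):

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical sketch: detect random access iterators via iterator_traits,
// so the backend can safely use `__xe - __xs` instead of std::distance.
template <class _Iterator>
constexpr bool __is_random_access_v =
    std::is_base_of<std::random_access_iterator_tag,
                    typename std::iterator_traits<_Iterator>::iterator_category>::value;

// Illustrative guard for the merge entry point: fails at compile time if a
// weaker iterator category is passed in.
template <class _RandomAccessIterator>
void
__check_random_access(_RandomAccessIterator, _RandomAccessIterator)
{
    static_assert(__is_random_access_v<_RandomAccessIterator>,
                  "__parallel_merge requires random access iterators");
}
```

With this in place, any accidental instantiation with forward or bidirectional iterators is rejected at compile time rather than silently doing O(n) distance computations.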


================
Comment at: pstl/include/pstl/internal/omp/parallel_merge.h:90
+    if (omp_in_parallel())
+    {
+        __parallel_merge_body(__size_x, __size_y, __xs, __xe, __ys, __ye, __zs, __comp, __leaf_merge);
----------------
Nit: redundant braces (here and in similar places throughout).


================
Comment at: pstl/include/pstl/internal/omp/parallel_reduce.h:28
+{
+    if (__should_run_serial(__first, __last))
+    {
----------------
Regarding the fallback to a serial call (here and throughout the patch): the TBB backend (and the others) have no such logic, except in the sort and merge algorithms. The point is that a user functor/lambda may be expensive enough that parallel execution is worthwhile even for "short" sequences. PSTL leaves it to the user to decide whether to run in parallel or not. Perhaps we will change this approach in the future if users complain, but in any case the behavior should be consistent across all backends.


================
Comment at: pstl/include/pstl/internal/omp/parallel_reduce.h:33
+
+    auto __middle = __first + ((__last - __first) / 2);
+    _Value __v1(__identity), __v2(__identity);
----------------
We have redundant parentheses (here and in similar places throughout), according to C++ operator precedence.


================
Comment at: pstl/include/pstl/internal/omp/parallel_reduce.h:53
+{
+    if (__first == __last)
+        return __identity;
----------------
According to the PSTL design, trivial pre-checks are handled on the "middle" PSTL layer - in algorithm_impl.h, numeric_impl.h, etc.

If that is not the case in some places on the middle layer, the check should be added there, and such trivial pre-checks should be removed from the threading backend level - perhaps replaced with an assert instead.



================
Comment at: pstl/include/pstl/internal/omp/parallel_scan.h:54
+void
+__downsweep(_Index __i, _Index __m, _Index __tilesize, _Tp* __r, _Index __lastsize, _Tp __initial, _Cp __combine,
+            _Sp __scan)
----------------
Regarding the functions __downsweep, __upsweep, and __split:
their implementations could be shared between the TBB and OpenMP backends.
For now I think this is OK; it can be done in a follow-up patch.


================
Comment at: pstl/include/pstl/internal/omp/parallel_stable_sort.h:121
+{
+    if (__xs >= __xe)
+    {
----------------
Please see my comment above about trivial pre-checks.


================
Comment at: pstl/include/pstl/internal/omp/parallel_transform_reduce.h:50
+
+    // Here, we cannot use OpenMP UDR because we must store the init value in
+    // the combiner and it will be used several times. Although there should be
----------------
Actually, we cannot use an OpenMP UDR because it requires an identity value to initialize each accumulator, and the "__init" parameter is the initial value of the reduction, not an identity.
So the comment should be corrected.
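To illustrate the distinction (a hypothetical sketch, not the patch's code): a manual chunked reduction can seed each per-chunk accumulator from the data itself, so no identity is needed and __init enters the result exactly once - which a UDR, whose initializer is replicated into every private copy, cannot express:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sketch: per-chunk accumulators are seeded from the first
// element of the chunk (no identity required); __init is combined in once.
template <class _Iterator, class _Tp, class _Combiner>
_Tp
__chunked_reduce(_Iterator __first, _Iterator __last, _Tp __init, _Combiner __c, std::size_t __chunk)
{
    _Tp __result = __init; // the initial value participates exactly once
    while (__first != __last)
    {
        _Iterator __end = __first;
        std::size_t __n = 0;
        while (__end != __last && __n < __chunk)
        {
            ++__end;
            ++__n;
        }
        _Tp __acc = *__first; // seed from data, not from an identity
        for (_Iterator __it = std::next(__first); __it != __end; ++__it)
            __acc = __c(__acc, *__it);
        __result = __c(__result, __acc);
        __first = __end;
    }
    return __result;
}
```

If __init were instead copied into every per-chunk accumulator (as a UDR initializer would be), it would be counted once per chunk, giving a wrong result whenever __init is not an identity of the combiner.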



================
Comment at: pstl/include/pstl/internal/omp/parallel_transform_reduce.h:91
+
+    if (__first == __last)
+    {
----------------
Please see my comment above about trivial pre-checks.


================
Comment at: pstl/include/pstl/internal/omp/parallel_transform_scan.h:27
+                          _Sp __scan)
+{
+    return __scan(_Index(0), __n, __init);
----------------
I believe a "//TODO" should be added here, noting that a parallel implementation will be added later and that for now we fall back to a serial call.


================
Comment at: pstl/include/pstl/internal/omp/util.h:60
+template <typename _Tp>
+class __buffer
+{
----------------
I believe the __buffer class can be shared between all the host backends.
But I think it is OK for now; it can be done in one of the next patches.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99836/new/

https://reviews.llvm.org/D99836



More information about the libcxx-commits mailing list