[Openmp-commits] [PATCH] D60918: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Apr 22 13:50:34 PDT 2019


jdoerfert added inline comments.


================
Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:18
                    ? omp_is_initial_device()
                    : 1;
       ParallelLevel1 =
----------------
ABataev wrote:
> jdoerfert wrote:
> > I'm confused by the ternary operator (also the one below).
> > 
> > If the first target thread executes this critical region, `isHost` is `-1` and it will be set to `omp_is_initial_device()`. The second thread comes along and will set it to `omp_is_initial_device` because it was, at any point in time, either `-1` or `omp_is_initial_device`. And so on. So why the ternary operator?
> Because it is a test. If `omp_is_initial_device()` at least once return unexpected value, `isHost` will be set to 1 instead of the expected 0.
I don't think it makes sense to complicate this test case by checking for the behavior of `omp_is_initial_device`.

Also, your method of determining "at least once return unexpected value" for `omp_is_initial_device` is flawed. I sketched a path below but I don't think it is the only way to get the "correct" result even though the `omp_is_initial_device` implementation is, for some reason, broken. Finally, Setting `isHost` to 1 is not distinguishable from running on the initial device. So in a test case for `omp_is_initial_device` we should probably be able to distinguish these cases.


Let's say we are on the device, so `omp_is_initial_device` should be false (=0), and we run with 4 threads. Now, an alternating `omp_is_initial_device` would still result in `isHost = 0`:

| thread # | `isHost` | `omp_is_initial_device()` (see _below_)
| 0        |    -1    |          0
| 1        |     0    |          1
| 2        |     1    |          0
| 3        |     1    |         1/0

_below_: If there are two values the first is the result of the call in the condition, the second in the consequence.


================
Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:27
+        L2 = omp_get_level();
+        Count += omp_get_level(); // (9-5)*10*2 = 80
+      }
----------------
ABataev wrote:
> jdoerfert wrote:
> > I don't get the calculation in the comment. Could you please elaborate?
> > 
> > In general, I'm confused because you rewrite the test again.
> > Was the old test broken? If not, why don't we keep both? If it was, can you explain why?
> > 
> > I have more and more the feeling changes are added and modified faster than anyone actually understands them, leading to "tests" that simply verify the current behavior.
> The first version of the test did not test the required functionality - SPMD mode without runtime. I fixed the test and committed to test the construct that supports SPMD+no runtime. This version of the test goes further and adds the testing of the parallel level with thread divergence.
> The calculations are simplу (see the comment below). We have a loop, that will be executed 4 times (since the outer loop iterates from 0 to 10, scheduling is static, 1 and we have condition `omp_get_thread_num() > 5`. It means that only 4 threads will execute inner `parallel for` construct (10 - number of iterations in outer loop, 6 - the first thread that will execute the loop, need to fix the expression from 9-5 to 10-6 in the comment).
> Each inner loop iterates 10 time and for secnd level of parallelism `omp_get_level()` must return 2. Thus, the expected result is 4*10*2 = 80.
> need to fix the expression from 9-5 to 10-6 in the comment

That helps a lot, thanks.




================
Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:30
 #pragma omp critical
-    ParallelLevel2 = (ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : 1;
+      ParallelLevel2 = (ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : 1;
+    } else {
----------------
ABataev wrote:
> jdoerfert wrote:
> > Why do we need this ternary operator, shouldn't ParallelLevel2 always be set to L2?
> Same, it is test, it checks that `ParallelLevel2` inall required threads set correctly to 2, not 3, 1, 0 or something else.
Same as above, this doesn't test much.


Repository:
  rOMP OpenMP

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60918/new/

https://reviews.llvm.org/D60918





More information about the Openmp-commits mailing list