[Openmp-commits] [PATCH] D60918: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Apr 22 13:11:21 PDT 2019

ABataev marked 4 inline comments as done.
ABataev added inline comments.

Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:18
                    ? omp_is_initial_device()
                    : 1;
       ParallelLevel1 =
jdoerfert wrote:
> I'm confused by the ternary operator (also the one below).
> If the first target thread executes this critical region, `isHost` is `-1` and it will be set to `omp_is_initial_device()`. The second thread comes along and will set it to `omp_is_initial_device` because it was, at any point in time, either `-1` or `omp_is_initial_device`. And so on. So why the ternary operator?
Because it is a test. If `omp_is_initial_device()` at least once return unexpected value, `isHost` will be set to 1 instead of the expected 0.

Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:27
+        L2 = omp_get_level();
+        Count += omp_get_level(); // (9-5)*10*2 = 80
+      }
jdoerfert wrote:
> I don't get the calculation in the comment. Could you please elaborate?
> In general, I'm confused because you rewrite the test again.
> Was the old test broken? If not, why don't we keep both? If it was, can you explain why?
> I have more and more the feeling changes are added and modified faster than anyone actually understands them, leading to "tests" that simply verify the current behavior.
The first version of the test did not test the required functionality - SPMD mode without runtime. I fixed the test and committed to test the construct that supports SPMD+no runtime. This version of the test goes further and adds the testing of the parallel level with thread divergence.
The calculations are simpl—É (see the comment below). We have a loop, that will be executed 4 times (since the outer loop iterates from 0 to 10, scheduling is static, 1 and we have condition `omp_get_thread_num() > 5`. It means that only 4 threads will execute inner `parallel for` construct (10 - number of iterations in outer loop, 6 - the first thread that will execute the loop, need to fix the expression from 9-5 to 10-6 in the comment).
Each inner loop iterates 10 time and for secnd level of parallelism `omp_get_level()` must return 2. Thus, the expected result is 4*10*2 = 80.

Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:30
 #pragma omp critical
-    ParallelLevel2 = (ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : 1;
+      ParallelLevel2 = (ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : 1;
+    } else {
jdoerfert wrote:
> Why do we need this ternary operator, shouldn't ParallelLevel2 always be set to L2?
Same, it is test, it checks that `ParallelLevel2` inall required threads set correctly to 2, not 3, 1, 0 or something else.

Comment at: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp:38
     printf("Runtime error, isHost=%d\n", isHost);
jdoerfert wrote:
> As far as I can tell, a value of -1 for isHost will not cause the test to fail. `(bool)-1` is `true` and you never verify the "Runtime error" is not printed.
It will, since expected result is `Target region executed on the device`. With `isHost` equal to `-1` the result will be `Target region executed on the host`

  rOMP OpenMP



More information about the Openmp-commits mailing list