[Openmp-commits] [PATCH] D113824: [OpenMP] Fix custom state machine if have reduction

Joel E. Denny via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Nov 12 21:43:48 PST 2021

jdenny created this revision.
jdenny added reviewers: jdoerfert, JonChesterfield, tianshilei1992, grokos, jhuber6, RaviNarayanaswamy.
jdenny added a project: OpenMP.
Herald added subscribers: ormris, guansong, hiraditya, yaxunl.
jdenny requested review of this revision.
Herald added subscribers: llvm-commits, sstefan1.
Herald added a project: LLVM.

D113602 <https://reviews.llvm.org/D113602> broke the custom state machine when a reduction is present.
For example, see the test case in this patch.  Somehow in that case,
openmp-opts decides to change the return value to undef in
`__kmpc_get_warp_size` (which the custom state machine calls as of
D113602 <https://reviews.llvm.org/D113602>).  Later optimizations then optimize away the custom state
machine code as if all threads are outside the thread block, so the 
target region does not execute.

Other runtime functions do not seem to have this problem, so I looked
for differences.  I found that adding `registerFoldRuntimeCall` and 
`foldKernelFnAttribute` calls for `__kmpc_get_warp_size` to OpenMPOpt
fixed the problem, so that's what this patch does.  I do not yet 
understand much of OpenMPOpt, and I am not confident in this solution,
so please advise.

This patch also adds a `__OMP_RTL_ATTRS` entry for 
`__kmpc_get_warp_size` to OMPKinds.def, which D113602 <https://reviews.llvm.org/D113602> missed.  This
change does not seem to have any impact on the reduction problem.

  rG LLVM Github Monorepo



Index: openmp/libomptarget/test/offloading/bug51781.c
--- openmp/libomptarget/test/offloading/bug51781.c
+++ openmp/libomptarget/test/offloading/bug51781.c
@@ -21,14 +21,29 @@
 // RUN: %fcheck-amdgcn-amd-amdhsa -check-prefix=CUSTOM -input-file=%t.custom
 // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
+// Repeat with reduction clause, which has managed to break the custom state
+// machine in the past.
+// RUN: %libomptarget-compile-generic -O1 -Rpass=openmp-opt -DADD_REDUCTION \
+// RUN:   -mllvm -openmp-opt-disable-spmdization > %t.custom 2>&1
+// RUN: %fcheck-nvptx64-nvidia-cuda -check-prefix=CUSTOM -input-file=%t.custom
+// RUN: %fcheck-amdgcn-amd-amdhsa -check-prefix=CUSTOM -input-file=%t.custom
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic
 // CUSTOM: Rewriting generic-mode kernel with a customized state machine.
+# define REDUCTION(...) reduction(__VA_ARGS__)
+# define REDUCTION(...)
 #include <stdio.h>
 int main() {
   int x = 0, y = 1;
-  #pragma omp target teams num_teams(1) map(tofrom:x, y)
+  #pragma omp target teams num_teams(1) map(tofrom:x, y) REDUCTION(+:x)
-    x = 5;
+    x += 5;
     #pragma omp parallel
     y = 6;
Index: llvm/lib/Transforms/IPO/OpenMPOpt.cpp
--- llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+++ llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -4160,6 +4160,9 @@
     case OMPRTL___kmpc_get_hardware_num_blocks:
       Changed = Changed | foldKernelFnAttribute(A, "omp_target_num_teams");
+    case OMPRTL___kmpc_get_warp_size:
+      Changed = Changed | foldKernelFnAttribute(A, "");
+      break;
       llvm_unreachable("Unhandled OpenMP runtime function!");
@@ -4427,6 +4430,7 @@
+    registerFoldRuntimeCall(OMPRTL___kmpc_get_warp_size);
   // Create CallSite AA for all Getters.
Index: llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
--- llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -207,6 +207,7 @@
 __OMP_RTL(__kmpc_get_hardware_num_blocks, false, Int32, )
 __OMP_RTL(__kmpc_get_hardware_num_threads_in_block, false, Int32, )
+__OMP_RTL(__kmpc_get_warp_size, false, Int32, )
 __OMP_RTL(omp_get_thread_num, false, Int32, )
 __OMP_RTL(omp_get_num_threads, false, Int32, )
@@ -455,8 +456,6 @@
 __OMP_RTL(__kmpc_warp_active_thread_mask, false, Int64,)
 __OMP_RTL(__kmpc_syncwarp, false, Void, Int64)
-__OMP_RTL(__kmpc_get_warp_size, false, Int32, )
 __OMP_RTL(__kmpc_is_generic_main_thread_id, false, Int8, Int32)
 __OMP_RTL(__last, false, Void, )
@@ -629,6 +628,7 @@
 __OMP_RTL_ATTRS(__kmpc_get_hardware_num_blocks, GetterAttrs, AttributeSet(), ParamAttrs())
 __OMP_RTL_ATTRS(__kmpc_get_hardware_num_threads_in_block, GetterAttrs, AttributeSet(), ParamAttrs())
+__OMP_RTL_ATTRS(__kmpc_get_warp_size, GetterAttrs, AttributeSet(), ParamAttrs())
 __OMP_RTL_ATTRS(omp_get_thread_num, GetterAttrs, AttributeSet(), ParamAttrs())
 __OMP_RTL_ATTRS(omp_get_num_threads, GetterAttrs, AttributeSet(), ParamAttrs())

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D113824.387005.patch
Type: text/x-patch
Size: 3454 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20211113/4ba3e7b4/attachment.bin>

More information about the Openmp-commits mailing list