[Openmp-commits] [PATCH] D109164: [OpenMP] Manually unroll the argument copy loop

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Sep 2 09:16:29 PDT 2021


jdoerfert created this revision.
jdoerfert added reviewers: jhuber6, ggeorgakoudis.
Herald added subscribers: guansong, bollu, yaxunl.
jdoerfert requested review of this revision.
Herald added a subscriber: sstefan1.
Herald added a project: OpenMP.

The `#pragma unroll` did not work properly: the loop bound was not known
when we optimized the runtime, and we then attached "unroll disable"
metadata, which prevented unrolling later once the bound was known.
For now we unroll manually so that up to 16 arguments are copied with
straight-line code. This helps optimizations look through the argument
passing.
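As a rough illustration of the technique, here is a host-side C++17 sketch. It is not the runtime code, and `copyArgs` and `checkCopyArgs` are hypothetical names. The switch dispatches once on the argument count. Each case copies one slot with a constant index and then falls into the next case, so counts up to 16 become straight-line stores, and larger counts take the generic loop. It uses `[[fallthrough]]` where the patch uses `// FALLTHROUGH` comments, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for the device code in the patch: copy
// NArgs pointers from Args into GlobalArgs. For 1..16 arguments the switch
// jumps to the matching case and falls through the lower ones, so every copy
// uses a constant index; counts above 16 hit `default` and use a loop.
void copyArgs(void **GlobalArgs, void **Args, std::size_t NArgs) {
  switch (NArgs) {
  default:
    for (std::size_t I = 0; I < NArgs; ++I)
      GlobalArgs[I] = Args[I];
    break;
  case 16: GlobalArgs[15] = Args[15]; [[fallthrough]];
  case 15: GlobalArgs[14] = Args[14]; [[fallthrough]];
  case 14: GlobalArgs[13] = Args[13]; [[fallthrough]];
  case 13: GlobalArgs[12] = Args[12]; [[fallthrough]];
  case 12: GlobalArgs[11] = Args[11]; [[fallthrough]];
  case 11: GlobalArgs[10] = Args[10]; [[fallthrough]];
  case 10: GlobalArgs[9] = Args[9]; [[fallthrough]];
  case 9: GlobalArgs[8] = Args[8]; [[fallthrough]];
  case 8: GlobalArgs[7] = Args[7]; [[fallthrough]];
  case 7: GlobalArgs[6] = Args[6]; [[fallthrough]];
  case 6: GlobalArgs[5] = Args[5]; [[fallthrough]];
  case 5: GlobalArgs[4] = Args[4]; [[fallthrough]];
  case 4: GlobalArgs[3] = Args[3]; [[fallthrough]];
  case 3: GlobalArgs[2] = Args[2]; [[fallthrough]];
  case 2: GlobalArgs[1] = Args[1]; [[fallthrough]];
  case 1: GlobalArgs[0] = Args[0]; [[fallthrough]];
  case 0:
    break;
  }
}

// Usage check: for every count in [0, 20], exactly the first N destination
// slots are written and the remaining slots stay null.
void checkCopyArgs() {
  int Vals[20];
  void *Src[20];
  for (int I = 0; I < 20; ++I)
    Src[I] = &Vals[I];
  for (std::size_t N = 0; N <= 20; ++N) {
    void *Dst[20] = {};
    copyArgs(Dst, Src, N);
    for (std::size_t I = 0; I < 20; ++I)
      assert(Dst[I] == (I < N ? Src[I] : nullptr));
  }
}
```

The ordering matters. The cases run from the highest count down, so entering at case N executes the stores for slots N-1 through 0 and then stops at `case 0`.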


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D109164

Files:
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu


Index: openmp/libomptarget/deviceRTLs/common/src/parallel.cu
===================================================================
--- openmp/libomptarget/deviceRTLs/common/src/parallel.cu
+++ openmp/libomptarget/deviceRTLs/common/src/parallel.cu
@@ -313,10 +313,62 @@
   if (nargs) {
     void **GlobalArgs;
     __kmpc_begin_sharing_variables(&GlobalArgs, nargs);
-    // TODO: faster memcpy?
-#pragma unroll
-    for (int I = 0; I < nargs; I++)
-      GlobalArgs[I] = args[I];
+    switch (nargs) {
+      default:
+        for (int I = 0; I < nargs; I++)
+          GlobalArgs[I] = args[I];
+        break;
+      case 16:
+        GlobalArgs[15] = args[15];
+        // FALLTHROUGH
+      case 15:
+        GlobalArgs[14] = args[14];
+        // FALLTHROUGH
+      case 14:
+        GlobalArgs[13] = args[13];
+        // FALLTHROUGH
+      case 13:
+        GlobalArgs[12] = args[12];
+        // FALLTHROUGH
+      case 12:
+        GlobalArgs[11] = args[11];
+        // FALLTHROUGH
+      case 11:
+        GlobalArgs[10] = args[10];
+        // FALLTHROUGH
+      case 10:
+        GlobalArgs[9] = args[9];
+        // FALLTHROUGH
+      case 9:
+        GlobalArgs[8] = args[8];
+        // FALLTHROUGH
+      case 8:
+        GlobalArgs[7] = args[7];
+        // FALLTHROUGH
+      case 7:
+        GlobalArgs[6] = args[6];
+        // FALLTHROUGH
+      case 6:
+        GlobalArgs[5] = args[5];
+        // FALLTHROUGH
+      case 5:
+        GlobalArgs[4] = args[4];
+        // FALLTHROUGH
+      case 4:
+        GlobalArgs[3] = args[3];
+        // FALLTHROUGH
+      case 3:
+        GlobalArgs[2] = args[2];
+        // FALLTHROUGH
+      case 2:
+        GlobalArgs[1] = args[1];
+        // FALLTHROUGH
+      case 1:
+        GlobalArgs[0] = args[0];
+        // FALLTHROUGH
+      case 0:
+        break;
+    }
   }
 
   // TODO: what if that's a parallel region with a single thread? this is

