[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Wed Jan 24 12:01:13 PST 2018

Hahnfeld added a comment.

I think this finally looks reasonably good to land: I'm going to accept this once the last few minor comments are addressed. So if anyone has concerns, please raise them now!

@grokos Can you please upload a new version with changes for my inline comments? This will make it easier to spot errors in the last changes. Afterwards, I think the code should be formatted with `clang-format`. Unfortunately, it doesn't pick up `.cu` files by default but I think we might be able to tweak this with a configuration file?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:96
+    # We use the one provided by the user, attempt to use the one used to build
+    # libomptarget, attempt to use clang in the PATH, or just fail.
+
----------------
`clang in the PATH` isn't true anymore.

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:108-109
+    else()
+      libomptarget_say("Cannot find a CUDA compiler capable of emitting LLVM bitcode.")
+      libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_CUDA_COMPILER")
+    endif()
----------------
We cannot satisfy the user's request, so should probably be `libomptarget_error_say`?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:121-122
+    else()
+      libomptarget_say("Cannot find a linker capable of linking LLVM bitcode objects.")
+      libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_BC_LINKER")
+    endif()
----------------
`libomptarget_error_say` as well

================
Comment at: libomptarget/deviceRTLs/nvptx/src/interface.h:360-361
+// Support for reducing conditional lastprivate variables
+EXTERN void __kmpc_reduce_conditional_lastprivate(kmp_Indent *loc,
+  int32_t global_tid, int32_t varNum, void *array);
+
----------------
grokos wrote:
> Hahnfeld wrote:
> > Will the host runtime use the same entry point? Because it's not yet present in `libomp`
> Yes, I will add the same function to the host runtime. It's not there yet, but it's on my list.
Then you need to agree on the interface with Intel, just saying...

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu:45-50
+INLINE unsigned n_sm() {
+  unsigned n_sm;
+  asm("mov.u32 %0, %%nsmid;" : "=r"(n_sm));
+  return n_sm;
+}
+
----------------
Not used?

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:72
+  // warp.
+  DS_Max_Worker_Warp_Size_Bits = 5,
+  // The slot size that should be reserved for a working warp.
----------------
I think all other usages of `>> DS_Max_Worker_Warp_Size_Bits` can be replaced with `% WARPSIZE` or am I missing something in here?

================
Comment at: libomptarget/deviceRTLs/nvptx/src/option.h:20-43
+////////////////////////////////////////////////////////////////////////////////
+// following two defs must match absolute limit hardwired in the host RTL
+#define TEAMS_ABSOLUTE_LIMIT                                                   \
+  512 /* omptx limit (must match teamsAbsoluteLimit) */
+#define THREAD_ABSOLUTE_LIMIT                                                  \
+  1024 /* omptx limit (must match threadAbsoluteLimit) */
+
----------------
grokos wrote:
> Hahnfeld wrote:
> > Only `MAX_THREADS_PER_TEAM` and `WARPSIZE` are used, the rest can go away.
> `THREAD_ABSOLUTE_LIMIT` is also used (in the definition of `MAX_THREADS_PER_TEAM`).
Yeah, but in the same file and only once. So you can just do `#define MAX_THREADS_PER_TEAM 1024` ;-)

================
Comment at: libomptarget/deviceRTLs/nvptx/src/reduction.cu:238
+}
+#endif // __CUDA_ARCH__ >= 700
+
----------------
I think this `endif` needs to go before the last closing `}` to also finish the function in the `if` case.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/reduction.cu:386-388
+  return ThreadId == 0;
+}
+#endif // __CUDA_ARCH__ >= 700
----------------
Here too, I think `endif` needs to go before the `return ThreadId == 0`.

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D14254